USCbiostats / partition.archive

https://uscbiostats.github.io/partition

Improve overall speed of partition #4

Open malcolmbarrett opened 6 years ago

malcolmbarrett commented 6 years ago

To start with this goal, I've created a repo called partition.benchmark to benchmark partition's baseline speed and keep track of it as we improve the package:

https://github.com/malcolmbarrett/partition.benchmark

Updated 7/6
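
For readers who want a feel for what tracking a baseline involves, here is a minimal sketch of a timing harness. It is not taken from partition.benchmark; the names `median`, `time_median_ms`, and `toy_workload` are hypothetical, and the workload is only a stand-in for whatever compiled routine is being measured. It repeats a call several times and reports the median wall-clock time, which tends to be less sensitive to one-off slow runs than the mean.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Median of a set of timings. The vector is taken by value because
// sorting reorders it. Returns 0.0 for an empty input.
double median(std::vector<double> v) {
    if (v.empty()) return 0.0;
    std::sort(v.begin(), v.end());
    const std::size_t mid = v.size() / 2;
    return v.size() % 2 == 1 ? v[mid] : (v[mid - 1] + v[mid]) / 2.0;
}

// Run `fn` `reps` times and return the median wall-clock time in
// milliseconds. Returns 0.0 if `reps` is not positive.
double time_median_ms(const std::function<void()>& fn, int reps) {
    if (reps <= 0) return 0.0;
    std::vector<double> ms;
    ms.reserve(static_cast<std::size_t>(reps));
    for (int i = 0; i < reps; ++i) {
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        ms.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return median(ms);
}

// Hypothetical stand-in workload (an assumption, not a partition routine):
// the sum of squares of 1..n.
double toy_workload(std::size_t n) {
    std::vector<double> x(n);
    std::iota(x.begin(), x.end(), 1.0);
    double total = 0.0;
    for (double v : x) total += v * v;
    return total;
}
```

Calling `time_median_ms` with a lambda that wraps the routine of interest, once per value of n, gives one row per n to compare across branches. Storing the result of the workload in a `volatile` variable inside the lambda helps keep the compiler from optimizing the call away.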

malcolmbarrett commented 6 years ago

Benchmarks update: rcpp branch

partition()

ICC is about 2x as fast at higher n and even faster at lower n. MI is a bit slower because it's being called from C++ now. At higher n, it may be worth thinking about a better way to handle that. MinR2 is faster but was already doing well. First PC is a tad slower, so I'll have to look into that, but it was always fast.

kmeans_icc()

...well, let's just say I was wrong about it maybe not being worth it to pursue the MLpack implementation. It's lightning fast! If I'm not mistaken, there's also a PCA function in that library, so that could be a good option for that problem. I'll try to get it to play nicely with Travis and may contact Dirk Eddelbuettel re: a timeline for the new implementation, RcppMLPack2.

Potential areas of improvement

cc: @millstei @pmarjora @gvegayon

millstei commented 6 years ago

Nice! I'm getting encouraging results in the real data analysis, so I think that will help drive the first publication. Once I get the next version of the simulation results, I will start writing, so I think the timing is really good.

Josh


From: Malcolm Barrett notifications@github.com Sent: Friday, July 6, 2018 10:21:39 AM To: USCbiostats/partition Cc: Joshua Millstein; Mention Subject: Re: [USCbiostats/partition] Improve overall speed of partition (#4)
