amplab / snap

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data
https://www.microsoft.com/en-us/research/project/snap/
Apache License 2.0

snap speedup above 32 threads #116

Closed ramcn closed 3 years ago

ramcn commented 6 years ago

Hello,

I am doing a performance study of SNAP on a Skylake processor (Stampede2 system), which has 48*2=96 hardware threads per node. Initial experiments show little additional speedup beyond 32 threads.

Is this a known limitation? If there is a particular code optimization I could try to gain speedup beyond 32 threads, I would appreciate any insights.

Thank you, Ram

bolosky commented 6 years ago

The scaling limit on SNAP is typically the system memory bandwidth. Every lookup in the hash table is at least one cache miss all the way to memory, and sometimes a few more. Likewise, every access to the reference genome to check a candidate alignment is also a cache miss to memory (unless there are several candidates nearby).
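To make the bandwidth argument concrete, here is a minimal back-of-envelope sketch. It is not SNAP code: the function names and the numbers in the usage section are hypothetical placeholders, and the model assumes each lookup costs one 64-byte cache line fetched from memory. Under that assumption, throughput grows with thread count only until the memory system is saturated.

```python
def saturating_throughput(threads, per_thread_rate, mem_bandwidth, bytes_per_lookup=64):
    """Lookups per second under a simple bandwidth-cap model.

    threads          -- number of worker threads
    per_thread_rate  -- lookups/s one thread achieves when memory is not contended
    mem_bandwidth    -- usable memory bandwidth in bytes/s (assumed, not measured)
    bytes_per_lookup -- bytes fetched per lookup (assumed: one 64-byte cache line)
    """
    compute_bound = threads * per_thread_rate
    bandwidth_bound = mem_bandwidth / bytes_per_lookup
    return min(compute_bound, bandwidth_bound)


def saturation_threads(per_thread_rate, mem_bandwidth, bytes_per_lookup=64):
    """Thread count at which the model hits the bandwidth cap."""
    return mem_bandwidth / (bytes_per_lookup * per_thread_rate)


if __name__ == "__main__":
    # Placeholder inputs chosen only for illustration; they are not measurements.
    rate, bandwidth = 4e6, 6.4e9
    for n in (8, 16, 32, 64):
        print(n, saturating_throughput(n, rate, bandwidth))
    print("model saturates at", saturation_threads(rate, bandwidth), "threads")
```

In this model, adding threads past the saturation point changes nothing, which matches the flat scaling curve described in the question. Real systems are messier (multiple memory channels, NUMA, hyperthreads sharing a core), so treat it as intuition rather than prediction.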

There’s really not all that much you can do about it. One possibility that might help a little is to try the -hp option, which will use huge pages for the index. Huge pages result in fewer translation lookaside buffer misses (if you don’t know what this is, don’t worry, it’s something internal to the processor that takes both time and memory bandwidth). The cost is that index load might become slow (on Windows, it’s so awful that you won’t want to use it; on Linux somewhat less, but still painful). That might help some, and might not, depending on what’s really the problem.
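Before experimenting with -hp on Linux, it can be useful to see what the kernel reports about huge pages. The sketch below only parses the standard `HugePages_Total`, `HugePages_Free`, and `Hugepagesize` fields of `/proc/meminfo`; the function name is hypothetical, and how SNAP itself obtains huge pages is not covered in this thread.

```python
import re


def parse_hugepage_info(meminfo_text):
    """Extract huge-page fields from the text of /proc/meminfo (Linux only).

    Returns a dict with whichever of HugePages_Total, HugePages_Free and
    Hugepagesize (in kB) are present.
    """
    info = {}
    pattern = re.compile(r"(HugePages_Total|HugePages_Free|Hugepagesize):\s+(\d+)")
    for line in meminfo_text.splitlines():
        match = pattern.match(line)
        if match:
            info[match.group(1)] = int(match.group(2))
    return info


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print(parse_hugepage_info(f.read()))
    except OSError:
        print("/proc/meminfo not readable; this check only applies to Linux")
```

A `HugePages_Total` of 0 means no huge pages have been explicitly reserved on the node.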

There’s also some possibility that you’re IO limited and you can’t read/write the data fast enough. You can tell the difference between the two by looking at the processor load. If it’s memory bandwidth, the cores will be running at (or close to) 100%, while if it’s IO then you’ll see much lower CPU load.
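One way to check the processor load on Linux while SNAP is running is to sample the aggregate `cpu` line of `/proc/stat` twice and compare. This is a minimal sketch with hypothetical function names; it counts idle and iowait ticks as "not busy", and any monitoring tool such as top would give the same information.

```python
import time


def parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    # Fields: user nice system idle iowait irq softirq steal (guest fields are skipped
    # because the kernel already includes guest time in user time).
    fields = [int(x) for x in line.split()[1:9]]
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
    return idle, sum(fields)


def busy_fraction(before, after):
    """Fraction of CPU time spent non-idle between two 'cpu' lines."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed == 0:
        return 0.0
    return 1.0 - (idle1 - idle0) / elapsed


def read_cpu_line(path="/proc/stat"):
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    try:
        first = read_cpu_line()
        time.sleep(1.0)  # sampling interval; run this while the aligner is working
        second = read_cpu_line()
        print(f"CPU busy: {busy_fraction(first, second):.0%}")
    except OSError:
        print("/proc/stat not readable; this check only applies to Linux")
```

Following the reasoning above, a value near 100% with all threads in use points toward memory bandwidth, while a much lower value suggests the run is waiting on IO.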

--Bill

From: ramcn Sent: Wednesday, June 13, 2018 6:48 PM To: amplab/snap Subject: [amplab/snap] snap speedup above 32 threads (#116)

ramcn commented 6 years ago

Thanks for the suggestions. I will profile the application and get more information about memory IO and CPU utilization and try to figure out what is going on.