amplab / snap

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data
https://www.microsoft.com/en-us/research/project/snap/
Apache License 2.0

[v1.0.0beta.13] deadlock #35

Closed (blahah closed this issue 9 years ago)

blahah commented 9 years ago

I'm getting a deadlock with v1.0.0beta.13. I'm aligning a pair of 5.0 GB files to a fairly small transcriptome (73,516 contigs, 41 MB FASTA file) with 20 threads. SNAP writes out 4.0 GB of the BAM file, then the SNAP process's CPU usage drops to 0% and it sits in the S (sleeping) state.

My best guess is that it's similar to the deadlock bug that was fixed in beta13.

Please let me know if you'd like any more details, the files, etc.
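For anyone trying to confirm the same symptom (S state with 0% CPU), here is a minimal Python sketch that assumes a Linux host. It parses two samples of `/proc/<pid>/stat` taken some time apart and checks whether the process is sleeping and has used no CPU ticks in between. The helper names `parse_proc_stat`, `read_stat`, and `looks_stalled` are hypothetical and not part of SNAP. A sleeping process with flat CPU counters is only a hint of a deadlock, since a process waiting on slow I/O looks the same.

```python
def parse_proc_stat(line: str) -> dict:
    """Parse one line in the format of Linux /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split around the LAST closing parenthesis.
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    fields = tail.split()
    return {
        "pid": int(pid_text),
        "comm": comm,
        "state": fields[0],        # field 3: R, S, D, Z, T, ...
        "utime": int(fields[11]),  # field 14: user-mode clock ticks
        "stime": int(fields[12]),  # field 15: kernel-mode clock ticks
    }


def read_stat(pid: int) -> dict:
    """Read and parse /proc/<pid>/stat for a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as handle:
        return parse_proc_stat(handle.read())


def looks_stalled(before: dict, after: dict) -> bool:
    """True if the process is sleeping and used no CPU between two samples."""
    cpu_before = before["utime"] + before["stime"]
    cpu_after = after["utime"] + after["stime"]
    return after["state"] == "S" and cpu_after == cpu_before
```

In practice you would call `read_stat(pid)` twice, a minute or so apart, and pass both results to `looks_stalled`. Attaching a debugger to collect thread backtraces is still the more reliable way to confirm an actual deadlock.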

rnpandya commented 9 years ago

Thanks for the report. If you send the files and the command line, we'll take a look at it. Are you using sorting? If so, does it still happen with unsorted output?

Thanks,

Ravi

blahah commented 9 years ago

This is without sorting.

I just confirmed that it is related to the changes in beta 13 (commit 63d75ceec03b66464534846a9eac0fdf5a98658a) by raising the limits that were changed in that commit. These two commits contain my changes, which let my run complete successfully (although with massive memory usage): https://github.com/HibberdLab/snap/commit/f971ff6af481dc14f9da3c12873547bd61021af6 and https://github.com/HibberdLab/snap/commit/e46b33e45ab7e85be0671d459a19e5435cdc6cc0

Any suggestions for the best way to transfer the files? If you don't have anywhere to dump them I'll set up a temporary server.

rnpandya commented 9 years ago

OK, we'll take a closer look at that. You can upload the data to OneDrive; IIRC it gives you 15 GB of free storage. Thanks,

Ravi

rnpandya commented 9 years ago

I think I've identified the root cause. Try the ravip-deadlock branch and let me know if that fixes it for you. Thanks,

Ravi

rnpandya commented 9 years ago

I've pushed the fix as 1.0beta.14. Let me know if you still see the problem.

blahah commented 9 years ago

Just to confirm, this is fixed in beta 14. Thanks :)