Freezes\out of memory on unplaced small RNAs stage

Chupalav commented 5 years ago

Hi Mike, and thank you for such useful software. I'm using ShortStack on Debian 9 (stretch) x64. Everything runs OK until the search for unplaced small RNAs, which never ends (I aborted it manually after 48 h of runtime). I tried on PCs with 8-16 GB of memory plus swap, with sort memory from the default 768M up to 15 GB, with the same result. Do you have any suggestions about the possible problem and a solution? Thanks.

MikeAxtell commented 5 years ago

Hello, thanks for the message. Can you send me the Log.txt file associated with your run, and also let me know the exact samtools and bowtie versions used?

Best, Mike

-- Michael J. Axtell, Ph.D. Professor of Biology Penn State University http://sites.psu.edu/axtell

Chupalav commented 5 years ago

Log.txt https://github.com/MikeAxtell/ShortStack/files/2420027/Log.txt Here you are. Thank you for your swift reply. Versions: samtools 1.3.1 w/ htslib 1.3.2 (also tried 1.7x; 1.9x), bowtie 1.1.2 64-bit, ViennaRNA 2.4.9.1.

MikeAxtell commented 5 years ago

Thanks.

I see a few things you can consider.

Looks like your genome file is actually miRBase hairpins. ShortStack was designed to work with an actual complete reference nuclear genome assembly. If miRBase is used, most reads will not align (because most sRNA-seq data are not mature microRNAs). I can see from the log that about 95% of your reads didn't align. Over 296 million reads. I would suggest using the appropriate reference genome instead of miRBase.
The stalling at the last phase ("Performing search for unplaced small RNAs.") is not unexpected given that there is such an unusually large number of unplaced reads. This phase is a brute-force sorting and counting of all unplaced reads. Usually, with a true reference genome, there aren't too many of these, and this completes in a few minutes with modest memory. But with nearly 300 million unplaced reads, I'm not surprised that this stalled on you.
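
For readers who want a concrete picture of what "brute-force sorting and counting" means, here is a minimal Python sketch. It is not ShortStack's actual code; the function name `count_unplaced` and the toy input are made up for illustration. It simply tallies identical sequences in a hash and ranks them by abundance.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_unplaced(sequences: Iterable[str], top_n: int = 3) -> List[Tuple[str, int]]:
    """Tally identical read sequences and return the top_n most abundant.

    Hypothetical helper for illustration only; not part of ShortStack.
    """
    counts = Counter(sequences)  # one hash entry per distinct sequence
    # Rank by descending count, then by sequence so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Toy input standing in for the sequences of reads that did not align.
toy_reads = ["ACGT", "TTGA", "ACGT", "GGCC", "ACGT", "TTGA"]
top = count_unplaced(toy_reads)
print(top)
```

The hash holds one entry per distinct sequence, so memory use in an approach like this grows with the number of distinct unplaced reads, not just with the size of the reference.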

Unfortunately right now there's not a switch to turn off the unplaced read search. So to get results with your run you'd need to let it run out to completion.

Instead, I would recommend not using miRBase as a reference genome and using the reference genome that corresponds to your organism.

Hope this helps and sorry the program can't handle your use case well at the present time.

Chupalav commented 5 years ago

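Well, that is interesting, because the same input and the same genome file were successfully processed on 2 other machines (Ubuntu xenial), and the RNA search didn't take much time: Log1.txt https://github.com/MikeAxtell/ShortStack/files/2420353/Log1.txt Log.txt https://github.com/MikeAxtell/ShortStack/files/2420361/Log.txt Could it be a distro-specific issue? (I tried stable, testing, and unstable Debian.)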

MikeAxtell commented 5 years ago

Yes, that's curious. My guess would be that the difference is because of available memory. The unplaced sRNA search (as best I recall, I wrote that code a while ago) depends on a brute force sort and hash. So it could consume a lot of memory in your case where there are so many unplaced reads. If it burned through available memory and into swap on one machine, that would explain it.
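
To illustrate the memory trade-off described above, here is a hedged Python sketch of a different technique: an external merge sort that counts sequences while holding only one chunk in memory at a time. This is not how ShortStack does it, and `count_with_bounded_memory` and `chunk_size` are hypothetical names. It assumes the sequences contain no newline characters.

```python
import heapq
import itertools
import os
import tempfile
from typing import Iterable, Iterator, List, Tuple


def _write_sorted_chunk(chunk: List[str]) -> str:
    """Sort one chunk of sequences and write it to a temporary file."""
    lines = sorted(seq + "\n" for seq in chunk)
    fd, path = tempfile.mkstemp(suffix=".txt", text=True)
    with os.fdopen(fd, "w") as handle:
        handle.writelines(lines)
    return path


def count_with_bounded_memory(
    sequences: Iterable[str], chunk_size: int = 1_000_000
) -> Iterator[Tuple[str, int]]:
    """Yield (sequence, count) pairs in sorted order.

    Only chunk_size sequences are held in memory at once; the rest
    lives in sorted temporary files that are merged lazily.
    """
    paths: List[str] = []
    reads = iter(sequences)
    try:
        while True:
            chunk = list(itertools.islice(reads, chunk_size))
            if not chunk:
                break
            paths.append(_write_sorted_chunk(chunk))
        handles = [open(path) for path in paths]
        try:
            # heapq.merge streams the sorted files without loading them fully.
            merged = heapq.merge(*handles)
            # Identical lines are adjacent after the merge, so groupby can count them.
            for line, group in itertools.groupby(merged):
                yield line.rstrip("\n"), sum(1 for _ in group)
        finally:
            for handle in handles:
                handle.close()
    finally:
        for path in paths:
            os.remove(path)


toy_reads = ["ACGT", "TTGA", "ACGT", "GGCC", "ACGT", "TTGA"]
counts = list(count_with_bounded_memory(toy_reads, chunk_size=2))
print(counts)
```

The design choice here is to trade memory for disk I/O: a pure in-memory hash is faster when it fits in RAM, but once it spills into swap, a disk-backed sort like this tends to degrade more gracefully.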

Best, Mike

MikeAxtell / ShortStack

Freezes\out of memory on unplaced small RNAs stage #81