Rare Infinite Loop While Extracting Unmapped?

COMBINE-lab / SalmonTools

Useful tools for working with Salmon output

BSD 3-Clause "New" or "Revised" License


Issue #2

Open Miserlou opened 6 years ago

Miserlou commented 6 years ago

Perhaps you can shed some light on this: very occasionally, we see salmontools processes that never seem to terminate.

Some of these have been running for more than 4 hours and are still consuming a full CPU.

Here is the sample in question: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM2432103

Do you have any idea what might be causing this?

Sorry that this isn't a more reproducible report!
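Until the cause is known, a pipeline can at least bound how long each job may run, so a hung process does not spin for hours. Below is a minimal sketch that assumes jobs are launched from Python via `subprocess`. `run_with_timeout` is a hypothetical helper, and the demo command is a placeholder, not a real salmontools invocation.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of argv strings) and return (returncode, timed_out).

    If the child is still running after timeout_s seconds, subprocess.run
    kills it, waits for it to exit, and raises TimeoutExpired; we catch that
    and report returncode=None, timed_out=True.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_s, check=False)
    except subprocess.TimeoutExpired:
        return None, True
    return completed.returncode, False


if __name__ == "__main__":
    # Placeholder child that sleeps past the limit; in a real pipeline this
    # would be the salmontools command line, which is not shown here.
    demo_cmd = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(run_with_timeout(demo_cmd, timeout_s=1))  # (None, True)
```

In practice the limit would be set well above the longest successful run (for example `4 * 60 * 60` seconds), so that only genuinely stuck jobs get killed and flagged.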

kurtwheeler commented 6 years ago

Here are the accession codes of the 10 longest-running jobs that completed successfully, along with the transcriptome index (long or short) used for each:

accession_code |     index_type      
----------------+---------------------
 SRR4423743     | TRANSCRIPTOME_SHORT
 SRR5342767     | TRANSCRIPTOME_SHORT
 SRR3666783     | TRANSCRIPTOME_SHORT
 SRR6494603     | TRANSCRIPTOME_SHORT
 SRR1524241     | TRANSCRIPTOME_LONG
 SRR4423749     | TRANSCRIPTOME_SHORT
 SRR6297667     | TRANSCRIPTOME_LONG
 SRR6877472     | TRANSCRIPTOME_LONG
 SRR4423750     | TRANSCRIPTOME_SHORT
 SRR6494612     | TRANSCRIPTOME_SHORT

These transcriptome indices can be downloaded here:

https://s3.amazonaws.com/data-refinery-s3-transcriptome-index-circleci-prod/DANIO_RERIO_TRANSCRIPTOME_LONG.tar.gz
https://s3.amazonaws.com/data-refinery-s3-transcriptome-index-circleci-prod/DANIO_RERIO_TRANSCRIPTOME_SHORT.tar.gz
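If you want to reproduce the slow runs, the sketch below fetches and unpacks one of these indices using only Python's standard library. It is illustrative only. `fetch_and_extract` and any destination directory are hypothetical, and the sketch assumes the bucket URLs above are still reachable.

```python
import tarfile
import tempfile
import urllib.request
from pathlib import Path


def fetch_and_extract(url, dest_dir):
    """Download a .tar.gz archive from url, unpack it into dest_dir,
    and return the sorted list of member names in the archive."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "index.tar.gz"
        urllib.request.urlretrieve(url, str(archive))  # download to a temp file
        with tarfile.open(archive, "r:gz") as tar:
            names = sorted(tar.getnames())
            # Use the safer "data" extraction filter where this Python has it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(dest, filter="data")
            else:
                tar.extractall(dest)
    return names
```

Calling `fetch_and_extract` with the `DANIO_RERIO_TRANSCRIPTOME_SHORT.tar.gz` URL above and a directory such as `"indices/short"` would leave the unpacked index under that directory.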

Miserlou commented 6 years ago

These samples are also derived from .sra files, extracted with fasterq-dump.

Could our issue have anything to do with the bug mentioned in this unmerged pull request?