still sjdbOverhang Error for very small datasets

seb-mueller commented 6 years ago

When running dropSeqPipe on a rather small test dataset (~1000 reads) I got the following error:

EXITING because of fatal PARAMETERS error: sjdbOverhang <=0 while junctions are inserted on the fly with --sjdbFileChrStartEnd or/and --sjdbGTFfile
SOLUTION: specify sjdbOverhang>0, ideally readmateLength-1

Note: as suggested in issue #4, I've regenerated the STAR index (generate-meta), but that only helps up to a certain read depth.

After some research, it turned out the function get_mean_read_length in singleCell/star_align.snake returned the wrong value.

The reason seems to be twofold. First, a minimum number of reads is hard-coded (n = 1000000), so any dataset with fewer reads (as in my case) will probably run into this issue.

Second, even if the minimum number of reads is met, the value is calculated as a mean, so a dataset with many trimmed reads might lead to the wrong estimate too.
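To make both effects concrete, here is a minimal sketch of the arithmetic. It is not the pipeline's code: `mean_length_fixed_n` is a hypothetical name that only mirrors the expression `int(total_length/(n/4))` from `get_mean_read_length`, under the assumption that `n` is a hard-coded count of FASTQ lines (four per read). The read lengths are made up for illustration.

```python
def mean_length_fixed_n(read_lengths, n=1_000_000):
    """Hypothetical stand-in for the estimate discussed in this issue.

    Assumes n is a fixed number of FASTQ lines (4 lines per read), so the
    divisor n / 4 is the number of reads the code *expects*, not the
    number it actually saw.
    """
    total_length = sum(read_lengths)
    return int(total_length / (n / 4))


# ~1000 reads of length 60: the divisor is still 250000, so the "mean"
# truncates to 0 and anything derived from it is <= 0.
small_dataset = [60] * 1000
print(mean_length_fixed_n(small_dataset))

# Enough reads, but half of them trimmed to 20 bases: the mean lands
# well below the untrimmed read length of 60.
trimmed_dataset = [60] * 125_000 + [20] * 125_000
print(mean_length_fixed_n(trimmed_dataset))
```

With these made-up inputs, the first call truncates to 0, and the second gives a value between the trimmed and untrimmed lengths.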

I'm not sure what the best solution is. Options might be setting the minimum to the actual number of reads (or 1 million, whichever is smaller), allowing this value to be set manually in the config file, or excluding the --sjdbOverhang parameter altogether.

A possible workaround for me was to set return(100) instead of return(int(total_length/(n/4))).
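A less blunt alternative to hard-coding 100 could look roughly like the sketch below. This is only an illustration under stated assumptions, not the pipeline's implementation: `estimate_read_length`, `sjdb_overhang`, and the `override` parameter are hypothetical names. It counts only the reads actually present (up to a cap), allows a manual override, and uses the longest read rather than the mean, since the STAR manual suggests max(ReadLength)-1 for reads of varying length.

```python
import gzip
from itertools import islice


def estimate_read_length(fastq_path, max_reads=1_000_000, override=None):
    """Hypothetical helper: read length from at most max_reads reads.

    override stands in for a value set manually (e.g. in a config file);
    when given, the file is not inspected at all.
    """
    if override is not None:
        return int(override)

    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    longest = 0
    n_reads = 0
    with opener(fastq_path, "rt") as handle:
        # Sequence lines are the 2nd line of every 4-line FASTQ record.
        for seq in islice(handle, 1, max_reads * 4, 4):
            longest = max(longest, len(seq.strip()))
            n_reads += 1

    if n_reads == 0:
        raise ValueError(f"no reads found in {fastq_path}")
    return longest


def sjdb_overhang(read_length):
    """Read length minus one, clamped so the value is never <= 0."""
    return max(1, read_length - 1)
```

Because the loop stops at the end of the file, a dataset with fewer than `max_reads` reads is handled without any fixed divisor, and trimmed reads do not lower the result.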

Not sure if this issue is relevant enough to be solved, but I thought I'd share my workaround anyway in case someone else has a similar dataset.

Happy to send more details if needed.

Hoohm commented 6 years ago

Hello @seb-mueller, I'm currently working on the next release and the read length will be specified manually per sample. This will fix the issue.

Happy new year :)

seb-mueller commented 6 years ago

Thanks for the quick response and happy new year too! I wasn't aware of the 0.24 branch and its major changes. I'll switch over to it and give it a whirl (so this can be considered solved).

Hoohm / dropSeqPipe

still sjdbOverhang Error for very small datasets #14