Closed seb-mueller closed 6 years ago
Hello @seb-mueller, I'm currently working on the next release and the read length will be specified manually per sample. This will fix the issue.
Happy new year :)
Thanks for the quick response and happy new year too! I wasn't aware of the 0.24 branch (it has major changes). I'll switch over to it and give it a whirl, so this can be considered solved.
When running dropSeqPipe on a rather small test dataset (~1000 reads) I got the following error:
Note: as suggested in issue #4, I've regenerated the STAR index (`generate-meta`), which helps only up to a certain read depth.
After some research, it turned out that the function `get_mean_read_length` in singleCell/star_align.snake returned the wrong value.
The reason seems to be twofold. First, a minimum number of reads is hard-coded (n = 1000000), so any dataset with fewer reads (as in my case) will probably run into this issue.
Second, even if the minimum number of reads is met, the value is a mean, so a large share of trimmed reads can also lead to a wrong estimate.
Not sure what the best solution is: maybe setting the minimum to the number of reads (or 1 million, whichever is smaller), maybe allowing this value to be set manually in the config file, or excluding the --sjdbOverhang parameter altogether?
A possible workaround for me was to set `return(100)` instead of `return(int(total_length/(n/4)))`.
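A less blunt alternative to a fixed return value might look like the sketch below. It is not the pipeline's actual code: `read_length_stats` and `fastq_read_length_stats` are hypothetical names, and it assumes plain four-line FASTQ records. It divides by the number of reads actually seen rather than by a fixed count, and it also reports the longest read, since the STAR manual gives max(ReadLength)-1 as the ideal --sjdbOverhang for reads of varying length.

```python
import gzip
import itertools
from typing import Iterable, Tuple


def read_length_stats(lines: Iterable[str], max_reads: int = 1_000_000) -> Tuple[int, int, int]:
    """Return (reads_seen, mean_length, max_length) over at most max_reads records.

    Hypothetical helper: assumes four-line FASTQ records, with the
    sequence on the second line of each record.
    """
    reads_seen = 0
    total_length = 0
    max_length = 0
    # Take lines 1, 5, 9, ... (0-based), stopping after max_reads records
    # or at the end of the input, whichever comes first.
    for seq in itertools.islice(lines, 1, max_reads * 4, 4):
        length = len(seq.strip())
        reads_seen += 1
        total_length += length
        max_length = max(max_length, length)
    if reads_seen == 0:
        raise ValueError("no reads found in input")
    # Divide by the reads actually seen, not by the requested maximum.
    return reads_seen, total_length // reads_seen, max_length


def fastq_read_length_stats(path: str, max_reads: int = 1_000_000) -> Tuple[int, int, int]:
    """Open a plain or gzip-compressed FASTQ file and compute the same stats."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return read_length_stats(handle, max_reads)


# Tiny in-memory example: three reads of length 10, 4 (trimmed) and 8.
example = [
    "@r1", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@r2", "ACGT", "+", "IIII",
    "@r3", "ACGTACGT", "+", "IIIIIIII",
]
reads_seen, mean_length, max_length = read_length_stats(example)
# Using the longest read avoids the bias that trimmed reads put on the mean.
sjdb_overhang = max_length - 1
```

For comparison, with a hypothetical test file of 1,000 reads of 60 bases each, a divisor fixed at n/4 with n = 1000000 would give int(60000/250000) = 0, whereas dividing by the reads actually seen gives 60.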
Not sure if this issue is relevant enough to be fixed, but I thought I'd share my workaround anyway in case someone else has a similar dataset.
Happy to send more details if needed.