Open yuankunzhu opened 5 years ago
@yuankunzhu — to clarify, you'd like to be able to modify this setting?
Ultimately, this parameter should be set according to the library's stranded status. If the input data is stranded, the parameter should be 1 or 0; if it's non-stranded, then 0.5, for example.
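A minimal sketch of that mapping, assuming the quantifier reads `--forward-prob` the way RSEM documents it (1 means reads derive from the forward strand, 0 from the reverse strand, 0.5 for unstranded). The function names and the strandedness labels are hypothetical and not part of toil-rnaseq:

```python
# Hypothetical helper: the labels "forward", "reverse" and "unstranded"
# are illustrative, not values used by toil-rnaseq.
_FORWARD_PROB = {
    "forward": 1.0,     # stranded, reads from the forward strand
    "reverse": 0.0,     # stranded, reads from the reverse strand
    "unstranded": 0.5,  # non-stranded library
}


def forward_prob(strandedness):
    """Return the --forward-prob value for a strandedness label."""
    try:
        return _FORWARD_PROB[strandedness.lower()]
    except KeyError:
        raise ValueError("unknown strandedness: %r" % (strandedness,))


def forward_prob_args(strandedness):
    """Build the command-line fragment for the quantifier call."""
    return ["--forward-prob", str(forward_prob(strandedness))]
```

With something like this, the hard-coded 0.5 becomes the default for `"unstranded"`, and an unrecognized label fails loudly instead of silently quantifying with the wrong setting.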
@yuankunzhu — I see, thank you for the explanation. I'll look into how easy / fast it is to ascertain stranded status and see if I can add it to the workflow. If you have a fast tool you can recommend that'd be appreciated.
This tool has a strand checker: https://hartleys.github.io/QoRTs/ but it only works on BAM input files.
Thanks for looking into this @jvivian. I know Salmon can do this check too: https://salmon.readthedocs.io/en/latest/salmon.html#what-s-this-libtype
As of version 0.7.0, Salmon also has the ability to automatically infer (i.e. guess) the library type based on how the first few thousand reads map to the transcriptome. To allow Salmon to automatically infer the library type, simply provide -l A or --libType A to Salmon.
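A rough sketch of how that could be wired up, assuming Salmon is run in mapping-based mode on paired-end FASTQ files. The helper names are hypothetical. The translation from a Salmon library-type string to a `--forward-prob` value is an assumption based on Salmon's documented type codes (`SF` = read 1 from the forward strand, `SR` = reverse, `U` = unstranded). To the best of my knowledge the inferred type is reported as `expected_format` in Salmon's `lib_format_counts.json`, but treat that as unverified here:

```python
def salmon_quant_command(index, reads_1, reads_2, out_dir):
    """Build a `salmon quant` call that asks Salmon to infer the library type."""
    return [
        "salmon", "quant",
        "-i", index,
        "-l", "A",        # automatic library type detection
        "-1", reads_1,
        "-2", reads_2,
        "-o", out_dir,
    ]


def forward_prob_from_libtype(libtype):
    """Translate a Salmon library type (e.g. ISR, SF, IU) to --forward-prob."""
    code = libtype.upper()
    if code.endswith("SF"):
        return 1.0   # stranded, read 1 from the forward strand
    if code.endswith("SR"):
        return 0.0   # stranded, read 1 from the reverse strand
    if code.endswith("U"):
        return 0.5   # unstranded
    raise ValueError("cannot map library type: %r" % (libtype,))
```

The command list can be passed to `subprocess.check_call`; the second function only covers concrete type codes, so the literal `A` is rejected rather than guessed at.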
@yuankunzhu, I looked at the Salmon note too, but it can only detect what the aligner was told the data was, not whether the sequence data itself came from a stranded or unstranded library. I'm pretty sure this will have to be a parameter based on a human's knowledge of the library prep.
"Thus, for example, if the upstream aligner has been told to perform strand-aware mapping (i.e. to ignore potential alignments that don’t map in the expected manner), but the actual library is unstranded, automatic library type detection cannot detect this. It will attempt to detect the library type that is most consistent with the alignment that are provided."
--forward-prob was hard set to 0.5, while the documentation of that argument describes it as:
This should be made a variable associated with the stranded status.
actual code line: https://github.com/BD2KGenomics/toil-rnaseq/blob/master/src/toil_rnaseq/tools/quantifiers.py#L82