biomedicalinformaticsgroup / Sargasso

Sargasso disambiguates mixed-species high-throughput sequencing data.
http://biomedicalinformaticsgroup.github.io/Sargasso/

Facilitating assessment of ambiguous reads. #110

Open sknaack opened 10 months ago

sknaack commented 10 months ago

I like Sargasso very much! It's working very well for me and is a nice integrated pipeline. I have one difficulty to ask about, or at least to suggest for further development.

I'm interested in the ambiguous reads, i.e., those that meet the filtering criteria for multiple species, and want to check where they preferentially come from. Specifically, in an RNA-seq analysis I want to know, as a sanity check, whether they are associated with particular gene sets (highly conserved genes, or genes involved in specific biological processes). Recovering that read set isn't straightforward: the output .bam files include only the filtered reads, and the exact order in which the criteria are applied isn't easy to replicate from the sorted .bam files either. Is there an efficient way to extract the list of ambiguously mapped reads? Or even to get them as a set of separate output .bam files?
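To make the request concrete, here is a rough sketch of the kind of workaround I have in mind. It is not part of Sargasso and does not reproduce its filtering logic. It assumes read names have already been dumped to one text file per species, both from the unfiltered alignments and from the filtered outputs (for example with `samtools view file.bam | cut -f1 | sort -u`). The function names, file layout, and species labels are all hypothetical. Note that the result is only a superset of the truly ambiguous reads, since it also contains reads that mapped to several species but were discarded for other reasons.

```python
from collections import Counter


def read_names(path):
    """Load a set of read names from a text file with one name per line.

    Assumption: the file was produced outside this script, e.g. with
    `samtools view file.bam | cut -f1 | sort -u`.
    """
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def candidate_ambiguous_reads(mapped_by_species, filtered_by_species):
    """Return read names mapped in two or more species but retained for none.

    Both arguments are dicts of the form {species_label: set_of_read_names}.
    This only approximates the ambiguous set: reads rejected for all species
    for other reasons are included as well.
    """
    # Count in how many species each read name has an alignment.
    counts = Counter()
    for names in mapped_by_species.values():
        counts.update(set(names))
    multi_species = {name for name, n in counts.items() if n >= 2}

    # Reads that ended up in any species' filtered output are not candidates.
    retained = set().union(*filtered_by_species.values())
    return multi_species - retained


# Toy example with made-up read names and species labels.
mapped = {"mouse": {"r1", "r2", "r3"}, "rat": {"r2", "r3", "r4"}}
filtered = {"mouse": {"r1", "r2"}, "rat": {"r4"}}
print(sorted(candidate_ambiguous_reads(mapped, filtered)))
```

The resulting name list could then be used to pull the corresponding alignments out of the unfiltered .bam files for gene-level inspection, but a supported output from the tool itself would be much more reliable than this.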

Other than that, I'd only lightly suggest adding a way to manage the STAR alignment parameters, or even to pass pre-generated, sorted alignment .bam files to the separator executable, though of course this can be "hacked" a bit with the Makefile in any case. Again, a very nice tool, and thank you! Sara Knaack

lweasel commented 10 months ago

Hi Sara - many thanks for your kind words about the tool, and glad to hear that it is helping you. We will have a think about how best to implement these features and get back to you with an update soon!

sknaack commented 10 months ago

Thank you so much, it would be very nice to see such updates implemented!