DRL / blobtools

Modular command-line solution for visualisation, quality control and taxonomic partitioning of genome datasets
GNU General Public License v3.0

filtering reads tutorial issue #47

Closed guifftc closed 7 years ago

guifftc commented 7 years ago

Dear developers, sorry if this is more of a tutorial/conceptual issue than a program/code one. Thanks a lot for your work.

I have two RNA-seq libraries that contain two eukaryotes (target and food) and several bacteria (polyA selection didn't work). I assembled them together with Trinity, and now I want to use blobtools to filter the reads for my target eukaryote in two rounds: first removing the bacteria, then the food eukaryote.

First question, would you work on each library independently or together? I don't know if there's any conceptual difference.

Then, I'd like to know if you could provide some commands to get a list of contigs ranked by superkingdom instead of phylum, which is what is printed in blobDB.table.txt. Also, why is the "kingdom" rank not available? That would be really useful.

Finally, the read filtering strategy that you provide is based on a list of contigs of interest. I managed to get a taxonomy-based list, so I wonder if there is a strategy to get bins of contigs that also takes %GC and coverage into account. Maybe that's exactly what is missing in the "under construction" sections on filtering assemblies; if so, I would like to know whether you have a release date in mind for that.

Thank you very much, keep up the good work!

Maya0801 commented 7 years ago

Hi,

I'm having the same issue. Is there any news on this topic?

DRL commented 7 years ago

Hi,

I added some additional information regarding read filtering.

> First question, would you work on each library independently or together? I don't know if there's any conceptual difference.

I would assemble them together.

> Then, I'd like to know if you could provide some commands to get a list of contigs ranked by superkingdom instead of phylum, which is what is printed in blobDB.table.txt. Also, why is the "kingdom" rank not available? That would be really useful.

You can control the output of blobtools view by specifying -r TAXRANK, or you can retrieve all taxonomic ranks with -r all. I am sorry, but 'kingdom' is currently not supported.
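
For readers who want to turn such a table into a contig list, here is a minimal Python sketch. It is not part of blobtools. It assumes the table was written at superkingdom rank (for example with blobtools view -i blobDB.json -r superkingdom), that lines starting with "##" are comments, that the column header is the line starting with a single "#", and that the columns are tab-separated and include name, GC, a coverage column (called bam0 here) and a taxonomy column whose name starts with "superkingdom.t". The function names, the thresholds and the three example rows are made up for illustration, so check the column names against your own table.

```python
import io


def read_blob_table(handle):
    """Yield one dict per contig from a blobtools-style table (assumed layout)."""
    header = None
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and "##" comment lines
        if line.startswith("#"):
            # Assumed: the single-"#" line holds the tab-separated column names.
            header = line.lstrip("# ").split("\t")
            continue
        if header is None:
            raise ValueError("no header line found before data rows")
        yield dict(zip(header, line.split("\t")))


def select_contigs(rows, taxon, rank="superkingdom", cov_col="bam0",
                   gc_range=(0.0, 1.0), min_cov=0.0):
    """Return names of contigs matching a taxon, a GC range and a minimum coverage."""
    selected = []
    for row in rows:
        # Hypothetical convention: the taxon name is in the column "<rank>.t.<n>".
        tax_col = next((c for c in row if c.startswith(rank + ".t")), None)
        if tax_col is None:
            raise KeyError("no taxonomy column found for rank " + rank)
        gc = float(row["GC"])
        cov = float(row[cov_col])
        if (row[tax_col] == taxon
                and gc_range[0] <= gc <= gc_range[1]
                and cov >= min_cov):
            selected.append(row["name"])
    return selected


# Made-up rows for illustration only; not real blobtools output.
example = (
    "## illustrative table\n"
    "# name\tlength\tGC\tN\tbam0\tsuperkingdom.t.6\n"
    "contig_1\t1200\t0.41\t0\t35.2\tEukaryota\n"
    "contig_2\t800\t0.67\t0\t120.0\tBacteria\n"
    "contig_3\t950\t0.43\t0\t2.1\tEukaryota\n"
)

keep = select_contigs(read_blob_table(io.StringIO(example)),
                      "Eukaryota", min_cov=5.0)
print(keep)  # ['contig_1']
```

Writing the selected names to a text file, one per line, gives a contig list of the kind the read filtering strategy expects. The GC and coverage thresholds are a simple rectangular cut and would need to be chosen by looking at the blob plot.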