DRL / blobtools

Modular command-line solution for visualisation, quality control and taxonomic partitioning of genome datasets
GNU General Public License v3.0

the output plot is empty #94

Closed XClaws closed 4 years ago

XClaws commented 4 years ago

Dear Team,

I would like to check my RNA read data for contamination. Here is how I applied blobtools:

  1. I sampled 10,000 RNA reads from each of 8 different materials from the same organism.
  2. I assembled a de novo transcriptome from these 80,000 RNA reads (the assembly).
  3. I then mapped the RNA reads back to the transcriptome (the mapping.bam).
  4. I BLASTed the transcriptome against the NCBI nt database (the blast.out).
  5. I ran the blobtools protocol.
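The last step can be sketched as command lines. This is a minimal sketch, not the commands actually run in this issue: the `-i`, `-b`, `-t` and `-o` flags are written as recalled from the blobtools v1 README and should be checked against `blobtools create --help`. The file name `assembly.fna` and the helper `blobtools_commands` are hypothetical, and the prefix `Lsal_RNA` is taken from the plot names in this thread.

```python
import shlex


def blobtools_commands(assembly, bam, hits, prefix):
    """Build (but do not run) the blobtools create and plot command lines.

    Flags are assumptions based on the blobtools v1 README; verify them
    with `blobtools create --help` and `blobtools plot --help`.
    """
    create = ["blobtools", "create",
              "-i", assembly,   # assembled contigs/transcripts (FASTA)
              "-b", bam,        # reads mapped back to the assembly
              "-t", hits,       # tabular BLAST hits against nt
              "-o", prefix]     # output prefix for the BlobDB
    # `create` is assumed to write <prefix>.blobDB.json
    plot = ["blobtools", "plot", "-i", prefix + ".blobDB.json"]
    return [create, plot]


# Print the commands so they can be inspected before running them by hand.
for cmd in blobtools_commands("assembly.fna", "mapping.bam", "blast.out", "Lsal_RNA"):
    print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the commands instead of executing them keeps the sketch safe to run on a machine without blobtools installed.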

However, I got a weird output plot. Could you please help me interpret it, or tell me whether I applied the tool incorrectly?

Thank you very much. I look forward to your response.

[Attached plots: "Lsal_RNA blobDB json bestsum phylum p8 span 100 blobplot bam0" and "Lsal_RNA blobDB json bestsum phylum p8 span 100 blobplot read_cov bam0"]

francicco commented 4 years ago

I got the same error, so I tested the provided example dataset and got exactly the same result.

[Attached plots: "TEST1 blobDB json bestsum phylum p8 span 100 blobplot bam0" and "TEST1 blobDB json bestsum phylum p8 span 100 blobplot read_cov bam0"]

There's something wrong with the json file.

Best F

francicco commented 4 years ago

OK, I found the problem: I was using Python 2.7 instead of Python 3.

[Attached plots: "Blobtools out blobDB json bestsum phylum p8 span 100 blobplot bam0" and "Blobtools out blobDB json bestsum phylum p8 span 100 blobplot read_cov bam0"]

F

DRL commented 4 years ago

Hi XClaws,

You should be aware that blobtools was originally developed for genome datasets, not transcriptome datasets. The reason is that coverage in genome assemblies is a proxy for the molarity of the DNA molecules sequenced: DNA from a particular organism will be sequenced at roughly the same coverage, whereas contaminants tend to occur at different coverages.

In a transcriptome dataset, coverage is a proxy for the molarity of the RNA molecules sequenced, which is determined by expression levels, so it does not separate organisms in the same way.
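To make the contrast concrete, here is a small sketch that compares how widely coverage varies within a single organism in the two cases. The coverage values are invented purely for illustration and are not taken from any dataset in this thread. The helper `coefficient_of_variation` is hypothetical and not part of blobtools.

```python
from statistics import mean, pstdev


def coefficient_of_variation(coverages):
    """Population standard deviation of coverage divided by its mean."""
    avg = mean(coverages)
    if avg == 0:
        raise ValueError("mean coverage is zero")
    return pstdev(coverages) / avg


# Invented example values (assumption, for illustration only):
# genome contigs from one organism sit at roughly the same coverage ...
genome_contig_cov = [48.0, 52.0, 50.0, 51.0, 49.0]
# ... while transcripts from one organism follow expression levels.
transcript_cov = [0.5, 2.0, 15.0, 300.0, 4000.0]

print("genome CV:       ", coefficient_of_variation(genome_contig_cov))
print("transcriptome CV:", coefficient_of_variation(transcript_cov))
```

With values like these, the transcriptome spread is far larger than the genome spread, which is why a coverage axis alone is a weak signal for telling a contaminant from a highly or weakly expressed transcript.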

I am not sure I entirely understand the description of your analysis, but it sounds like you subsampled reads from your RNAseq dataset. That makes no sense and will give you very few assembled contigs.

Regarding the example dataset not working, use python3 ...
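A quick way to rule out the interpreter problem before running anything is to check the Python version explicitly. This is a generic sketch and not something blobtools itself provides. The function name `require_python3` is hypothetical.

```python
import sys


def require_python3(version_info=sys.version_info):
    """Return True on Python 3 or newer, raise RuntimeError otherwise."""
    major, minor = version_info[0], version_info[1]
    if major < 3:
        raise RuntimeError(
            "Python 3 is required, but this interpreter is {}.{}".format(major, minor)
        )
    return True


# Check the interpreter that is running this script.
require_python3()
print("Interpreter OK: Python {}.{}".format(*sys.version_info[:2]))
```

The check uses `str.format` rather than f-strings so that the script still parses, and fails with a readable message, when it is accidentally started with Python 2.7.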

cheers,

dom

XClaws commented 4 years ago

Dear Dom,

Thanks for your explanation!