DRL / blobtools

Modular command-line solution for visualisation, quality control and taxonomic partitioning of genome datasets
GNU General Public License v3.0

Which file represents the BlobPlot? #77

Closed: Jigyasa3 closed this issue 5 years ago

Jigyasa3 commented 6 years ago

When I run the command blobtools blobplot -i sample-DB.blob.blobDB.json -p 10 --format png -r phylum, I get several output files: blobplot.bam(), blobplot.cov(), blob.cov.bam, blobplot.covsum, read_cov.covsum() and blobplot.spades. The tutorial only uses blobplot.bam(). What are the other files for?

When I run the command blobtools blobplot -i sample-DB.blob.blobDB.json -c sample-coveragefile.cov -p 10 --format png -r phylum, I get covplot.bam(), covplot.cov(), covplot.covsum, covplot.spades and covplot.stats files. Can you please explain these file types? I could not find any documentation on them.

DRL commented 6 years ago

Hi Jigyasa3,

blobtools blobplot -i sample-DB.blob.blobDB.json -p 10 --format png -r phylum

This generates a blobplot from the blobDB sample-DB.blob.blobDB.json for 10 taxonomic groups (-p 10) at the rank of phylum (-r phylum). Because you provided both coverage information (as a COV file, or from the FASTA headers of your SPAdes assembly) and a BAM file when you created the blobDB, blobtools writes one blobplot and one *stats.txt file for each coverage library. In addition, it produces a covsum blobplot and stats.txt file, in which the coverage of each contig is the sum of its coverages across both libraries. The stats.txt files list basic metrics for each taxonomic group in the analysis. They also include the hex colours used for plotting, which relates to your question in issue #75.
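To make the covsum idea concrete, here is a minimal sketch of summing each contig's coverage over all libraries and then summarising contigs per taxonomic group, roughly the kind of table a stats.txt file holds. Everything in it is a hypothetical stand-in: the record layout, the library names bam0/cov0, the taxa, and the chosen metrics (count, span, mean covsum). It does not read a blobDB and does not reproduce the actual columns of blobtools' stats.txt.

```python
from collections import defaultdict

# Hypothetical per-contig records: length, taxon at the chosen rank (e.g. phylum)
# and coverage per read library. Library names "bam0"/"cov0" are illustrative only.
contigs = [
    {"name": "contig_1", "length": 5000, "taxon": "Arthropoda", "cov": {"bam0": 40.0, "cov0": 35.0}},
    {"name": "contig_2", "length": 2000, "taxon": "Proteobacteria", "cov": {"bam0": 200.0, "cov0": 180.0}},
    {"name": "contig_3", "length": 3000, "taxon": "Arthropoda", "cov": {"bam0": 42.0, "cov0": 38.0}},
]


def covsum(contig):
    """Sum a contig's coverage across all of its libraries (the covsum idea)."""
    return sum(contig["cov"].values())


def group_stats(contigs):
    """Per-taxon contig count, total span (bp) and unweighted mean covsum.

    These metrics are illustrative; they are not the exact stats.txt columns.
    """
    acc = defaultdict(lambda: {"count": 0, "span": 0, "covsum_total": 0.0})
    for c in contigs:
        g = acc[c["taxon"]]
        g["count"] += 1
        g["span"] += c["length"]
        g["covsum_total"] += covsum(c)
    return {
        taxon: {
            "count": g["count"],
            "span": g["span"],
            "mean_covsum": g["covsum_total"] / g["count"],
        }
        for taxon, g in acc.items()
    }


for taxon, s in sorted(group_stats(contigs).items()):
    print(taxon, s)
```

With two libraries, a contig at 40x in one and 35x in the other gets a covsum of 75x, so contigs that are well covered in either library stand out on the covsum plot.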

blobtools blobplot -i sample-DB.blob.blobDB.json -c sample-coveragefile.cov -p 10 --format png -r phylum

With this command you are technically not creating a blobplot but a covplot. Each contig is positioned on the y-axis by its coverage in sample-coveragefile.cov and on the x-axis by its coverage in the other coverage libraries (the ones you used when creating the blobDB). This is useful for exploring patterns of differential coverage of contigs across libraries.
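The covplot layout can be sketched as pairing two coverage values per contig: x from a library already in the blobDB and y from the file passed with -c. The sketch below is not blobtools' plotting code. The contig names and values are made up, and the log2 ratio with a pseudocount of 1 is an assumption of ours, added only as one common way to put a number on differential coverage; blobtools itself just plots the points.

```python
import math

# Hypothetical coverages for the same contigs from two sources:
# x_cov from a library stored in the blobDB, y_cov from the COV file given with -c.
x_cov = {"contig_1": 40.0, "contig_2": 200.0, "contig_3": 0.0}
y_cov = {"contig_1": 38.0, "contig_2": 12.5, "contig_3": 30.0}

# Assumption: a pseudocount keeps the ratio defined when a contig has zero coverage.
PSEUDOCOUNT = 1.0


def covplot_points(x_cov, y_cov):
    """Return (contig, x, y) for contigs present in both sources, sorted by name."""
    return [(name, x_cov[name], y_cov[name]) for name in sorted(x_cov) if name in y_cov]


def log2_ratio(y, x, pseudocount=PSEUDOCOUNT):
    """Differential coverage: > 0 means higher coverage in the -c file, < 0 lower."""
    return math.log2((y + pseudocount) / (x + pseudocount))


for name, x, y in covplot_points(x_cov, y_cov):
    print(f"{name}\tx={x}\ty={y}\tlog2(y/x)={log2_ratio(y, x):.2f}")
```

Contigs near the diagonal (ratio near 0) are covered similarly in both libraries, while contigs far off it, like contig_2 or contig_3 here, are the ones worth a closer look.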

Let me know if this helps.

cheers,

dom