DRL / blobtools

Modular command-line solution for visualisation, quality control and taxonomic partitioning of genome datasets
GNU General Public License v3.0

blobtools create with coverage file. #38

Closed ptranvan closed 7 years ago

ptranvan commented 7 years ago

Hi,

Based on that documentation:

https://blobtools.readme.io/docs/create

the option -c takes as input a TAB-separated file (seqID\tcoverage).

I have run bam2cov and ended up with a file with 3 columns:

contig_id read_cov base_cov

Should I parse this output, and if so, which two columns should the final file contain:

contig_id read_cov

or

contig_id base_cov

DRL commented 7 years ago

Hi ptranvan,

just give it the (3-column) output from bam2cov... blobtools knows about this format.

cheers,

dom
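For readers who want to inspect the 3-column file before passing it on, here is a minimal Python sketch that reads it into a dictionary. It is not part of blobtools: the function name `parse_cov` is hypothetical, and the layout it assumes (a `#`-prefixed header, then whitespace-separated `contig_id`, `read_cov`, `base_cov`) is taken only from the excerpt quoted in this thread.

```python
from typing import Dict, Iterable, Tuple


def parse_cov(lines: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Map contig_id -> (read_cov, base_cov) from 3-column coverage lines.

    Hypothetical helper, not a blobtools function. Assumes the layout
    shown in this thread: '#' header lines, then three columns.
    """
    cov = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and '#'-prefixed header/comment lines.
        if not line or line.startswith("#"):
            continue
        contig_id, read_cov, base_cov = line.split()
        cov[contig_id] = (int(read_cov), float(base_cov))
    return cov


# Sample rows copied from the thread (tab-separated here by assumption).
sample = [
    "# contig_id\tread_cov\tbase_cov",
    "scaffold17787|size4717\t3572\t92.8210727157",
    "scaffold203398|size450\t0\t0.0",
]
coverage = parse_cov(sample)
print(coverage["scaffold17787|size4717"])
```

For a real file, the same function could be called on an open file handle, for example `parse_cov(open("out.cov"))`, where `out.cov` is a placeholder filename.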

ptranvan commented 7 years ago

Thanks for your support.

1) "blobtools knows this format" means that I can feed the bam2cov output directly to blobtools?

2) One more thing: your bam2cov command gives me 0 coverage for some contigs. For instance:


# contig_id     read_cov        base_cov
scaffold17787|size4717  3572    92.8210727157
scaffold203398|size450  0       0.0

Do you know where this comes from?

I used a simple command for bwa:

bwa mem contig.fasta Pe1 Pe2 | samtools view -bS - > out.bam

DRL commented 7 years ago

1) "blobtools knows this format" means that I can feed the bam2cov output directly to blobtools?

yes

2) One more thing: your bam2cov command gives me 0 coverage for some contigs.

That comes from how bwa mem did the mapping. Your BAM file simply contains no reads mapping to those contigs, either because the reads mapped somewhere else or because they did not pass a parameter threshold. For the purpose of using blobtools to filter reads, this does not have much of an effect: they are low-coverage contigs and are not part of the target organism.

All the best,

dom
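To see which contigs received no reads, a short Python sketch like the following could list them from the bam2cov output. The helper name `zero_coverage_contigs` is hypothetical, and the column order (`contig_id`, `read_cov`, `base_cov`) is assumed from the excerpt posted above. It only reports the zero-coverage contigs and does not diagnose why the reads mapped elsewhere.

```python
from typing import Iterable, List


def zero_coverage_contigs(lines: Iterable[str]) -> List[str]:
    """Return IDs of contigs whose read_cov column (2nd field) is 0.

    Hypothetical helper; assumes whitespace-separated columns and
    '#'-prefixed header lines, as in the excerpt from this thread.
    """
    ids = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and header/comment lines.
        if not fields or fields[0].startswith("#"):
            continue
        if int(fields[1]) == 0:
            ids.append(fields[0])
    return ids


# Sample rows copied from the thread.
sample = [
    "# contig_id     read_cov        base_cov",
    "scaffold17787|size4717  3572    92.8210727157",
    "scaffold203398|size450  0       0.0",
]
zero = zero_coverage_contigs(sample)
print(f"{len(zero)} contig(s) with no mapped reads: {zero}")
```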