DRL / blobtools

Modular command-line solution for visualisation, quality control and taxonomic partitioning of genome datasets
GNU General Public License v3.0

Tip: some jq code to get list of "good" contigs #121

Open kdm9 opened 2 years ago

kdm9 commented 2 years ago

Hello,

This is mostly a PSA, as the following took me way too long to work out myself. Perhaps the authors could add this to the docs somewhere appropriate.

To filter a set of contigs based on the GC content and coverage (a la the blobplot), one can use the following jq command:

jq -r '.dict_of_blobs[] | select((.covs.bam0 > 10) and (.gc > 0.4)) | .name' \
    < path/to/something.blobDB.json \
    > goodcontigs.txt

Here, I use a coverage threshold of 10 in the first BAM (bam0) and a minimum GC of 0.4; adjust these thresholds to suit your own blobplot. Additional BAMs can be handled by adding a further condition, such as (.covs.bam1 > 23) and, inside the select() expression. The resulting goodcontigs.txt is a plain-text list of contig names, one per line, compatible with blobtools seqfilter.
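
For anyone without jq to hand, the same filter can be sketched in Python. This is only an illustration, not part of blobtools: the function name good_contigs and the demo data are made up, and the JSON layout (a top-level dict_of_blobs mapping whose entries carry name, gc and covs) is assumed from the jq command above.

```python
import json  # only needed when reading a real blobDB file (see note below)


def good_contigs(blobdb, min_cov, min_gc):
    """Return names of blobs that exceed every coverage threshold and the GC cutoff.

    blobdb  : parsed blobDB JSON (layout assumed from the jq command above)
    min_cov : dict such as {"bam0": 10, "bam1": 23}; each listed library
              must have coverage strictly greater than its threshold
    min_gc  : minimum GC fraction (strictly greater than)
    """
    names = []
    for blob in blobdb["dict_of_blobs"].values():
        covs = blob.get("covs", {})
        # A library missing from covs is treated as coverage 0, so it fails.
        cov_ok = all(covs.get(lib, 0) > thr for lib, thr in min_cov.items())
        if cov_ok and blob.get("gc", 0) > min_gc:
            names.append(blob["name"])
    return names


# Made-up demo data, for illustration only.
demo_db = {
    "dict_of_blobs": {
        "contig_1": {"name": "contig_1", "gc": 0.45, "covs": {"bam0": 35.2, "bam1": 40.1}},
        "contig_2": {"name": "contig_2", "gc": 0.31, "covs": {"bam0": 50.0, "bam1": 48.7}},
        "contig_3": {"name": "contig_3", "gc": 0.52, "covs": {"bam0": 3.4, "bam1": 60.0}},
        "contig_4": {"name": "contig_4", "gc": 0.48, "covs": {"bam0": 22.0, "bam1": 12.5}},
    }
}

for name in good_contigs(demo_db, {"bam0": 10, "bam1": 23}, 0.4):
    print(name)  # prints "contig_1"
```

For a real dataset, one would load the file with json.load() on an open handle to the blobDB JSON and write the returned names one per line, which should give the same kind of list as goodcontigs.txt above.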

Thanks for a great tool, K