BCCDC-PHL / dragonflye-nf

Nextflow wrapper for the dragonflye assembler, with additional QC
MIT License

Simplify contig ids #3

Closed dfornika closed 1 year ago

dfornika commented 1 year ago

The contig IDs produced by dragonflye are long and detailed. It would be preferable to separate the contig ID from the detailed info, so that tools like abricate can report a short, simple contig ID in their output.

dfornika commented 1 year ago

This may not require fixing in this pipeline. A typical fasta header from a dragonflye assembly looks like this:

>SAMPLE_contig00001 len=5468462 cov=46.0 origname=contig_1_polypolish polish=racon:1 round(s);polypolish:short_reads,1 round(s); sw=dragonflye-flye/1.1.0 date=20230621 circular=Y

A typical fasta header from a unicycler assembly looks like this:

>SAMPLE_1 length=728413 depth=1.00x

This issue was prompted by the fact that the full dragonflye header appears in the resistance_gene_contig_id in the output of our BCCDC-PHL/plasmid-screen pipeline. However, that appears to be caused by downstream parsing logic in that pipeline, or possibly in mob-suite, rather than by the header itself.
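For reference, both headers above already separate the contig ID from the detail with whitespace, so a parser that follows the usual FASTA convention (ID = text up to the first whitespace) recovers a short ID from either one. The following is a minimal sketch of that convention only. The function name split_fasta_header is hypothetical, and this is not the parsing code used by plasmid-screen, mob-suite, or abricate.

```python
def split_fasta_header(header: str) -> tuple[str, str]:
    """Split a FASTA header line into (sequence ID, description).

    Hypothetical helper for illustration: the ID is taken to be everything
    up to the first whitespace, and the description is whatever follows.
    """
    text = header.strip()
    if text.startswith(">"):
        text = text[1:]
    parts = text.split(maxsplit=1)
    seq_id = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return seq_id, description


# The two example headers quoted in this issue.
dragonflye_header = (
    ">SAMPLE_contig00001 len=5468462 cov=46.0 origname=contig_1_polypolish "
    "polish=racon:1 round(s);polypolish:short_reads,1 round(s); "
    "sw=dragonflye-flye/1.1.0 date=20230621 circular=Y"
)
unicycler_header = ">SAMPLE_1 length=728413 depth=1.00x"

for header in (dragonflye_header, unicycler_header):
    seq_id, description = split_fasta_header(header)
    print(seq_id)
# prints:
# SAMPLE_contig00001
# SAMPLE_1
```

If a downstream tool reports the whole header line instead of the short ID, it is probably keeping the description rather than splitting on the first whitespace, which would match the suspicion above that the problem sits in downstream parsing.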