bactopia / bactopia

A flexible pipeline for complete analysis of bacterial genomes
https://bactopia.github.io
MIT License

[question] Continuing with GenBank .gbk and .gff files #395

Closed Svnipni closed 1 year ago

Svnipni commented 1 year ago

Bactopia seemed to assemble and complete fine on my paired-end samples, but I'm having a little difficulty navigating the output files. Until recently I've mainly worked with metabarcoding and metagenomes, and I'm relatively new to whole bacterial genome sequencing. The .gbk and .gff files are quite heavy for my samples and I can't seem to navigate them properly in Galaxy or Proksee. Loading the .gbk or .gff files in Proksee gives me an error message that the files are too large, reading over 14 Mbp, whilst the assembled genome itself is barely 6.5 Mbp. It seems I will need to curate the files a bit before I can use them properly, or did something go wrong when running the pipeline?

rpetit3 commented 1 year ago

Hi @Svnipni

The increase in size seems about right. Does Proksee need the sequence, or just the annotations?

If it's a strict file size limit, I wonder if maybe there's a way to abbreviate some annotations (e.g. hypothetical protein -> hypothetical or hypo).
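As a rough illustration of that idea, here is a minimal Python sketch that shrinks a GFF3 file in two ways: it can drop the sequence embedded after the `##FASTA` directive (Prokka-style GFF3 files append the contigs there), and it shortens product names. The function name `shrink_gff` and the `hypo` abbreviation are only illustrative, and whether Proksee accepts the trimmed file is an assumption that would need testing.

```python
def shrink_gff(lines, drop_fasta=True, abbreviations=None):
    """Yield GFF3 lines, optionally without the embedded FASTA and with
    shortened product names.

    `lines` is any iterable of text lines (for example an open file).
    The default abbreviation is a hypothetical example, not a standard.
    """
    if abbreviations is None:
        abbreviations = {"product=hypothetical protein": "product=hypo"}
    in_fasta = False
    for line in lines:
        if line.startswith("##FASTA"):
            if drop_fasta:
                # Everything after this directive is sequence, so stop here.
                break
            in_fasta = True
        # Only rewrite feature lines, never comments or sequence.
        if not in_fasta and not line.startswith("#"):
            for old, new in abbreviations.items():
                line = line.replace(old, new)
        yield line
```

To use it on a real file, something like `out.writelines(shrink_gff(open("sample.gff")))` would write the trimmed copy, where `sample.gff` stands in for the actual annotation file.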

Happy to help where I can, Robert

Svnipni commented 1 year ago

Thank you! I'm trying to load the assembled genome from one of my samples into Proksee as a single sequence and then take it from there on the annotations. Also, it seems I'm missing the contigs.fna file in /assembly. I'd love to be able to load the genome and have the contigs as annotations, or loaded separately as an alignment to the genome.

On an additional note, abricate did great and found some interesting results on my samples, but I struggle to find a way to have its results loaded properly as annotations on the whole genome as it's using the contigs to find the ARGs. I suppose there's not a workaround to have abricate list the sequences rather than loci on the contigs?
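One possible workaround, sketched below in Python, is to convert the abricate table into GFF3 features so the hits can be loaded as an annotation track next to the assembly. This assumes the tab-separated output has the usual abricate header columns (`SEQUENCE`, `START`, `END`, `STRAND`, `GENE`, `PRODUCT`) and that the contig names in the table match the FASTA headers of the assembly; the function name `abricate_to_gff3` is made up for the example. The coordinates stay relative to each contig, so this only works when the contigs themselves are loaded as the sequences.

```python
import csv
import io
from urllib.parse import quote


def abricate_to_gff3(tsv_text, source="abricate"):
    """Convert abricate tab-separated output (as text) to a GFF3 string.

    Column names are assumed to follow abricate's default header.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lines = ["##gff-version 3"]
    for row in reader:
        # Percent-encode characters that are reserved in GFF3 attributes.
        name = quote(row["GENE"], safe="")
        product = quote(row.get("PRODUCT") or "", safe=" ")
        attributes = "Name={};product={}".format(name, product)
        lines.append("\t".join([
            row["SEQUENCE"],  # contig the hit was found on
            source,
            "gene",
            row["START"],
            row["END"],
            ".",              # score not carried over
            row["STRAND"],
            ".",              # phase not applicable
            attributes,
        ]))
    return "\n".join(lines) + "\n"
```

Reading the table with `open("abricate.tsv").read()` and writing the returned string to a `.gff` file would be the typical usage; `abricate.tsv` is a placeholder name.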

Svnipni commented 1 year ago

I may be dealing with a contaminated sample. The assembled genome was 14 Mbp, far larger than its references. I tried running mobsuite via Bactopia Tools to see if it detects plasmid features, but it returned a 127 error for me.
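A quick way to look into the size discrepancy is to tabulate the length and GC content of each contig in the assembly FASTA; a total far above the expected genome size, or groups of contigs with clearly different GC content, would be consistent with a mixed sample. The following Python sketch does only that and is not a contamination test in itself; the function name `contig_stats` is hypothetical.

```python
def contig_stats(fasta_lines):
    """Return {contig_name: (length, gc_fraction)} from FASTA text lines."""
    stats = {}
    name, length, gc = None, 0, 0

    def store():
        # Record the contig collected so far, if any.
        if name is not None:
            stats[name] = (length, gc / length if length else 0.0)

    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            store()
            # Use the first word of the header as the contig name.
            name = (line[1:].split() or [""])[0]
            length, gc = 0, 0
        else:
            seq = line.upper()
            length += len(seq)
            gc += seq.count("G") + seq.count("C")
    store()
    return stats
```

Summing the lengths, for example with `sum(n for n, _ in contig_stats(open("assembly.fna")).values())`, gives the total assembly size to compare against the reference genomes; `assembly.fna` is a placeholder for the actual file.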