WrightonLabCSU / DRAM

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes
GNU General Public License v3.0

ValueError: duplicate headers in DRAM-v.py but not when running DRAM.py #294

STraving closed this issue 1 year ago

STraving commented 1 year ago

Dear DRAM team.

First, thanks for a great tool! I work with viral contigs and have used DRAM-v.py before; it ran without issue on output taken directly from VirSorter2. What I am currently trying to do is bin my viral contigs before annotating them. DRAM-v wouldn't take multiple FASTA files as input (or I have simply overlooked how), so I concatenated them all, ensuring every contig has a unique header. Here's the issue: when DRAM-v.py checks the file, it reports duplicate headers.

example: ValueError: The FASTA file all_fasta_combined.fasta contains duplicate headers, you must correct this before continuing. The duplicate headers are: ['vRhyme_1Complete_Site_1_Depth_0_000000120442||full', 'vRhyme_1Complete_Site_1_Depth_0_000000121187||full', 'vRhyme_1Complete_Site_1_Depth_0_000000268715||full', 'vRhyme_1Complete_Site_1_Depth_0_000000041597||full', 'vRhyme_1Complete_Site_1_Depth_0_000000095461||full', 'vRhyme_1Complete_Site_1_Depth_0_000000003599||full', 'vRhyme_1Complete_Site_1_Depth_0_000000008378||full', 'vRhyme_1Complete_Site_1_Depth_0_000000169011||full', 'vRhyme_1Complete_Site_1_Depth_0_000000024204||full', 'vRhyme_1Complete_Site_1_Depth_0_000000030648||full', 'vRhyme_1__Complete_Site_1_Depth_0_000000060357||full']

So my question is whether this is simply because my headers are long (I know they could be cleaner, but I wanted to tidy up at the very end) and DRAM-v stops comparing after some number of characters. My headers aren't unique if the 12-digit number at the end is excluded. Or is there a specific symbol or character at which DRAM-v stops comparing? I have no issues running DRAM.py on my bins directly (done separately for each bin, but I have too many for that), or on the concatenated file that threw the error above.
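For anyone debugging the same error, here is a minimal Python sketch for testing these hypotheses on the concatenated file before handing it to DRAM-v. It lists which headers collide when compared in full, up to the first whitespace, up to `||`, or after truncation to a fixed length. The function names, the `CHECKS` table and the 30-character cutoff are all made up for illustration; none of these normalisations is confirmed to be what DRAM-v actually does.

```python
from collections import Counter


def read_headers(fasta_path):
    """Return every header line (without the leading '>') from a FASTA file."""
    headers = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                headers.append(line[1:].rstrip("\r\n"))
    return headers


def duplicates(headers, key=lambda h: h):
    """Return the sorted keys that occur more than once after applying `key`."""
    counts = Counter(key(h) for h in headers)
    return sorted(k for k, n in counts.items() if n > 1)


# Hypothetical normalisations to test; these are guesses, not DRAM-v's logic.
CHECKS = {
    "full header": lambda h: h,
    "up to first whitespace": lambda h: h.split(None, 1)[0] if h.strip() else h,
    "up to '||'": lambda h: h.split("||")[0],
    "first 30 characters": lambda h: h[:30],
}


def report(headers):
    """Map each check name to the list of colliding keys under that check."""
    return {name: duplicates(headers, key) for name, key in CHECKS.items()}
```

Calling `report(read_headers("all_fasta_combined.fasta"))` returns one list per check. An empty list for "full header" alongside a non-empty list for another check would suggest that the corresponding truncation is worth investigating.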

Thank you for your time and I'd appreciate any feedback.

Cheers,

Sachia

STraving commented 1 year ago

I ended up concatenating them into multiple smaller batches and then just ran DRAM-v on them in a loop. I think it should merge fine after distilling. We'll see :) Thanks again for a great tool. Sachia
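A minimal Python sketch of the batching step, in case it helps someone else. It only groups per-bin FASTA files into smaller concatenated files; running DRAM-v on each batch and merging the results are not shown. The function name `batch_fastas`, the `batch_NNN.fasta` naming and the batch size are assumptions for illustration. If DRAM-v is given other per-contig input alongside the FASTA, that input would need to be split to match, which this sketch does not attempt.

```python
from pathlib import Path


def batch_fastas(fasta_paths, out_dir, batch_size):
    """Concatenate FASTA files into batches of up to `batch_size` files each.

    Returns the list of batch files written, in order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = sorted(Path(p) for p in fasta_paths)
    written = []
    for start in range(0, len(paths), batch_size):
        batch_path = out_dir / f"batch_{start // batch_size + 1:03d}.fasta"
        with open(batch_path, "w") as out:
            for path in paths[start:start + batch_size]:
                text = path.read_text()
                if not text:
                    continue  # skip empty input files
                # Make sure the next file's first header starts on a new line.
                out.write(text if text.endswith("\n") else text + "\n")
        written.append(batch_path)
    return written
```

For example, `batch_fastas(Path("bins").glob("*.fasta"), "batches", 50)` would write `batches/batch_001.fasta`, `batches/batch_002.fasta`, and so on, each holding up to 50 bins.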