gem-pasteur / PanACoTA

PANgenome with Annotations, COre identification, Tree and corresponding Alignments
GNU Affero General Public License v3.0

annotate error: Uh oh! Sequence file 'XXXXX_genomic.fna_prepare-split5N.fna' contains duplicate sequence ID #3

Closed: IsabelFE closed this issue 4 years ago

IsabelFE commented 4 years ago

I am getting this error when using annotate: Uh oh! Sequence file 'HELP/prepare/tmp_files/GCF_002351405.1_ASM235140v1_genomic.fna_prepare-split5N.fna' contains duplicate sequence ID: NZ_NWBP01000037.1

I ran prokka by itself on both the Database_init output from prepare and the tmp_files output from prepare. I think there is something in the prepare-split5N.fna files that causes an error in prokka, while prokka runs fine on the original fasta files in the Database_init folder.
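To confirm which IDs collide in a split file before handing it to prokka, a minimal Python sketch along these lines may help. It assumes the sequence ID is the text between `>` and the first whitespace of each header; the function names `fasta_ids` and `duplicate_ids` are hypothetical, and the example headers are shortened versions of the ones in this issue.

```python
from collections import Counter


def fasta_ids(lines):
    """Yield the ID of each FASTA header line.

    Assumption: the ID is the text between '>' and the first whitespace.
    """
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            yield parts[0] if parts else ""


def duplicate_ids(lines):
    """Return the sorted list of IDs that occur more than once."""
    counts = Counter(fasta_ids(lines))
    return sorted(seq_id for seq_id, n in counts.items() if n > 1)


# Shortened, illustrative headers (not a real file).
example = [
    ">NZ_NWBP01000037.1 whole genome shotgun sequence_1",
    "ACGT",
    ">NZ_NWBP01000037.1 whole genome shotgun sequence_2",
    "TTGA",
    ">NZ_NWBP01000038.1 whole genome shotgun sequence",
    "GGCC",
]
print(duplicate_ids(example))
```

For a real file, the same function can be given an open file handle, for example `with open(path) as fh: print(duplicate_ids(fh))`.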

Thanks again!

IsabelFE commented 4 years ago

@asetGem, I invited you to my repo so you can see all the code I ran in the HELP folder. I hope that helps you understand the issue.

asetGem commented 4 years ago

Hi!

Thanks for posting this issue! Another user already reported it to me, and I'm currently working on a fix.

For information, here is what happens:

Contigs are split at every stretch of 5 N. When PanACoTA splits a contig, it creates a new header with a unique ID for each piece. In your case, the contig

>NZ_NWBP01000037.1 Corynebacterium accolens strain AH4003 NODE_15_length_33874_cov_728.915, whole genome shotgun sequence

was split into 2 contigs:

>NZ_NWBP01000037.1 Corynebacterium accolens strain AH4003 NODE_15_length_33874_cov_728.915, whole genome shotgun sequence_1

>NZ_NWBP01000037.1 Corynebacterium accolens strain AH4003 NODE_15_length_33874_cov_728.915, whole genome shotgun sequence_2

However, the current version of prokka only uses the part of the header before the first space as the sequence ID. Here that is >NZ_NWBP01000037.1, which is identical for both contigs. Hence your error message "[...]contains duplicate sequence[...]".
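To illustrate, here is a minimal Python sketch of the problem. It is not PanACoTA's actual code: the functions `seq_id` and `split_on_n` are hypothetical, the split is assumed to happen at runs of at least 5 N, and moving the suffix onto the ID is only one possible way to avoid the collision, not necessarily how it will be fixed.

```python
import re


def seq_id(header):
    """Return the part of a FASTA header before the first whitespace,
    which is what prokka is described above as using for the ID."""
    return header.lstrip(">").split()[0]


def split_on_n(header, seq, min_n=5, suffix_on_id=False):
    """Split seq at runs of at least min_n 'N' and build one header per piece.

    suffix_on_id=False appends _1, _2, ... at the END of the header
    (the behaviour described in this issue).
    suffix_on_id=True appends the suffix to the ID itself
    (a hypothetical fix, for illustration only).
    """
    pieces = [p for p in re.split(r"[Nn]{%d,}" % min_n, seq) if p]
    if len(pieces) <= 1:
        # Nothing to split: keep the original header unchanged.
        return [(header, p) for p in pieces]
    name, _, description = header.partition(" ")
    result = []
    for i, piece in enumerate(pieces, start=1):
        if suffix_on_id:
            new_header = f"{name}_{i} {description}".rstrip()
        else:
            new_header = f"{header}_{i}"
        result.append((new_header, piece))
    return result


header = (">NZ_NWBP01000037.1 Corynebacterium accolens strain AH4003 "
          "NODE_15_length_33874_cov_728.915, whole genome shotgun sequence")
seq = "ACGT" + "N" * 5 + "TTGA"  # toy sequence, not real data

naive = split_on_n(header, seq)
fixed = split_on_n(header, seq, suffix_on_id=True)

print([seq_id(h) for h, _ in naive])  # both pieces share the same ID
print([seq_id(h) for h, _ in fixed])  # IDs now differ
```

With the suffix at the end of the header, the full headers differ but the IDs before the first space do not, which is exactly what triggers the duplicate-ID error.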

I'll fix that at the beginning of next week.

IsabelFE commented 4 years ago

I see, that makes sense. Looking forward to an updated version.

Thanks

asetGem commented 4 years ago

Hi, this is now solved in version 1.0.1. You can update your version from the GitHub repository (git pull, then ./make upgrade). Let me know if you still have this problem!

IsabelFE commented 4 years ago

Thanks! I've rerun the Corynebacterium accolens example and it annotated without issues.

I will start another issue with a more practical, rather than technical question. Thanks again!