SionBayliss / PIRATE

A toolbox for pangenome analysis and threshold evaluation.
GNU General Public License v3.0

Annotations did not match contig names #42

Closed by Noahw15 4 years ago

Noahw15 commented 4 years ago

Hello,

I apologize if this is the wrong place to raise this issue, but I'm hoping you'll be able to provide some insight. I have several FASTA files that I annotated using Prokka. I then took the resulting GFF3 files and tried to run PIRATE. I received an error saying that too few (0) of my files had passed QC. I opened the Parse_GFF_log.txt file and saw the message "Annotations did not match contig names". Currently the contig sequences at the end of each GFF file are named numerically, i.e. 0, 1, 2, etc. What annotations does it want the contig names to match? How do I go about resolving this issue? Thank you in advance.

SionBayliss commented 4 years ago

Hi Noah,

I am surprised. If you have annotated using Prokka on default(ish) settings there shouldn't be any issues. That warning appears if the first field in the GFF3 annotation lines does not match the names of the contigs at the bottom of the file. Which version are you using? Could you send me a few example files to test? I have just pushed a new version and it may be a bug I have introduced.
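To see which names disagree in a given file, the comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not PIRATE's actual check; the function name `contig_name_mismatches` and the example data are made up, and it assumes a Prokka-style GFF3 with the sequences after a `##FASTA` line.

```python
def contig_name_mismatches(gff_text):
    """Return column-1 names from annotation lines that have no FASTA header.

    Hypothetical helper for illustration; assumes the sequences are embedded
    after a '##FASTA' directive, as in Prokka output.
    """
    annotation_ids = set()
    fasta_ids = set()
    in_fasta = False
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            in_fasta = True
            continue
        if in_fasta:
            if line.startswith(">"):
                # The contig name is the header text up to the first whitespace.
                fields = line[1:].split()
                if fields:
                    fasta_ids.add(fields[0])
        elif line and not line.startswith("#"):
            # Column 1 of an annotation line is the contig (sequence) name.
            annotation_ids.add(line.split("\t")[0])
    return sorted(annotation_ids - fasta_ids)


# Made-up example: one annotation line refers to 'SEQ', which has no header.
example = (
    "##gff-version 3\n"
    "SEQ\tProdigal\tCDS\t1\t90\t.\t+\t0\tID=gene_1\n"
    "1\tProdigal\tCDS\t1\t90\t.\t+\t0\tID=gene_2\n"
    "##FASTA\n"
    ">0\nACGT\n"
    ">1\nACGT\n"
)
print(contig_name_mismatches(example))  # ['SEQ']
```

An empty list means every annotation line points at a contig that exists in the FASTA section.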

All the best, Sion

Noahw15 commented 4 years ago

Hi Sion, I just sent you an email containing the files that caused the error. They were annotated using Prokka version 1.14.6, and I attempted to run PIRATE version 1.0.3. Best, Noah

SionBayliss commented 4 years ago

Hi Noah,

Having looked at the files, there is an obvious discrepancy between the annotation lines and the FASTA headers for contig '0'. In the Prokka GFFs, contig 0 has been renamed to SEQ in the annotation lines (column 1). This is most likely due to your contig naming scheme having 0 as a FASTA header, which may be interpreted as a false (empty) value by some programs in Prokka.

In short, this isn't a PIRATE issue but one caused by non-standard input. If you rename the contig headers starting from 1 and reannotate them, you shouldn't have an issue.
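The renaming step could look something like the Python sketch below. It is a rough illustration rather than a tested tool: `renumber_fasta_headers` is a hypothetical helper, it works on the FASTA text in memory, and it discards whatever was in the original headers. The renamed file would then be passed back through Prokka to produce fresh GFF3 files.

```python
def renumber_fasta_headers(fasta_text, start=1):
    """Replace every FASTA header with a running number, beginning at `start`.

    Hypothetical helper for illustration; the original header text is dropped.
    """
    out = []
    n = start
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(f">{n}")
            n += 1
        else:
            # Sequence lines are copied through unchanged.
            out.append(line)
    return "\n".join(out) + "\n"


# Made-up example: contigs originally named 0 and 1 become 1 and 2.
print(renumber_fasta_headers(">0\nACGT\n>1\nGG\n"), end="")
```

Starting at 1 avoids having a contig literally named 0, which is the header that was rewritten to SEQ in the files above.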

All the best, Sion