broadinstitute / SynerClust

source code for SynerClust

FormatAnnotation_external.py GFF file format specificity #13

Open dspeth opened 4 years ago

dspeth commented 4 years ago

The FormatAnnotation_external.py helper script fails with errors when the format or content of the GFF3 file deviates from the test dataset.

Specifically, the script assumes that every "CDS" line is preceded by a "gene" line. This is not always the case in prokaryote annotation: Prodigal 2.6.3 writes no "gene" lines by default, and Prokka 1.14 also writes none by default. When Prokka's --addgenes flag is used, the "gene" line is added below the "CDS" line rather than above it.

I fixed this locally for my use case, but I do not have a general fix for the parser that works across the various GFF3 variants.
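
For illustration, an order-independent way to read such files might look like the sketch below. It is not taken from FormatAnnotation_external.py and is not claimed to cover every GFF3 variant. The function names `parse_attributes` and `collect_genes` are hypothetical, and the sketch assumes that a "CDS" line links to its "gene" line through a `Parent` attribute when a "gene" line exists at all. It reads the whole file first, then falls back to the CDS coordinates whenever no matching "gene" line was seen.

```python
from collections import OrderedDict


def parse_attributes(field):
    """Parse a GFF3 column-9 string such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def collect_genes(lines):
    """Return an ordered {key: (seqid, start, end, strand)} mapping.

    One entry is produced per coding feature, whether its 'gene' line comes
    before the 'CDS' line, after it, or is missing entirely.
    """
    genes = {}              # gene ID -> coordinates from 'gene' lines
    cds = OrderedDict()     # Parent (or own ID) -> merged CDS extent
    for line in lines:
        if line.startswith("##FASTA"):
            break           # an embedded sequence section ends the features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue        # skip lines that are not 9-column feature rows
        seqid, ftype, strand = cols[0], cols[2], cols[6]
        start, end = int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        if ftype == "gene" and "ID" in attrs:
            genes[attrs["ID"]] = (seqid, start, end, strand)
        elif ftype == "CDS":
            # Prefer the parent link; fall back to the CDS's own ID, then
            # to a key synthesized from the coordinates.
            key = attrs.get("Parent") or attrs.get("ID")
            key = key.split(",")[0] if key else f"{seqid}:{start}-{end}"
            if key in cds:
                # Multi-segment CDS: widen the recorded extent.
                cds[key][1] = min(cds[key][1], start)
                cds[key][2] = max(cds[key][2], end)
            else:
                cds[key] = [seqid, start, end, strand]
    # Use the 'gene' coordinates when available, else the CDS extent.
    return OrderedDict((k, genes.get(k, tuple(v))) for k, v in cds.items())
```

With something like this, `collect_genes(open("annotation.gff"))` would yield one record per coding feature for all three layouts described above (the file name here is only a placeholder). Deferring the gene lookup until the whole file has been read is what removes the dependence on line order, at the cost of holding all features in memory.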