AbeelLab / ptolemy

GNU General Public License v3.0

Ptolemy extract error for contigs with only one CDS #12

Open nbat64 opened 4 days ago

nbat64 commented 4 days ago

Hello,

I am trying to use Ptolemy extract (alongside Panaroo, for a bacterial pangenome), but I get an error when I run the following command on my GFF3 file obtained with Bakta:

java -jar ptolemy.jar extract --genomes tet.txt -o ./ --use-cds --verbose

The problem seems to occur when a contig in the GFF has only one CDS:

##sequence-region contig_86 1 4405
contig_86   Bakta   region  1   4405    .   +   .   ID=contig_86;Name=contig_86
contig_86   Prodigal    CDS 3782    4360    .   -   0   ID=KJABOA_25760;Name=phosphopantetheine-binding protein;locus_tag=KJABOA_25760;product=phosphopantetheine-binding protein;Dbxref=SO:0001217,UniRef:UniRef50_A0A1Q4NTK3,UniRef:UniRef90_UPI0019D6052E

The error message is:

(Mon Nov 25 17:32:57 CET 2024): Found 1 genome entries
(Mon Nov 25 17:32:57 CET 2024): --Verifying paths
(Mon Nov 25 17:32:57 CET 2024): --test
(Mon Nov 25 17:32:57 CET 2024): ----Found 265 ORFs
(Mon Nov 25 17:32:57 CET 2024): ----Processing sequence contig_1
(Mon Nov 25 17:32:57 CET 2024): ----Curated 263 ORFs and 180 intergenic sequences
(Mon Nov 25 17:32:58 CET 2024): ----Processing sequence contig_86
Exception in thread "main" java.util.NoSuchElementException: head of empty list
        at scala.collection.immutable.Nil$.head(List.scala:428)
        at scala.collection.immutable.Nil$.head(List.scala:425)
        at utilities.GFFutils._parseORFs$1(GFFutils.scala:264)
        at utilities.GFFutils.parseORFs(GFFutils.scala:302)
        at utilities.GFFutils.parseORFs$(GFFutils.scala:131)
        at build_db.Extract$.parseORFs(Extract.scala:19)
        at build_db.Extract$.$anonfun$extract$4(Extract.scala:97)
        at build_db.Extract$.$anonfun$extract$4$adapted(Extract.scala:97)
        at build_db.Extract$.$anonfun$extract$10(Extract.scala:170)
        at build_db.Extract$.$anonfun$extract$10$adapted(Extract.scala:142)
        at scala.collection.immutable.List.foreach(List.scala:389)
        at build_db.Extract$.extract(Extract.scala:142)
        at build_db.Extract$.$anonfun$main$1(Extract.scala:69)
        at build_db.Extract$.$anonfun$main$1$adapted(Extract.scala:65)
        at scala.Option.map(Option.scala:146)
        at build_db.Extract$.main(Extract.scala:65)
        at cli.Ptolemy$.main(Ptolemy.scala:34)
        at cli.Ptolemy.main(Ptolemy.scala)

Any idea how to fix the problem without modifying the _parseORFs function in the GFFutils.scala file?

Thanks

Regards

Nicolas

thomasabeel commented 2 days ago

The simplest fix would probably be to remove all entries for contig_86 from the GFF and sequence files.
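If several contigs are affected, the removal suggested above could be scripted. The following is a minimal Python sketch, not part of Ptolemy. It assumes, as reported in this issue, that the crash is triggered by contigs carrying exactly one CDS, that the GFF3 is tab-separated, and that the contig name is the first whitespace-delimited token of each FASTA header. All function names and file paths are hypothetical.

```python
from collections import Counter


def single_cds_contigs(gff_lines):
    """Return the set of contig IDs that carry exactly one CDS feature."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more features follow
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[2] == "CDS":
            counts[fields[0]] += 1
    return {contig for contig, n in counts.items() if n == 1}


def drop_contigs_from_fasta(fasta_lines, contigs):
    """Yield FASTA lines, skipping records whose ID is in `contigs`."""
    skip = False
    for line in fasta_lines:
        if line.startswith(">"):
            name = line[1:].split()
            skip = bool(name) and name[0] in contigs
        if not skip:
            yield line


def drop_contigs_from_gff(gff_lines, contigs):
    """Yield GFF3 lines that do not belong to any contig in `contigs`.

    Handles feature lines, ##sequence-region pragmas, and an optional
    embedded ##FASTA section.
    """
    in_fasta = False
    skip_seq = False
    for line in gff_lines:
        if in_fasta:
            if line.startswith(">"):
                name = line[1:].split()
                skip_seq = bool(name) and name[0] in contigs
            if not skip_seq:
                yield line
        elif line.startswith("##FASTA"):
            in_fasta = True
            yield line
        elif line.startswith("##sequence-region"):
            parts = line.split()
            if not (len(parts) > 1 and parts[1] in contigs):
                yield line
        elif line.startswith("#") or line.split("\t", 1)[0] not in contigs:
            yield line


def filter_files(gff_in, fasta_in, gff_out, fasta_out):
    """Write filtered copies of a GFF3/FASTA pair (paths are hypothetical)."""
    with open(gff_in) as fh:
        gff_lines = fh.readlines()
    bad = single_cds_contigs(gff_lines)
    with open(gff_out, "w") as out:
        out.writelines(drop_contigs_from_gff(gff_lines, bad))
    with open(fasta_in) as fh, open(fasta_out, "w") as out:
        out.writelines(drop_contigs_from_fasta(fh, bad))
    return bad
```

Calling filter_files("sample.gff3", "sample.fna", "sample.filtered.gff3", "sample.filtered.fna") would return the set of dropped contigs. The genomes list passed to --genomes would then need to point at the filtered files. Note that the single CDS on each dropped contig is lost from the analysis.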