Kuanhao-Chao / LiftOn

🚀 LiftOn: Accurate annotation mapping for GFF/GTF across assemblies
http://ccb.jhu.edu/lifton
GNU General Public License v3.0

coordinates are off in the output #1

Closed rahulvrane closed 2 months ago

rahulvrane commented 2 months ago

Hi, thanks for this tool. Great idea. We have usually done this with SPALN, so it is intriguing to see how miniprot works.

I am running into one issue, which is probably a problem with how output coordinates are handled. Running the same liftover with Liftoff and LiftOn, I get errors with LiftOn: thousands of genes have features where end < start.

Liftoff output:

PRKT01001132.1 Liftoff CDS 100681 100708 . - . ID=cds-XP_061166086.1;Parent=rna-XM_061310102.1;Dbxref=GeneID:133175006,GenBank:XP_061166086.1;Name=XP_061166086.1;gbkey=CDS;gene=LOC133175006;product=F-BAR domain only protein 2-like isoform X2;protein_id=XP_061166086.1;extra_copy_number=0

LiftOn output:

PRKT01001132.1 LiftOn CDS 100681 100247 . - 0 ID=cds-XP_061166086.1;Parent=rna-XM_061310102.1;Dbxref=GeneID:133175006,GenBank:XP_061166086.1;Name=XP_061166086.1;gbkey=CDS;gene=LOC133175006;product=F-BAR domain only protein 2-like isoform X2;protein_id=XP_061166086.1

This causes errors in downstream tools, e.g. from gffread: Error: invalid feature coordinates (end<start!)
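To count how many records are affected, something like the following can scan a GFF file for rows where end < start. This is only a minimal sketch, assuming standard tab-separated GFF columns (start in column 4, end in column 5). The function name `find_inverted_features` is made up for illustration and is not part of LiftOn, Liftoff, or gffread. The sample rows reuse the coordinates from the two outputs above, with the attribute column shortened.

```python
import io


def find_inverted_features(lines):
    """Yield (line_number, seqid, feature_type, start, end) for rows where end < start.

    Hypothetical helper: assumes tab-separated GFF with 9 columns.
    """
    for lineno, line in enumerate(lines, start=1):
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed feature row
        try:
            start, end = int(fields[3]), int(fields[4])
        except ValueError:
            continue  # non-numeric coordinates; ignored in this sketch
        if end < start:
            yield lineno, fields[0], fields[2], start, end


# Two rows modelled on the outputs above (attributes shortened).
sample = io.StringIO(
    "PRKT01001132.1\tLiftoff\tCDS\t100681\t100708\t.\t-\t.\tID=cds-XP_061166086.1\n"
    "PRKT01001132.1\tLiftOn\tCDS\t100681\t100247\t.\t-\t0\tID=cds-XP_061166086.1\n"
)
bad = list(find_inverted_features(sample))
for lineno, seqid, ftype, start, end in bad:
    print(f"line {lineno}: {seqid} {ftype} start={start} end={end}")
```

On a real file, pass an open file handle instead of the `io.StringIO` sample.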

Kuanhao-Chao commented 2 months ago

Hi @rahulvrane ,

Thank you for bringing this issue to our attention. The problem arose in the open reading frame search algorithm, where we overlooked updating the start and end coordinates of the first and last CDS.

We've addressed this issue and have released an update, which you can access here: LiftOn v1.0.1.

If you run into more issues with the CDS boundaries, you can send me the file by email: kuanhao.chao@gmail.com.

Best,

Kuan-Hao