Kuanhao-Chao / LiftOn

🚀 LiftOn: Accurate annotation mapping for GFF/GTF across assemblies
http://ccb.jhu.edu/lifton
GNU General Public License v3.0

Weird formatting of column 9 in gff3 lift-over #8

Open: TDDB-limagrain opened this issue 1 month ago

TDDB-limagrain commented 1 month ago

Hi @Kuanhao-Chao, I was able to run LiftOn successfully using a plant reference genome and a new assembly of the same species that I want to annotate. The command was:

lifton -g ref.gff3 -o liftover.gff3 -P ref.pep.fasta -copies -sc 0.95 newgenome.fasta refgenome.fasta

The resulting lift-over file looks quite good for many gene models, but for some of them the exon IDs are duplicated. Below is an example of a gene with 8 exons in the reference genome: the first seven exons all share the ID ending in exon:001, and only the last one ends in exon:008.


| seqid | source | type | start | end | score | strand | phase | attributes |
|---|---|---|---|---|---|---|---|---|
| newgenomeLG00 | LiftOn | gene | 279378 | 285818 | . | + | . | ID=newgenomeLG00g00180;Name=newgenomeLG00g00180;source=Liftoff |
| newgenomeLG00 | LiftOn | mRNA | 279378 | 285818 | . | + | . | ID=newgenomeLG00g00180.1;Parent=newgenomeLG00g00180;Name=newgenomeLG00g00180.1;mutation=frameshift;protein_identity=0.999;dna_identity=1.000;status=LiftOn_chaining_algorithm |
| newgenomeLG00 | LiftOn | exon | 279378 | 280607 | . | + | . | ID=_newgenomeLG00_1g00140.1:exon:001;Parent=newgenomeLG00g00180.1 |
| newgenomeLG00 | LiftOn | exon | 281921 | 282006 | . | + | . | ID=_newgenomeLG00_1g00140.1:exon:001;Parent=newgenomeLG00g00180.1 |
| newgenomeLG00 | LiftOn | exon | 282142 | 282299 | . | + | . | ID=_newgenomeLG00_1g00140.1:exon:001;Parent=newgenomeLG00g00180.1 |
| newgenomeLG00 | LiftOn | exon | 282401 | 283715 | . | + | . | ID=_newgenomeLG00_1g00140.1:exon:001;Parent=newgenomeLG00g00180.1 |
| newgenomeLG00 | LiftOn | exon | 283912 | 284186 | . | + | . | ID=_newgenomeLG00_1g00140.1:exon:001;Parent=newgenomeLG00g00180.1 |
| newgenomeLG00 | LiftOn | exon | 284379 | 284585 | . | + | . | ID=_newgenomeLG00_1g00140.1:exon:001;Parent=newgenomeLG00g00180.1 |
| newgenomeLG00 | LiftOn | exon | 284668 | 284824 | . | + | . | ID=_newgenomeLG00_1g00140.1:exon:001;Parent=newgenomeLG00g00180.1 |
| newgenomeLG00 | LiftOn | exon | 285132 | 285818 | . | + | . | ID=_newgenomeLG00_1g00140.1:exon:008;Parent=newgenomeLG00g00180.1 |
| newgenomeLG00 | LiftOn | CDS | 280585 | 280607 | . | + | 0 | Parent=newgenomeLG00g00180.1 |
| newgenomeLG00 | LiftOn | CDS | 281921 | 282006 | . | + | 1 | Parent=newgenomeLG00g00180.1 |
| newgenomeLG00 | LiftOn | CDS | 282142 | 282299 | . | + | 2 | Parent=newgenomeLG00g00180.1 |
| newgenomeLG00 | LiftOn | CDS | 282401 | 283715 | . | + | 0 | Parent=newgenomeLG00g00180.1 |
| newgenomeLG00 | LiftOn | CDS | 283912 | 284186 | . | + | 2 | Parent=newgenomeLG00g00180.1 |
| newgenomeLG00 | LiftOn | CDS | 284379 | 284585 | . | + | 0 | Parent=newgenomeLG00g00180.1 |
| newgenomeLG00 | LiftOn | CDS | 284668 | 284824 | . | + | 0 | Parent=newgenomeLG00g00180.1 |
| newgenomeLG00 | LiftOn | CDS | 285132 | 285430 | . | + | 2 | Parent=newgenomeLG00g00180.1 |
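To see how many gene models are affected, the repeated exon IDs can be listed with a short script. This is a minimal sketch, not part of LiftOn; the function name find_duplicate_ids is hypothetical, and it assumes a tab-separated GFF3 file with key=value attributes in column 9. It only checks exon lines, because the GFF3 specification lets a discontinuous feature such as a CDS reuse one ID across several lines, whereas each exon should have its own ID.

```python
from collections import defaultdict


def find_duplicate_ids(gff_lines, feature_types=("exon",)):
    """Map each repeated ID to the 1-based line numbers that use it.

    Hypothetical helper (not a LiftOn function). Only features whose type
    (column 3) is in `feature_types` are checked, since GFF3 allows a
    multi-line feature such as a CDS to share one ID across its lines.
    """
    seen = defaultdict(list)  # ID -> list of line numbers
    for lineno, line in enumerate(gff_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in feature_types:
            continue
        # Column 9 is a ';'-separated list of key=value pairs.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if "ID" in attrs:
            seen[attrs["ID"]].append(lineno)
    # Keep only IDs that occur on more than one line.
    return {fid: lines for fid, lines in seen.items() if len(lines) > 1}
```

For example, calling it on the lines of liftover.gff3 returns a dict from each reused exon ID to the line numbers carrying it; an empty dict means no exon ID is repeated.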

In addition, would it be possible to automatically add a unique ID to each CDS? This is required by some downstream applications.
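Until such an option exists, one possible workaround is to post-process the LiftOn output. The sketch below is an illustration under stated assumptions, not LiftOn behavior: the function add_feature_ids and the naming scheme <Parent>:exon:NNN / <Parent>:cds:NNN are hypothetical, it assumes each exon and CDS line has a single Parent (as in the output above), and it numbers features 5' to 3' using the strand in column 7. The GFF3 specification also allows all CDS lines of one transcript to share a single ID; per-segment IDs are used here because that is what is asked for.

```python
from collections import defaultdict


def add_feature_ids(gff_lines, types=("exon", "CDS")):
    """Return GFF3 lines in which every exon/CDS line has a unique, ordered ID.

    Hypothetical naming scheme (an assumption, not LiftOn's):
    <Parent>:<type>:<NNN>, numbered 5'->3' within each parent transcript,
    i.e. by ascending start on '+' and descending start on '-'.
    Assumes a single Parent value per exon/CDS line.
    """
    rows = [line.rstrip("\n") for line in gff_lines]
    groups = defaultdict(list)  # (parent, type) -> [(row index, start, strand)]
    for i, line in enumerate(rows):
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] not in types:
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        if "Parent" in attrs:
            groups[(attrs["Parent"], cols[2])].append((i, int(cols[3]), cols[6]))

    for (parent, ftype), members in groups.items():
        reverse = members[0][2] == "-"  # number from the transcript's 5' end
        members.sort(key=lambda m: m[1], reverse=reverse)
        for n, (i, _start, _strand) in enumerate(members, start=1):
            cols = rows[i].split("\t")
            # Drop any existing ID attribute and put the new one first.
            kept = [f for f in cols[8].split(";") if f and not f.startswith("ID=")]
            cols[8] = ";".join([f"ID={parent}:{ftype.lower()}:{n:03d}"] + kept)
            rows[i] = "\t".join(cols)
    return rows
```

Lines that are not exon or CDS features (headers, genes, mRNAs) are returned unchanged, so the result can be joined with newlines and written back as a new GFF3 file.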

Thanks!