Kuanhao-Chao / LiftOn

🚀 LiftOn: Accurate annotation mapping for GFF/GTF across assemblies
http://ccb.jhu.edu/lifton
GNU General Public License v3.0

gffutils error at stage "Populating features" #7

Open jolbi opened 4 months ago

jolbi commented 4 months ago

Hi, thank you for the tool. I am very excited to try it and compare its results with those from Liftoff.

However, I am getting a gffutils error at the stage after miniprot:

>> Creating liftoff annotation database : /path_to_dir/lifton_output/liftoff/liftoff.gff3_polished
2024-05-30 12:16:44,480 - INFO - Populating features
gffutils database build failed with UNIQUE constraint failed: features.id

My command was:

lifton \
-g $ref_gff \
-o $out_dir/$asm_name."$gff_name"_lifton.gff3 \
-u $out_dir/$asm_name."$gff_name"_lifton_unmapped.txt \
-chroms $in_dir/$ref_name.chroms.txt \
-copies -polish -cds -sc 0.96 -flank 0.1 \
-t $threads \
$asm \
$ref

The error suggests that some GFF features don't have unique IDs. I checked the input GFF and it does not contain any duplicated IDs. I also ran it through AGAT, which did not find any errors. Liftoff runs well on this GFF. I suspect the problem is with the output features.
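One way to double-check for repeated IDs is to collect every `ID=` attribute value together with the feature type it appears on, since duplicates on child features such as CDS or UTR are easy to miss when only gene or transcript IDs are inspected. The sketch below is a minimal standalone check, not part of LiftOn or gffutils; the function name `duplicated_ids` and the file name in the usage comment are made up for illustration, and it assumes a tab-separated, 9-column GFF3 file.

```python
from collections import defaultdict


def duplicated_ids(lines):
    """Return {ID: [feature types]} for every ID used on more than one line."""
    seen = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        # Column 9 holds semicolon-separated key=value attributes.
        for field in cols[8].split(";"):
            key, _, value = field.strip().partition("=")
            if key == "ID" and value:
                seen[value].append(cols[2])
    return {fid: types for fid, types in seen.items() if len(types) > 1}


# Usage (hypothetical file name):
# with open("annotation.gff3") as handle:
#     for fid, types in duplicated_ids(handle).items():
#         print(fid, types)
```

Reporting the feature types alongside each ID makes it quick to see whether the repeats are confined to CDS or UTR rows.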

Kuanhao-Chao commented 4 months ago

Hi @jolbi,

Thanks for reporting this. Did the error occur when building the Liftoff annotation into the gffutils database? I believe the error might be due to some ID issues with exons and CDSs. Could you please check if there are any ID duplications for exons and CDSs?

I have pushed another commit so that if the error occurs, gffutils will attempt to build the database again using the merge_strategy "warning" instead of the original "create_unique". Could you please help us test if this fixes your error? You can download LiftOn again through Git. Clone the directory and run python setup.py install. This should install the latest version of LiftOn. For more details, please visit: https://ccb.jhu.edu/lifton/content/installation.html
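The retry described above can be sketched roughly as follows. This is not LiftOn's actual code: the helper name `build_db_with_fallback`, its parameters, and the choice to retry on any exception are assumptions made for illustration. The only gffutils call used is `gffutils.create_db` with its `merge_strategy` and `force` arguments; the `create_db` parameter exists so the logic can be exercised without gffutils installed.

```python
def build_db_with_fallback(gff_path, db_path, create_db=None,
                           strategies=("create_unique", "warning")):
    """Try each merge_strategy in order; return (db, strategy_used)."""
    if create_db is None:
        import gffutils  # third-party; imported lazily so the sketch loads without it
        create_db = gffutils.create_db
    last_error = None
    for strategy in strategies:
        try:
            # force=True overwrites a database left behind by a failed attempt.
            db = create_db(gff_path, db_path, merge_strategy=strategy, force=True)
            return db, strategy
        except Exception as err:  # assumption: any build failure triggers a retry
            print(f"gffutils database build failed with {err} (strategy {strategy!r})")
            last_error = err
    raise RuntimeError("all merge strategies failed") from last_error
```

Note that with "warning", gffutils logs a message and skips later features that reuse an ID already in the database, so the build completes but some duplicated rows may be absent from it.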

Feel free to send me any of your files (kuanhao.chao@gmail.com), and I can take a look as well.

Best, Kuan-Hao

jolbi commented 4 months ago

> Did the error occur when building the Liftoff annotation into the gffutils database?

I am not sure if you are referring to standalone Liftoff or Liftoff as part of LiftOn?

  1. Standalone Liftoff runs fine with this GFF.
  2. When running LiftOn, the error occurs after the miniprot stage finishes (so the Liftoff part runs fine).

> Could you please check if there are any ID duplications for exons and CDSs?

I checked for duplicates again (this time manually), and there are indeed some CDS and UTR features with duplicated IDs. They seem to be non-overlapping, though I did not check all of them. I am not sure how to interpret these features, but I see that the GFF3 specification allows duplicated IDs for discontinuous features, so it may be good to allow them in LiftOn as well.
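To check all of the repeated IDs rather than a sample, a small heuristic like the one below could flag which groups plausibly represent one discontinuous feature. It is only a sketch: the function name `looks_discontinuous` and the criteria (same sequence, type, and strand, with no overlapping segments) are assumptions chosen for this check, not a statement of what the GFF3 specification or LiftOn requires.

```python
def looks_discontinuous(segments):
    """segments: list of (seqid, ftype, start, end, strand) rows sharing one ID.

    Returns True if the rows agree on seqid, type, and strand and none of
    them overlap (GFF3 coordinates are 1-based and inclusive).
    """
    if len({(s[0], s[1], s[4]) for s in segments}) != 1:
        return False  # rows disagree on location or type
    ordered = sorted(segments, key=lambda s: (s[2], s[3]))
    # Each segment must end strictly before the next one starts.
    return all(prev[3] < cur[2] for prev, cur in zip(ordered, ordered[1:]))


# Example with made-up coordinates:
# looks_discontinuous([("chr1", "CDS", 1, 30, "+"), ("chr1", "CDS", 50, 100, "+")])
```

Groups that fail this check (overlapping rows, or rows of different types sharing an ID) are the ones most worth inspecting by hand.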

> I have pushed another commit so that if the error occurs, gffutils will attempt to build the database again using the merge_strategy "warning" instead of the original "create_unique". Could you please help us test if this fixes your error? You can download LiftOn again through Git. Clone the directory and run python setup.py install. This should install the latest version of LiftOn. For more details, please visit: https://ccb.jhu.edu/lifton/content/installation.html

I tried to install the latest commit, but when running python setup.py install I get the error "Couldn't find a setup script in /tmp/easy_install-liu654lb/numpy-2.0.0rc2.tar.gz". I don't have much experience with installing from source, so it may be some stupid mistake on my side.

I sent the files to your email.

Cheers, Tim