agshumate / Liftoff

An accurate GFF3/GTF lift over pipeline
GNU General Public License v3.0

help with error message from running liftoff #169

Open amberdeja opened 4 months ago

amberdeja commented 4 months ago

Hello, I am trying to annotate an assembly and I get the error message below, and I can't find any information about it. Could someone give me an idea of what is wrong? Thank you, Alejandra

2024-06-06 00:50:47,689 - INFO - Committing changes: 1866000 features
2024-06-06 00:50:48,490 - INFO - Populating features table and first-order relations: 1866717 features
2024-06-06 00:50:48,685 - INFO - Creating relations(parent) index
2024-06-06 00:50:56,517 - INFO - Creating relations(child) index
2024-06-06 00:51:04,802 - INFO - Creating features(featuretype) index
2024-06-06 00:51:07,152 - INFO - Creating features (seqid, start, end) index
2024-06-06 00:51:09,077 - INFO - Creating features (seqid, start, end, strand) index
2024-06-06 00:51:11,062 - INFO - Running ANALYZE features
Traceback (most recent call last):
  File "/c4/home/amartinez/.conda/envs/pacbio2/bin/liftoff", line 10, in <module>
    sys.exit(main())
  File "/c4/home/amartinez/.conda/envs/pacbio2/lib/python3.10/site-packages/liftoff/run_liftoff.py", line 12, in main
    run_all_liftoff_steps(args)
  File "/c4/home/amartinez/.conda/envs/pacbio2/lib/python3.10/site-packages/liftoff/run_liftoff.py", line 24, in run_all_liftoff_steps
    feature_db, feature_hierarchy, ref_parent_order = liftover_types.lift_original_annotation(ref_chroms, target_chroms,
  File "/c4/home/amartinez/.conda/envs/pacbio2/lib/python3.10/site-packages/liftoff/liftover_types.py", line 15, in lift_original_annotation
    align_and_lift_features(ref_chroms, target_chroms, args, feature_hierarchy, liftover_type, unmapped_features,
  File "/c4/home/amartinez/.conda/envs/pacbio2/lib/python3.10/site-packages/liftoff/liftover_types.py", line 23, in align_and_lift_features
    aligned_segments= align_features.align_features_to_target(ref_chroms, target_chroms, args,
  File "/c4/home/amartinez/.conda/envs/pacbio2/lib/python3.10/site-packages/liftoff/align_features.py", line 16, in align_features_to_target
    target_fasta_dict = split_target_sequence(target_chroms, args.target, args.dir)
  File "/c4/home/amartinez/.conda/envs/pacbio2/lib/python3.10/site-packages/liftoff/align_features.py", line 32, in split_target_sequence
    Faidx(target_fasta_name)
  File "/c4/home/amartinez/.conda/envs/pacbio2/lib/python3.10/site-packages/pyfaidx/__init__.py", line 521, in __init__
    self.read_fai()
  File "/c4/home/amartinez/.conda/envs/pacbio2/lib/python3.10/site-packages/pyfaidx/__init__.py", line 577, in read_fai
    raise ValueError('Duplicate key "%s"' % key)
ValueError: Duplicate key "h2tg002731l"
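The traceback shows the run failing inside pyfaidx (`Faidx(...)` then `read_fai`) while it indexes the target assembly, because the same sequence key appears twice. Below is a minimal, stdlib-only sketch for listing repeated IDs in a FASTA file before rerunning Liftoff. It assumes, as with faidx-style indexes, that a record's ID is the header text after `>` up to the first whitespace; the script name `find_dup_ids.py` and the input file name are hypothetical.

```python
"""List FASTA record IDs that occur more than once (sketch, stdlib only)."""
import sys
from collections import Counter


def header_ids(lines):
    """Yield the ID of each FASTA header line.

    Assumption: the ID is the text after '>' up to the first whitespace,
    the usual faidx convention for index keys.
    """
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            yield fields[0] if fields else ""


def duplicate_ids(lines):
    """Return {id: count} for every ID that appears more than once."""
    counts = Counter(header_ids(lines))
    return {seq_id: n for seq_id, n in counts.items() if n > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_dup_ids.py assembly.fa
    with open(sys.argv[1]) as handle:
        for seq_id, n in sorted(duplicate_ids(handle).items()):
            print(f"{seq_id}\t{n}")  # ID, then how many times it occurs
```

Any ID printed by this sketch would collide in the index in the same way as `h2tg002731l` in the traceback.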

haydenji0731 commented 3 months ago

Hello, based on the error message, it looks like your target sequence file (i.e., the assembly you are annotating) contains multiple sequences with the same ID (e.g., "h2tg002731|"). The sequence ID seems to be truncated right where the | is placed, so most likely there are several sequence IDs that start with the same prefix. Can you try assigning a unique ID to each of your target sequences and avoid special characters like '|'? Dots are usually fine.
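The renaming suggested above can be scripted. This is a minimal sketch, assuming a plain (uncompressed) FASTA file: it keeps only the first whitespace-delimited word of each header, replaces every character other than letters, digits, `.`, `_` and `-` with `_`, and appends `.1`, `.2`, ... to any name that was already used. The script name `rename_fasta_ids.py`, the file names, and the tab-separated old/new table are hypothetical choices for illustration, not part of Liftoff.

```python
"""Rewrite FASTA headers so every ID is unique and free of special characters (sketch)."""
import re
import sys


def sanitize(seq_id):
    """Replace anything other than letters, digits, '.', '_' and '-' with '_'."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", seq_id)


def rename_records(lines):
    """Return (rewritten_lines, mapping), where mapping lists (old_id, new_id) pairs.

    Each header is cut to its first whitespace-delimited word and sanitized;
    if the result was already used, a .1, .2, ... suffix is added until unique.
    Sequence lines are passed through unchanged.
    """
    seen = set()
    out, mapping = [], []
    for line in lines:
        if not line.startswith(">"):
            out.append(line)
            continue
        fields = line[1:].split()
        old_id = fields[0] if fields else ""
        base = sanitize(old_id) or "seq"  # fallback for empty headers
        new_id, n = base, 0
        while new_id in seen:
            n += 1
            new_id = f"{base}.{n}"
        seen.add(new_id)
        mapping.append((old_id, new_id))
        out.append(f">{new_id}\n")
    return out, mapping


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python rename_fasta_ids.py in.fa out.fa > id_map.tsv
    with open(sys.argv[1]) as src:
        new_lines, id_map = rename_records(src)
    with open(sys.argv[2], "w") as dst:
        dst.writelines(new_lines)
    for old, new in id_map:
        print(f"{old}\t{new}")  # old ID, new ID
```

The renamed file would then be passed to Liftoff as the `target` argument in place of the original assembly. Because the target IDs change, the lifted annotation will refer to the new names; keeping the old/new table is one way to translate them back afterwards.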