agshumate / Liftoff

An accurate GFF3/GTF lift over pipeline
GNU General Public License v3.0

GFF does not contain any gene features. Use -f to provide a list of other feature types to lift over. #166

Open zgb963 opened 3 months ago

zgb963 commented 3 months ago

Hello,

I'm using hg38 GTF from NCBI and ran liftoff with the Rhesus macaque FASTA and hg38 FASTA.

liftoff liftoff/rheMac10.fa liftoff/hg38.fa -g liftoff/hg38.ncbiRefSeq.gtf -p 10 -o liftoff/update_rhemac10_lifted.gtf

Here is the bsub command I used to run my script:

bsub -q long -R rusage[mem=7G] -R span[hosts=1] -W 96:00 -n 32 -o ~/macaque_snRNAseq/liftoff/my_out.%J -e ~/macaque_snRNAseq/liftoff/my_err.%J ~/macaque_snRNAseq/scripts/new_update_liftoff.sh

And here is the script:

#!/bin/bash

#activate liftoff
conda activate liftoff

#Test to see if the sqlite errors are related to the NAS, create dir on /tmp, copy data, and use that as the working space
MYTMP=`mktemp -d`
cp -p liftoff/rheMac10.fa $MYTMP/rheMac10.fa
cp -p liftoff/hg38.fa $MYTMP/hg38.fa
cp -p liftoff/hg38.ncbiRefSeq.gtf $MYTMP/hg38.ncbiRefSeq.gtf

# run liftoff using the files in $MYTMP
liftoff $MYTMP/rheMac10.fa $MYTMP/hg38.fa -g $MYTMP/hg38.ncbiRefSeq.gtf -p 32 -o $MYTMP/new_update_rhemac10_lifted.gtf
#cp output from $MYTMP to the liftoff directory
cp -p $MYTMP/new_update_rhemac10_lifted.gtf liftoff/new_update_rhemac10_lifted.gtf

#assuming everything works, going forward will want to clean up the tmpdir
#uncomment when ready
#rm -rf $MYTMP

echo liftoff finished running!

But I keep getting the following error after ~10 minutes.

Populating features table and first-order relations: 4883000 features
Populating features table and first-order relations: 4884000 features
Populating features table and first-order relations: 4885000 features
Populating features table and first-order relations: 4886000 features
2024-03-26 12:53:43,569 - INFO - Committing changes
2024-03-26 12:53:44,350 - INFO - Populating features table and first-order relations: 4886701 features
2024-03-26 12:53:44,351 - INFO - Creating relations(parent) index
2024-03-26 12:53:48,243 - INFO - Creating relations(child) index
2024-03-26 12:53:52,411 - INFO - Creating features(featuretype) index
2024-03-26 12:53:54,440 - INFO - Creating features (seqid, start, end) index
2024-03-26 12:53:57,586 - INFO - Creating features (seqid, start, end, strand) index
2024-03-26 12:54:01,020 - INFO - Running ANALYZE features
GFF does not contain any gene features. Use -f to provide a list of other feature types to lift over.

Has anyone run into this error and found a way to fix it? I'm running it in an HPC environment on a compute node that has 2000 cores.

I've tried using a different human genome annotation in GFF format from NCBI, but then it gets stuck and takes forever to run. I've also tried the human genome annotation from Ensembl, and it does the same thing.

2024-02-07 12:10:24,951 - INFO - Running ANALYZE features
[M::main::23.219*0.29] loaded/built the index for 2939 target sequence(s)
[M::mm_mapopt_update::24.492*0.33] mid_occ = 596
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2939
[M::mm_idx_stat::25.276*0.35] distinct minimizers: 101324913 (39.04% are singletons); average occurrences: 5.469; average spacing: 5.362; total length: 2971331530
[M::worker_pipeline::25.475*0.35] mapped 7 sequences
[M::main] Version: 2.26-r1175
[M::main] CMD: minimap2 -o intermediate_files/reference_all_to_target_all.sam -a --end-bonus 5 --eqx -N 50 -p 0.5 -t 10 liftoff/rheMac10.fa.gz.mmi intermediate_files/reference_all_genes.fa
[M::main] Real time: 25.729 sec; CPU: 9.256 sec; Peak RSS: 7.697 GB

jiehua1995 commented 3 months ago

Hi,

I think that you need to try

liftoff -g liftoff/hg38.ncbiRefSeq.gtf -p 10 -o liftoff/update_rhemac10_lifted.gtf liftoff/rheMac10.fa liftoff/hg38.fa

or

liftoff \
-g liftoff/hg38.ncbiRefSeq.gtf \
-p 10 \
-o liftoff/update_rhemac10_lifted.gtf \
liftoff/rheMac10.fa liftoff/hg38.fa

Just put the target and reference after all options.

zgb963 commented 2 months ago

Hi @jiehua1995, thanks for your suggestion. I tried it but still got the following error after a few minutes:

GFF does not contain any gene features. Use -f to provide a list of other feature types to lift over.

haydenji0731 commented 1 month ago

It seems like your input reference annotation doesn't contain gene features. If this is the case, try adding the -infer_genes option.
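
A quick way to check whether that is the case is to tally the feature types in column 3 of the annotation. This is only a minimal sketch: `count_feature_types` is a made-up helper name, not part of Liftoff, and the example path is the one from the original post.

```shell
# Tally the feature types (column 3) of a GTF/GFF file, most frequent first.
# count_feature_types is a hypothetical helper, not a Liftoff command.
count_feature_types() {
    # Drop comment/header lines, keep the feature-type column, count each type.
    grep -v '^#' "$1" | cut -f3 | sort | uniq -c | sort -rn
}

# Example (path taken from the original post):
#   count_feature_types liftoff/hg38.ncbiRefSeq.gtf
```

If no `gene` row shows up in the output, the annotation only has transcript-level features (transcript, exon, CDS and so on), which would explain the error. In that case, adding `-infer_genes` to the original command should be worth a try, e.g. `liftoff -g liftoff/hg38.ncbiRefSeq.gtf -infer_genes -p 10 -o liftoff/update_rhemac10_lifted.gtf liftoff/rheMac10.fa liftoff/hg38.fa`.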

jdmontenegro commented 3 weeks ago

Interestingly, "-infer_genes" is order dependent: it has to appear on the command line before any "-mm2_options" are added.
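
For anyone landing here, a sketch of the ordering described above. It only prints the command so the argument order can be checked before running anything. `liftoff_cmd` is a hypothetical helper, the minimap2 options are copied from the minimap2 command in the log earlier in this thread, the `-mm2_options="..."` quoting is an assumption about how the value is passed, and the file paths are the ones from the original post.

```shell
# Print a liftoff command with -infer_genes placed before -mm2_options.
# liftoff_cmd is a hypothetical helper: it prints the command, it does not run it.
# Arguments: annotation GTF, output file, target FASTA, reference FASTA.
liftoff_cmd() {
    printf '%s\n' "liftoff -g $1 -infer_genes -mm2_options=\"-a --end-bonus 5 --eqx -N 50 -p 0.5\" -p 10 -o $2 $3 $4"
}

liftoff_cmd liftoff/hg38.ncbiRefSeq.gtf liftoff/update_rhemac10_lifted.gtf \
    liftoff/rheMac10.fa liftoff/hg38.fa
```

Once the printed command looks right, it can be pasted into the job script in place of the original liftoff line.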