agshumate / Liftoff

An accurate GFF3/GTF lift over pipeline
GNU General Public License v3.0

Problems with running Liftoff on HPC environment #162

Open zgb963 opened 9 months ago

zgb963 commented 9 months ago

Hello,

I've been having issues running Liftoff: it runs for days and then terminates. I'm running it in an HPC environment with 100 GB of memory on a compute node that has 2000 cores. The command below is what I'm using to run Liftoff. The target genome is the rheMac10 FASTA, and the reference inputs are the human genome hg38 FASTA and the human genome annotation GFF.

liftoff liftoff/rheMac10.fa.gz liftoff/GRCh38_latest_genomic.fna.gz -g liftoff/GRCh38_latest_genomic.gff.gz -p 32 -o liftoff/update_rhemac10_lifted.gtf

Here is the bsub command I used to submit my script

bsub -q long -R rusage[mem=25G] -R span[hosts=1] -W 96:00 -n 4 -o ~/macaque_snRNAseq/liftoff/my_out.%J -e ~/macaque_snRNAseq/liftoff/my_err.%J ~/macaque_snRNAseq/scripts/update_liftoff.sh 

And here is my script

#!/bin/bash

#activate liftoff
conda activate liftoff

#run liftoff

liftoff liftoff/rheMac10.fa.gz liftoff/GRCh38_latest_genomic.fna.gz -g liftoff/GRCh38_latest_genomic.gff.gz  -p 10 -o liftoff/update_rhemac10_lifted.gtf

echo liftoff finished running!

However, it has been running for several days and it's stuck on lifting features.

extracting features
2024-01-23 11:57:09,016 - INFO - Populating features
2024-01-23 12:04:20,319 - INFO - Populating features table and first-order relations: 4900134 features
2024-01-23 12:04:20,319 - INFO - Updating relations
2024-01-23 12:05:01,905 - INFO - Creating relations(parent) index
2024-01-23 12:05:05,589 - INFO - Creating relations(child) index
2024-01-23 12:05:10,210 - INFO - Creating features(featuretype) index
2024-01-23 12:05:14,158 - INFO - Creating features (seqid, start, end) index
2024-01-23 12:05:19,103 - INFO - Creating features (seqid, start, end, strand) index
2024-01-23 12:05:24,253 - INFO - Running ANALYZE features
aligning features
[M::main::16.311*0.41] loaded/built the index for 2939 target sequence(s)
[M::mm_mapopt_update::17.590*0.45] mid_occ = 596
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2939
[M::mm_idx_stat::18.371*0.48] distinct minimizers: 101324913 (39.04% are singletons); average occurrences: 5.469; average spacing: 5.362; total length: 2971331530
[M::worker_pipeline::226.359*3.67] mapped 10628 sequences
[M::worker_pipeline::382.816*3.79] mapped 10362 sequences
[M::worker_pipeline::555.968*3.84] mapped 12280 sequences
[M::worker_pipeline::711.785*3.85] mapped 14834 sequences
[M::main] Version: 2.26-r1175
[M::main] CMD: minimap2 -o intermediate_files/reference_all_to_target_all.sam -a --end-bonus 5 --eqx -N 50 -p 0.5 -t 32 liftoff/rheMac10.fa.gz.mmi intermediate_files/reference_all_genes.fa
[M::main] Real time: 712.151 sec; CPU: 2743.497 sec; Peak RSS: 27.401 GB
lifting feature

Am I using enough memory or cores/threads for liftoff? Is there a typical runtime for lifting over features from one large genome to another?
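
On the cores/threads question, the minimap2 summary line in the log above gives a rough check: dividing its CPU seconds by its wall-clock seconds estimates how many cores were busy on average during alignment. Below is a minimal sketch of that arithmetic. The function name effective_cores is made up for illustration and is not part of Liftoff or minimap2.

```shell
#!/bin/bash
# Sketch: estimate how many cores minimap2 kept busy, from its summary line.
# effective_cores is a hypothetical helper name (an assumption for this example).
effective_cores() {
    # Reads log text on stdin. For every line containing
    # "Real time: <x> sec; CPU: <y> sec" it prints y / x with two decimals.
    awk '
        match($0, /Real time: [0-9.]+ sec; CPU: [0-9.]+ sec/) {
            s = substr($0, RSTART, RLENGTH)
            split(s, f, " ")
            # f[3] holds the wall-clock seconds, f[6] the CPU seconds.
            if (f[3] > 0) printf "%.2f\n", f[6] / f[3]
        }
    '
}
```

Fed the summary line from this log (Real time: 712.151 sec; CPU: 2743.497 sec), it prints 3.85, the same ratio minimap2 reports after the asterisk in its last worker_pipeline line. So although minimap2 was started with -t 32, it averaged fewer than 4 busy cores. That may mean the job was confined to roughly the 4 slots requested with bsub -n 4, in which case a larger -p value would not help, but nothing in this thread confirms that.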

yeeus commented 8 months ago

I also encountered this problem, have you solved it?

Agamoni commented 8 months ago

Hi, I'm also having the same issue; any advice?

zgb963 commented 7 months ago

> I also encountered this problem, have you solved it?

@yeeus Not yet. I heard from someone that Liftoff needs to be run with a GTF file rather than a GFF file, so I tried that, but I got the following error: 'GFF does not contain any gene features. Use -f to provide a list of other feature types to lift over.'
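
The error says the annotation has no records whose feature type is gene. In both GFF3 and GTF the feature type is the third tab-separated column, so one way to see what a file actually contains is to tally that column. This is a minimal sketch, assuming a GNU-style gzip (where -dcf passes uncompressed input through unchanged). The function name feature_type_counts is made up for illustration.

```shell
#!/bin/bash
# Sketch: tally the feature types (column 3) of a GFF3 or GTF file.
# feature_type_counts is a hypothetical helper name (an assumption for this example).
feature_type_counts() {
    local file="$1"
    # gzip -dcf decompresses .gz input; with GNU gzip it copies plain text as-is.
    gzip -dcf "$file" |
        awk -F'\t' '
            # Skip comment/directive lines and anything without 9 columns.
            !/^#/ && NF >= 9 { count[$3]++ }
            END { for (t in count) printf "%s\t%d\n", t, count[t] }
        ' |
        # Most frequent type first; ties broken alphabetically.
        LC_ALL=C sort -k2,2nr -k1,1
}
```

Running feature_type_counts on the GTF that triggered the error should show whether it has any gene rows at all. If it lists only types such as transcript and exon, that would explain the message. The -f option named in the error is then the documented way to tell Liftoff which other types to lift.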

salzberg commented 5 months ago

We'll look into this - but Liftoff usually runs in no more than an hour or two on a mammalian genome, so if it's running for many hours, something is wrong. It doesn't need that much memory. However, it seems you are lifting human annotation onto rhesus macaque, which is pretty distant from human (at the DNA level). This means that minimap2 will likely have trouble mapping many genes. You might instead try our newer LiftOn program, which is designed for more distant mapping problems. It uses Liftoff as a module, and also miniprot. Check it out here: https://github.com/Kuanhao-Chao/LiftOn/blob/main/README.md