Nextomics / NextPolish

Fast and accurately polish the genome generated by long reads.
GNU General Public License v3.0

How long and how much memory should NextPolish require? #90

Open kcl58759 opened 2 years ago

kcl58759 commented 2 years ago

Question or Expected behavior
How long should NextPolish take to complete on a ~50 Mb long-read genome, and how much memory should I request? I submitted the job with 90 GB for 99 hours and it timed out.

Operating system
SLURM, NextPolish/1.4.0-GCCcore-8.3.0-Python-3.8.2

GCC
gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)

Python
Python 3.8.2

moold commented 2 years ago

Hi, it depends on your input and parameters. If you prefer to use your own alignment pipeline, it will use fewer resources and run faster; see here

kcl58759 commented 2 years ago

Hi, I am trying to use my own alignment pipeline to reduce the resources needed. Here is my alignment script:

#!/bin/bash
#SBATCH --job-name=NextPolishBWA
#SBATCH --partition=batch
#SBATCH --ntasks=5
#SBATCH --cpus-per-task=10
#SBATCH --mem=90gb
#SBATCH --time=99:00:00
#SBATCH --output=nextpolishself.out
#SBATCH --error=nextpolishself.err
#SBATCH --mail-user=kcl58759@uga.edu
#SBATCH --mail-type=END,FAIL

module load BWA/0.7.17-GCC-8.3.0
ml SAMtools/1.10-GCC-8.3.0
ml NextPolish/1.4.0-GCCcore-8.3.0-Python-3.8.2

round=2
threads=20
read=/scratch/kcl58759/Eco_pacbio_kendall/pb_css_474/cromwell-executions/pb_ccs/c7a3dc30-7f94-40de-ac16-2445f965bfad/call-export_fasta/execution/m64060_210804_174320.hifi_reads.fasta.gz
read_type=hifi
mapping_option=["hifi"]="asm20"
input=/scratch/kcl58759/Eco_pacbio_kendall/474.Primary.Hifi.asm/474.Primary.HiFi.asm.p_ctg.fa

for ((i=1; i<=2;i++)); do
    minimap2 -ax asm20 [hifi] -t 6 /scratch/kcl58759/Eco_pacbio_kendall/474.Primary.Hifi.asm/474.Primary.HiFi.asm.p_ctg.f /scratch/kcl58759/Eco_pacbio_kendall/pb_css_474/cromwell-executions/pb_ccs/c7a3dc30-7f94-40de-ac16-2445f965bfad/call-export_fasta/execution/m64060_210804_174320.hifi_reads.fasta.gz | samtools sort - -m 2g --threads 6 -o lgs.sort.bam;
    samtools index lgs.sort.bam;
    ls `pwd`/lgs.sort.bam > lgs.sort.bam.fofn;
    python NextPolish/lib/nextpolish2.py -g /scratch/kcl58759/Eco_pacbio_kendall/474.Primary.Hifi.asm/474.Primary.HiFi.asm.p_ctg.f-l lgs.sort.bam.fofn -r hifi -p 6 -sp -o genome.nextpolish.fa;
    if ((i!=2));then
        mv genome.nextpolish.fa genome.nextpolishtmp.fa;
        input=genome.nextpolishtmp.fa;
    fi;
done;

However, I keep getting these errors:

[ERROR] failed to open file '[hifi]': No such file or directory
python: can't open file 'NextPolish/lib/nextpolish2.py': [Errno 20] Not a directory
mv: cannot stat ‘genome.nextpolish.fa’: No such file or directory
[ERROR] failed to open file '[hifi]': No such file or directory
python: can't open file 'NextPolish/lib/nextpolish2.py': [Errno 20] Not a directory

Is there something I am missing?
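
A side note on resources: the job header above requests 5 tasks with 10 CPUs each, while the loop passes 6 threads to minimap2 and samtools and never uses the `threads=20` variable. One way to keep the request and the usage in step is to read the thread count from `SLURM_CPUS_PER_TASK`, which Slurm sets when `--cpus-per-task` is given. The sketch below is only illustrative: `pick_threads` is a hypothetical helper, and the final line prints a placeholder command rather than running anything.

```shell
#!/bin/bash
# Hypothetical helper: return the CPU count Slurm granted to this task,
# or the supplied fallback (default 1) when not running under Slurm.
pick_threads() {
    local fallback="${1:-1}"
    printf '%s\n' "${SLURM_CPUS_PER_TASK:-$fallback}"
}

threads=$(pick_threads 6)

# Dry run: show the thread settings that would be passed to each tool.
echo "minimap2 -t ${threads} | samtools sort --threads ${threads}"
```

Since the loop runs a single pipeline at a time, a request with one task and a `--cpus-per-task` value matching the thread count is probably closer to what the script can actually use.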

moold commented 2 years ago

See the minimap2 manual for how to run minimap2; [hifi] is not a valid option.
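
For context, the stray `[hifi]` looks like the remnant of an array lookup that was expanded by hand: the script defines `mapping_option=["hifi"]="asm20"` but then passes both `asm20` and the literal `[hifi]` to minimap2, which treats `[hifi]` as a file name. A minimal sketch of the lookup, assuming a bash associative array keyed by read type, follows. The `clr` and `ont` entries, the `preset_for` helper, and the file names `ref.fa` and `reads.fasta.gz` are illustrative assumptions, and the command is printed rather than executed.

```shell
#!/bin/bash
# Map each read type to a minimap2 preset (-x value).
# Only the "hifi" entry comes from the script in this thread; the others are assumed.
declare -A mapping_option=(["clr"]="map-pb" ["hifi"]="asm20" ["ont"]="map-ont")

# Hypothetical helper: print the preset for a read type, or fail if unknown.
preset_for() {
    local read_type="$1"
    local preset="${mapping_option[$read_type]:-}"
    if [[ -z "$preset" ]]; then
        echo "unknown read type: $read_type" >&2
        return 1
    fi
    printf '%s\n' "$preset"
}

read_type=hifi
preset=$(preset_for "$read_type")

# Dry run: the preset is the only token after -ax; no bracketed key follows it.
echo "minimap2 -ax ${preset} -t 6 ref.fa reads.fasta.gz"
```

With `read_type=hifi` the printed command contains `-ax asm20` and nothing else between the preset and `-t`.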

kcl58759 commented 2 years ago

I believe the issue is not with minimap2 but with NextPolish/lib/nextpolish2.py not being available. I cannot find the script online, and it is not made available by ml NextPolish/1.4.0-GCCcore-8.3.0-Python-3.8.2
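
One thing worth checking: `NextPolish/lib/nextpolish2.py` is a relative path, so Python resolves it against the job's working directory, not against wherever the module installed NextPolish. A rough sketch for locating the script next to an executable on `PATH` is shown below. It assumes the module puts the `nextPolish` launcher on `PATH` and keeps `lib/nextpolish2.py` beside it, as in the source repository; a cluster install may use a different layout, so treat `find_lib_script` as a hypothetical starting point.

```shell
#!/bin/bash
# Hypothetical helper: given an executable name and a script name,
# print <install dir>/lib/<script> if it exists next to the executable.
find_lib_script() {
    local exe="$1" script="$2"
    local exe_path root
    exe_path=$(command -v "$exe") || { echo "$exe not found on PATH" >&2; return 1; }
    # Resolve symlinks so we land in the real install directory.
    root=$(dirname "$(readlink -f "$exe_path")")
    if [[ -f "$root/lib/$script" ]]; then
        printf '%s\n' "$root/lib/$script"
    else
        echo "$script not found under $root/lib" >&2
        return 1
    fi
}

# Usage: report the absolute path if the assumed layout holds.
if script=$(find_lib_script nextPolish nextpolish2.py); then
    echo "nextpolish2.py found at: $script"
fi
```

If the lookup succeeds, the absolute path can replace the relative `NextPolish/lib/nextpolish2.py` in the polishing command.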