NCBI-Hackathons / NovoGraph

NovoGraph: building whole genome graphs from long-read-based de novo assemblies
MIT License

CALLMAFFT.pl for Torque 6.1.12 #18

Closed: JYLeeBioinfo closed this issue 5 years ago

JYLeeBioinfo commented 5 years ago

Hello!

Is it okay to change the following line in CALLMAFFT.pl?

from

```
#PBS -J ${minJobID}-${maxJobID}
```

to

```
#PBS -t ${minJobID}-${maxJobID}
```

I tried using the changed CALLMAFFT.pl with the following command:

```shell
time perl ${novograph}/CALLMAFFT.pl --action kickOff --mafftDirectory PM-AU-0011-N-A1_q7-nanofilt_adptrim.fastq.ctg.fa_novograph-02-forMAFFT_convert2 --qsub 1 \
    --PBSPro 1 --PBSPro_select 'nodes=1:ppn=16,mem=48gb' --PBSPro_A root --chunkSize 500 \
    --mafft_executable ${mafft} \
    --fas2bam_path ${novograph}/fas2bam.pl --samtools_path /appl/samtools/samtools-1.9/samtools --bamheader ${novograph_base}/windowbam.header.txt
```

This seemed to work at first, since a job array showed up in my qstat output.

However, most of the array jobs failed with the following error:

```
expr: syntax error
File /data2/hd00ljy/data_processing/PMI_Nanopore_WGS/7_analysis_jy/analysis_jy/PM-AU-0011-T-A1/Sniffles/PM-AU-0011-T-A1/test-cluster-length-mapq-distance-depth/over3000bp_mapq30_1000bp-dist_4support_tra-to -bnd.vcf/test_graph_genome/minimap2/PM-AU-0011-N-A1_q7-nanofilt_adptrim.fastq.ctg.fa_novograph-02-forMAFFT_convert2/chr22_KI270734v1_random/chr22_KI270734v1_random_12.mfa does not contain any sequences. at /data/hd00ljy/data_processing/PMI_Nanopore_WGS/tools/11_graph_genome/novograph/NovoGraph/scripts/CALLMAFFT.pl line 557.
```

nhansen commented 5 years ago

hd00ljy, when you remove the -J line from CALLMAFFT.pl, the resulting array job doesn't set $PBS_ARRAY_INDEX, which is why you get the "expr: syntax error" message. What error were you getting before you changed CALLMAFFT.pl?
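To see why the job fails, here is a minimal reproduction of the generated job-ID line with the index variable unset. The function name `compute_job_id` is hypothetical; it just wraps the `expr $PBS_ARRAY_INDEX - 1` line that CALLMAFFT.pl writes into its job script. Because the variable is expanded unquoted, an empty value disappears and `expr` receives only `-` and `1`, which it rejects as a syntax error.

```shell
#!/bin/sh
# Minimal reproduction (hypothetical helper, not part of CALLMAFFT.pl).
# The generated job script computes its job ID with
#   jobID=$(expr $PBS_ARRAY_INDEX - 1)
# Under Torque, PBS_ARRAY_INDEX is never set, so this becomes `expr - 1`.
compute_job_id() {
    # $1: value of the array-index variable (may be empty)
    local index="$1"
    # Left unquoted on purpose, as in the generated script: an empty value
    # vanishes and expr only receives "-" and "1", which is a syntax error.
    expr $index - 1
}

unset PBS_ARRAY_INDEX
if jobID=$(compute_job_id "${PBS_ARRAY_INDEX:-}" 2>/dev/null); then
    echo "jobID=$jobID"
else
    echo "expr failed; jobID is empty"
fi
```

Switching to shell arithmetic such as `$(( PBS_ARRAY_INDEX - 1 ))` would not fix this by itself: the shell treats an unset variable as 0 there, so every task would silently get jobID -1 instead of failing.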

nhansen commented 5 years ago

I just noticed you are using Torque, which probably explains the lack of support for -J and $PBS_ARRAY_INDEX. Changing $PBS_ARRAY_INDEX to $PBS_ARRAYID in the following lines of CALLMAFFT.pl should help:

```perl
if ($PBSPro)
{
    print QSUB qq(#!/bin/bash
#PBS -l $PBSPro_select
#PBS -l walltime=23:00:00
#PBS -A '$PBSPro_A'
#PBS -N CALLMAFFT
#PBS -J ${minJobID}-${maxJobID}
#PBS -r y
jobID=\$(expr \$PBS_ARRAY_INDEX - 1)
);
}
```
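A job script could also be written to run under either scheduler by reading whichever index variable is present. The sketch below is an illustrative alternative, not the change made in this thread: `zero_based_job_id` is a hypothetical helper that prefers PBS Pro's `PBS_ARRAY_INDEX`, falls back to Torque's `PBS_ARRAYID`, and subtracts one as the generated script does. The matching `#PBS -J` or `#PBS -t` directive is still needed to create the array in the first place.

```shell
#!/bin/sh
# Hypothetical helper, not part of CALLMAFFT.pl: derive the job ID from
# whichever array-index variable the scheduler set.
#   PBS Pro (#PBS -J ...) sets PBS_ARRAY_INDEX
#   Torque  (#PBS -t ...) sets PBS_ARRAYID
zero_based_job_id() {
    # Prefer the PBS Pro variable, fall back to the Torque one.
    local index="${PBS_ARRAY_INDEX:-${PBS_ARRAYID:-}}"
    # Fail loudly instead of passing an empty or non-numeric value on.
    case "$index" in
        ''|*[!0-9]*)
            echo "error: no usable array index (PBS_ARRAY_INDEX / PBS_ARRAYID)" >&2
            return 1
            ;;
    esac
    # Subtract one, as the generated script's `expr ... - 1` does.
    echo $(( index - 1 ))
}
```

Inside the generated bash script this would be used as `jobID=$(zero_based_job_id) || exit 1`, so a task without an index stops with a readable message rather than the `expr` error.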
JYLeeBioinfo commented 5 years ago

Hello! Thank you for the comment.

I changed things as you suggested and the run was successful! I made two changes:

1. changed -J to -t
2. changed $PBS_ARRAY_INDEX to $PBS_ARRAYID

```perl
if ($PBSPro)
{
    print QSUB qq(#!/bin/bash
#PBS -l $PBSPro_select
#PBS -l walltime=23:00:00
#PBS -A '$PBSPro_A'
#PBS -N CALLMAFFT
#PBS -t ${minJobID}-${maxJobID}
#PBS -r y
jobID=\$(expr \$PBS_ARRAYID - 1)
);
}
```

I then checked the results with the --action check option and saw that all jobs had finished successfully:

```
Total found files: 310520
With MFA: 310520
With BAM: 310520

Would now redo: 0
```

Thank you again for your prompt reply!

Jinyoung
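As a rough, independent sanity check next to `--action check`, one can count the alignment and BAM files under the output directory. This is only a sketch under assumptions: it supposes the per-window `.mfa` files (like the one named in the error message above) and the resulting `.bam` files both live somewhere below the `--mafftDirectory` path. `count_outputs` is a hypothetical helper, and it does not reproduce the logic of `--action check`.

```shell
#!/bin/sh
# Hypothetical cross-check: count .mfa and .bam files below a directory.
# Assumes both file types sit under the MAFFT output directory; this is
# NOT how CALLMAFFT.pl's --action check works internally.
# Usage: count_outputs path/to/mafftDirectory
count_outputs() {
    local dir="$1"
    local mfa bam
    # tr strips the padding some wc implementations add.
    mfa=$(find "$dir" -type f -name '*.mfa' | wc -l | tr -d ' ')
    bam=$(find "$dir" -type f -name '*.bam' | wc -l | tr -d ' ')
    echo "mfa files: $mfa"
    echo "bam files: $bam"
}
```

If the two counts differ, the windows without a BAM are the ones worth inspecting; `--action check` remains the authoritative test of which jobs need to be redone.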

nhansen commented 5 years ago

Great! @evanbiederstedt and @AlexanderDilthey, now that Jinyoung has tested this, I could add --torque options to CALLMAFFT.pl if you think it's worthwhile...
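If a `--torque` switch were added, the generated header would differ only in the two lines discussed above. A minimal sketch of that branching, written as a shell function for illustration: `array_header` and its `pbspro`/`torque` argument are hypothetical names, the other `#PBS` lines are omitted, and this is not necessarily what the merged pull request does.

```shell
#!/bin/sh
# Hypothetical sketch: emit the scheduler-specific lines of the job script.
# Only the array directive and the index variable differ between the two.
array_header() {
    local scheduler="$1" min_id="$2" max_id="$3"
    case "$scheduler" in
        pbspro)
            # PBS Pro: -J creates the array and sets PBS_ARRAY_INDEX
            printf '#PBS -J %s-%s\n' "$min_id" "$max_id"
            printf 'jobID=$(expr $PBS_ARRAY_INDEX - 1)\n'
            ;;
        torque)
            # Torque: -t creates the array and sets PBS_ARRAYID
            printf '#PBS -t %s-%s\n' "$min_id" "$max_id"
            printf 'jobID=$(expr $PBS_ARRAYID - 1)\n'
            ;;
        *)
            echo "unknown scheduler: $scheduler" >&2
            return 1
            ;;
    esac
}
```

For example, `array_header torque 1 500` prints `#PBS -t 1-500` followed by the `PBS_ARRAYID` version of the job-ID line, matching the edits Jinyoung tested.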

evanbiederstedt commented 5 years ago

Hi @nhansen

Great to hear from you! Apologies it's taken me so long to get involved in this issue.

> I could add --torque options to CALLMAFFT.pl if you think it's worthwhile...

Sure! Please create a pull request and we can merge those changes in :)

evanbiederstedt commented 5 years ago

I think this was resolved here: https://github.com/NCBI-Hackathons/NovoGraph/pull/19

Thank you @hd00ljy and @nhansen, and I'm sorry for the delays. Please feel free to re-open if there are remaining issues.

Thanks, Evan