PacificBiosciences / kineticsTools

Tools for detecting DNA modifications from single molecule, real-time sequencing data

ipdSummary takes significant time but at a low CPU use #96

Open ck-theory opened 1 year ago

ck-theory commented 1 year ago

Hello,

I am currently using the following commands to run ipdSummary on our compute server, and some samples have been running for 6+ days without completing. We have 2 TB of RAM and 130 CPUs, so I do not believe there is a resource limitation. I have the command set to run on 20 threads, and I can see from the log files that for many of the running jobs about 15 of the 20 workers have exited: "Process KineticWorkerProcess-9 (PID=1919193) done; exiting." However, the few remaining processes are struggling along and only using a few CPUs. For example, I have 6 jobs running on 120 total CPUs, but only 47 are in use. During the assembly (Flye) all 120 CPUs were in use, so it is not an issue with our queuing system.

Could you look at my ipdSummary command and give some pointers on ways to speed up this process? Any and all help is appreciated. Thanks!

  conda activate pbbam-2.1.0

  ccs-kinetics-bystrandify ${WD}/${SAMPLE}/pacbio/${SAMPLE}_pacbio.bam ${WD}/${SAMPLE}/kinetics/${SAMPLE}_pacbio_kinetics.bam

  conda activate smrtlink_11.0.0.146107
  cp ${WD}/${SAMPLE}/bakta/06.fixstart.fna ${WD}/${SAMPLE}/kinetics/06.fixstart.fasta
  dataset create --generateIndices ${WD}/${SAMPLE}/kinetics/${SAMPLE}_referenceset.xml ${WD}/${SAMPLE}/kinetics/06.fixstart.fasta

  pbmm2 align --sort ${WD}/${SAMPLE}/kinetics/${SAMPLE}_pacbio_kinetics.bam \
    ${WD}/${SAMPLE}/kinetics/${SAMPLE}_referenceset.xml \
    ${WD}/${SAMPLE}/kinetics/${SAMPLE}_ref_alignment.bam

  pbindex ${WD}/${SAMPLE}/kinetics/${SAMPLE}_ref_alignment.bam

  ipdSummary --numWorkers ${THREADS} \
    --reference ${WD}/${SAMPLE}/kinetics/06.fixstart.fasta \
    --gff ${WD}/${SAMPLE}/kinetics/${SAMPLE}_all_base_modifications.gff3 \
    --bigwig ${WD}/${SAMPLE}/kinetics/${SAMPLE}_all_base_modifications.bigwig \
    --csv ${WD}/${SAMPLE}/kinetics/${SAMPLE}_all_base_modifications.csv \
    --identify m6A,m4C,m5C_TET \
    ${WD}/${SAMPLE}/kinetics/${SAMPLE}_ref_alignment.bam
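
For reference, the "about 15 of 20 workers have exited" figure can be counted from a job's log rather than estimated by eye. This is a minimal sketch that assumes the log contains lines in exactly the format quoted above ("Process KineticWorkerProcess-N (PID=...) done; exiting."). The function name `count_done_workers` is hypothetical and is not part of kineticsTools.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kineticsTools): count the distinct
# KineticWorkerProcess workers that have logged "done; exiting." in a log file.
# Assumes the log line format quoted in this issue.
count_done_workers() {
  local log_file="$1"
  # -o prints only the matching part, so surrounding timestamps or prefixes
  # do not affect de-duplication; sort -u collapses repeated lines.
  grep -oE 'Process KineticWorkerProcess-[0-9]+ \(PID=[0-9]+\) done; exiting\.' "$log_file" \
    | sort -u \
    | wc -l \
    | tr -d ' '
}
```

Calling `count_done_workers path/to/job.log` (the path is a placeholder) prints the number of finished workers; subtracting it from the `--numWorkers` value gives the number still running for that job.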