EichlerLab / pav

Phased assembly variant caller

PAV running for > 9 days and is still running #22

volcano1998 closed this issue 1 year ago

volcano1998 commented 2 years ago

Hi, I'm running PAV on our university's cluster (Slurm). The library is PacBio CLR. I gave the job 500 GB of memory and 30 CPUs, but it has been running for more than 9 days and I still cannot see a result. I previously ran PAV on a PacBio HiFi read assembly and it took less than 4 hours. Any thoughts on how I could speed it up?

Here is the tail of the output:
[Thu Sep  8 19:50:23 2022]
Finished job 221.
343 of 347 steps (99%) done
Select jobs to execute...

[Thu Sep  8 19:50:23 2022]
rule call_merge_haplotypes:
    input: temp/NA24385/bed/integrated/h1/svindel_ins.bed.gz, temp/NA24385/bed/integrated/h2/svindel_ins.bed.gz, results/NA24385/callable/callable_regions_h1_500.bed.gz, results/NA24385/callable/callable_regions_h2_500.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr1.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr10.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr11.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr11_gl000202_random.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr12.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr13.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr14.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr15.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr16.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr17.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr17_gl000203_random.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr17_gl000204_random.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr17_gl000205_random.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr17_gl000206_random.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr18.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr18_gl000207_random.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr19.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr19_gl000208_random.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr19_gl000209_random.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr1_gl000191_random.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr1_gl000192_random.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr2.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr20.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr21.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr21_gl000210_random.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr22.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr3.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr4.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr4_gl000193_random.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr4_gl000194_random.bed.gz, 
temp/NA24385/bed/bychrom/svindel_ins/chr5.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr6.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr7.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr7_gl000195_random.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr8.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr8_gl000196_random.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr8_gl000197_random.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr9.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr9_gl000198_random.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr9_gl000199_random.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr9_gl000200_random.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chr9_gl000201_random.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrM.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000211.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000212.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000213.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000214.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000215.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000216.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000217.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000218.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000219.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000220.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000221.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000222.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000223.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000224.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000225.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000226.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000227.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000228.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000229.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000230.bed.gz, 
temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000231.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000232.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000233.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000234.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000235.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000236.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000237.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000238.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000239.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000240.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000241.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000242.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000243.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000244.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000245.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000246.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000247.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000248.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrUn_gl000249.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrX.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/chrY.bed.gz, temp/NA24385/bed/bychrom/svindel_ins/hs37d5.bed.gz
    output: temp/NA24385/bed/merged/svindel_ins.bed.gz
    jobid: 3
    wildcards: asm_name=NA24385, vartype_svtype=svindel_ins
    resources: tmpdir=/tmp

I think it has been stuck there for 7 days. I ran PAV on other CLR libraries and hit the same issue.

Any thoughts?
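For anyone trying to tell a hung job from a slow one, a small script can report the last progress counter and the last rule announced in the log. This is a minimal sketch that assumes the log format matches the excerpt above (lines such as `343 of 347 steps (99%) done` and `rule call_merge_haplotypes:`); the function name `summarize_log` is hypothetical and not part of PAV or Snakemake.

```python
import re

# Patterns based only on the log lines quoted in this thread (an assumption
# about the format; other Snakemake versions may print differently).
PROGRESS_RE = re.compile(r"(\d+) of (\d+) steps \((\d+)%\) done")
RULE_RE = re.compile(r"^rule (\w+):")


def summarize_log(text: str) -> dict:
    """Return the last progress counter and the last rule seen in a log."""
    done = total = None
    last_rule = None
    for line in text.splitlines():
        progress = PROGRESS_RE.search(line)
        if progress:
            done, total = int(progress.group(1)), int(progress.group(2))
        rule = RULE_RE.match(line.strip())
        if rule:
            last_rule = rule.group(1)
    return {"done": done, "total": total, "last_rule": last_rule}
```

Running this on the same log at two different times and getting an identical summary suggests the job has not advanced in between, although it cannot say why.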

paudano commented 2 years ago

Something is wrong if it's stuck on step call_merge_haplotypes for that long. That rule just concatenates merged variant calls and writes them into a file. If you have output from the rule (I have snakemake push mine into a cluster log), are there any error messages? You can kill and restart the pipeline; it will pick up where it left off unless something upstream changed (e.g., if the timestamp on the assembly FASTAs changed, the whole pipeline will re-run).
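The timestamp point can be illustrated with a simplified check: an output counts as stale when it is missing or when any of its inputs has a newer modification time. This is only a sketch of that idea, not Snakemake's actual implementation (which may take other triggers into account); `is_stale` and the file names in the demo are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def is_stale(output: Path, inputs: list[Path]) -> bool:
    """Return True if output is missing or any input is newer than it."""
    if not output.exists():
        return True
    out_mtime = output.stat().st_mtime
    return any(p.stat().st_mtime > out_mtime for p in inputs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fasta = Path(tmp, "h1.fa.gz")             # hypothetical assembly FASTA
        merged = Path(tmp, "svindel_ins.bed.gz")  # hypothetical downstream output
        fasta.touch()
        merged.touch()

        # Output written after the input: nothing to redo.
        os.utime(fasta, (1000, 1000))
        os.utime(merged, (2000, 2000))
        print(is_stale(merged, [fasta]))

        # Input timestamp changes later (e.g. the file was copied again):
        # the output is now considered out of date.
        os.utime(fasta, (3000, 3000))
        print(is_stale(merged, [fasta]))
```

This is why copying or touching the assembly FASTAs between runs can trigger a full re-run even when their contents are unchanged.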

volcano1998 commented 2 years ago

Hi, thank you! I'm now using another server (which I believe is more powerful) to run PAV, and it finished in < 2 days, so I guess the problem could be due to the first server's capability. I have now submitted the rest of the PAV jobs to the other server too; hopefully they won't get stuck again. I will keep you updated!

Thank you again!

paudano commented 2 years ago

Was the second run successful, or was it still hanging?

volcano1998 commented 2 years ago

Hi, after that I was busy with other things, so I did not pay much attention to the runs. I just checked: I was able to finish running 2 CLR libraries within 2 days. One CLR library job timed out (the time limit was set to 2 days), so I resubmitted it and will see how it goes.

I will keep you updated about the last one.

Thank you for checking!