EichlerLab / pav

Phased assembly variant caller

PAV stopping with Hangup message #58

Open riyasj327 opened 2 months ago

riyasj327 commented 2 months ago

Greetings,

Thanks for this amazing tool. Do you know what's happening here? The error appears after PAV has been running for some time, maybe an hour, and sometimes I have to rerun it again and again to finish the whole run. Sometimes it works, but sometimes it just stops with this error. I am running PAV within a Snakemake pipeline of my own. Any thoughts on this would be helpful.

Error message:

/usr/bin/bash: line 32: 305842 Hangup /usr/bin/singularity run --bind "$(pwd):$(pwd)" library://becklab/pav/pav:latest -c 36
Full Traceback (most recent call last):
  File "/projects/rsaju_prj/miniconda3/envs/snakemake-new/lib/python3.12/site-packages/snakemake/executors/local.py", line 420, in run_wrapper
    run(
  File "/projects/rsaju_prj/consistency/snakemake-pipeline/Snakefile", line 805, in __rule_pav_hg38
  File "/projects/rsaju_prj/miniconda3/envs/snakemake-new/lib/python3.12/site-packages/snakemake/shell.py", line 297, in new
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail;

thanks, Riya

paudano commented 2 months ago

"Hangup" is probably signal SIGHUP being generated if the shell running PAV exits. Is the shell exiting for some reason?

If it's running in an SSH session, try starting a "screen" first (e.g. "screen" to start, then after logging back in, "screen -ls" to list screens and "screen -r " to resume a screen).
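
Before launching a long run over SSH, it can help to confirm that the shell really is inside a screen session. The check below is a minimal sketch, assuming GNU screen, which exports the STY environment variable inside its sessions; the function name in_screen is only illustrative.

```shell
# Report whether the current shell is running inside a GNU screen session.
# GNU screen exports STY inside its sessions; in_screen is a hypothetical helper name.
in_screen() {
    [ -n "${STY:-}" ]
}

if in_screen; then
    echo "inside screen session: $STY"
else
    echo "not inside screen; a dropped SSH connection may send SIGHUP to jobs started here"
fi
```

If the second message prints, starting screen first and launching the pipeline from inside it should keep the job attached to a terminal that survives the disconnect.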

riyasj327 commented 2 months ago

hmm I don't think so. Sometimes it showed the error while running within a snakemake pipeline. Thanks!

paudano commented 1 month ago

Do you have more of the log file or error output to look at? I don't remember seeing it myself, but I think this type of error can appear after another error causes the command to fail, so I think we might be missing the root cause.

Also, there should be more of the command it failed on below the line that starts with "subprocess.CalledProcessError". Can you share that?
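
One way to make sure the full output is available next time is to send both stdout and stderr of the failing command to a single file and record its exit status. This is only a sketch: run_logged is a hypothetical helper, and the stand-in command below takes the place of the real singularity call.

```shell
# Run a command, write its stdout and stderr to one log file, and
# return the command's own exit status. run_logged is an illustrative name.
run_logged() {
    out=$1
    shift
    status=0
    "$@" >"$out" 2>&1 || status=$?
    echo "exit status: $status" >>"$out"
    return "$status"
}

# Example with a stand-in command that writes to stderr and fails.
log_file=$(mktemp)
rc=0
run_logged "$log_file" sh -c 'echo "something went wrong" >&2; exit 3' || rc=$?
cat "$log_file"
```

The `|| status=$?` form keeps the wrapper from aborting under `set -e`, so the status line is still written when the command fails.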

riyasj327 commented 1 month ago

[Tue Sep 17 18:11:56 2024]
localrule pav_hg38:
    input: output/GM24385-2/hapdup/flye/hapdup_dual_1.fasta, output/GM24385-2/hapdup/flye/hapdup_dual_2.fasta
    output: output/GM24385-2/pav-hg38/flye/pav_GM24385-2.vcf.gz
    log: log/GM24385-2/pav-hg38_flye.log
    jobid: 18
    reason: Missing output files: output/GM24385-2/pav-hg38/flye/pav_GM24385-2.vcf.gz
    wildcards: asm_name=GM24385-2, step=flye
    threads: 36
    resources: tmpdir=/tmp, mem_mb=256000, mem_mib=244141, cpus=36

Full Traceback (most recent call last):
  File "/projects/rsaju_prj/miniconda3/envs/snakemake-new/lib/python3.12/site-packages/snakemake/executors/local.py", line 420, in run_wrapper
    run(
  File "/projects/rsaju_prj/consistency/snakemake-pipeline/Snakefile", line 805, in __rule_pav_hg38
  File "/projects/rsaju_prj/miniconda3/envs/snakemake-new/lib/python3.12/site-packages/snakemake/shell.py", line 297, in new
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail;
mkdir -p output/GM24385-1/pav-hg38/flye/assemblies

    cp output/GM24385-1/hapdup/flye/hapdup_dual_1.fasta output/GM24385-1/hapdup/flye/hapdup_dual_2.fasta output/GM24385-1/pav-hg38/flye/assemblies
    cp -r  hg38_no_alt/ output/GM24385-1/pav-hg38/flye/
    cp pav-hg38/config.json output/GM24385-1/pav-hg38/flye/

    # Extract the header
    header=$(head -n 1 assemblies.tsv)

    # Extract the row that matches the asm_name
    row=$(grep -P "^"GM24385-1" " assemblies.tsv)

    # Write the header and the row to the new file
    if [ -n "$row" ]; then
        echo -e "$header

$row" > output/GM24385-1/pav-hg38/flye/assemblies.tsv
        echo "File created successfully: output/GM24385-1/pav-hg38/flye/assemblies.tsv"
    else
        echo "No entry found for asm_name: $asm_name"
    fi

    cd output/GM24385-1/pav-hg38/flye
    /usr/bin/singularity run --bind "$(pwd):$(pwd)" library://becklab/pav/pav:latest -c 36
    cd ../../../../' returned non-zero exit status 129.

Here is the latest error I got using PAV. That said, sometimes it works just fine without any interruptions, and sometimes, if I rerun it again and again after an interruption, it resumes from where it left off and completes. Please let me know if you need any more info. Thanks!

paudano commented 1 month ago

I think the job this is running needs more memory. I've seen exit status 129 show up in those cases, and it could cause a SIGHUP depending on what Linux terminates when memory limits are exceeded. Can you try allocating more memory to the job?
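
For reference, exit status 129 matches the usual shell convention that a process ended by a signal is reported as 128 plus the signal number, and SIGHUP is signal 1. The helper below is a small sketch of that decoding; describe_status is a hypothetical name, and it relies on the shell's built-in `kill -l` to turn a signal number into its name.

```shell
# Translate a shell exit status above 128 into the signal that ended the process.
# describe_status is an illustrative helper, not part of PAV or Snakemake.
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        signal_number=$((status - 128))
        echo "killed by signal $signal_number (SIG$(kill -l "$signal_number"))"
    else
        echo "exited with status $status"
    fi
}

describe_status 129
describe_status 137
```

Status 137 is included for comparison because it corresponds to SIGKILL, the signal the Linux out-of-memory killer sends, so seeing 129 versus 137 in a log may help narrow down what actually terminated the job.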

riyasj327 commented 3 weeks ago

Yes, will do that! Sometimes it stops without giving any error message, and when I rerun, it starts from the beginning, probably because the required intermediate files get deleted if they are incomplete or something? It would be great if it could resume from where it stopped. I was using nohup for running; I will try using screen. Thanks!