HKU-BAL / Clair3

Clair3 - Symphonizing pileup and full-alignment for high-performance long-read variant calling

Define temporary directory directly from Clair3 CLI #294

Closed: SergeWielhouwer closed this issue 7 months ago

SergeWielhouwer commented 7 months ago

Hi,

I have been having issues with running Clair3 v1.0.4 on an HG002 dataset from ONT. The tool seems to write quite a lot of intermediate files, such as vcf.gz, to the TMP directory, which unfortunately doesn't have much space on our HPC cluster. This results in the final merge.vcf.gz being incomplete.

[INFO] 1/7 Call variants using pileup model
parallel: Error: Output is incomplete. Cannot append to buffer file in /tmp.
parallel: Error: Is the disk full?
parallel: Error: Change $TMPDIR with --tmpdir or use --compress.
Warning: unable to close filehandle properly: No space left on device during global destruction.

I have tried both passing the TMPDIR environment variable to Singularity for Clair3 to use and passing --tmpdir directly on the command line; the latter resulted in an error (I think the --tmpdir parameter belongs to a submodule/script within Clair3?). Binding the /tmp dir to another directory in Singularity also didn't work as expected.

work_dir="/mnt/example/GM24385_R103_from_2020/giab_2023.05_SUP"
cd $work_dir

export TMPDIR=$PWD/tmp

mkdir -p HG002/variants_clair3 $PWD/tmp

singularity run -B /mnt --containall clair3_latest.sif /opt/bin/run_clair3.sh \
--bam_fn=PAO89685.pass.cram \
--ref_fn=GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --threads=64 \
--platform="ont" \
--model_path="r1041_e82_400bps_sup_v420" \
--output=/mnt/example/GM24385_R103_from_2020/giab_2023.05_SUP/HG002/variants_clair3 \
--tmpdir=$PWD/tmp

Could someone tell me which parameter or environment variable is required to write the temp files to a directory of choice?

Thanks!

aquaskyline commented 7 months ago

Give this a try. It might need some more polishing to get it running, but the gist is adding the option --env TMPDIR=$PWD/tmp to singularity so the environment variable is set inside the Singularity environment.

work_dir="/mnt/example/GM24385_R103_from_2020/giab_2023.05_SUP"
cd $work_dir

mkdir -p HG002/variants_clair3 $PWD/tmp

singularity run -B /mnt --containall --env TMPDIR=$PWD/tmp clair3_latest.sif /opt/bin/run_clair3.sh \
--bam_fn=PAO89685.pass.cram \
--ref_fn=GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --threads=64 \
--platform="ont" \
--model_path="r1041_e82_400bps_sup_v420" \
--output=/mnt/example/GM24385_R103_from_2020/giab_2023.05_SUP/HG002/variants_clair3

SergeWielhouwer commented 7 months ago

Thank you @aquaskyline, I will definitely try out the --env option. I thought that --containall would already pass all environment variables on to singularity, but directly specifying this variable with --env is likely a better approach.

SergeWielhouwer commented 7 months ago

Thanks again for your help. I ended up resolving the issue by also changing the home directory mount through singularity, as this was the main culprit behind the out-of-space issues.

singularity run -B /mnt --home $PWD/home:/home --env TMPDIR=$PWD/tmp clair3_latest.sif /opt/bin/run_clair3.sh \
--bam_fn=PAO89685.pass.cram \
--ref_fn=GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --threads=64 \
--platform="ont" \
--model_path="r1041_e82_400bps_sup_v420" \
--output=/mnt/example/GM24385_R103_from_2020/giab_2023.05_SUP/HG002/variants_clair3

Though I am a bit worried that Clair3 still writes the final merge_output.vcf.gz file even when many loci could not be processed properly due to the out-of-space issues, causing a lot of variants to be missed (see the recall scores from the tool hap.py below). I am not quite sure what the exit code was for that run, as I cannot fetch it from our SLURM database, but the log states "[INFO] Finish calling, output file:..." which may indicate that Clair3 did not stop once it encountered the space issues. Is there some sort of strict mode to stop immediately when these errors occur? Or do I have to check the logs by hand or with a tool such as grep?

[image: hap.py recall scores for the incomplete run]

aquaskyline commented 7 months ago

@SergeWielhouwer could you please send me your log file?

SergeWielhouwer commented 7 months ago

Of course, please find the log for the incomplete run in the following link: run_clair3.log

aquaskyline commented 7 months ago

Errors like [E::cram_populate_ref] Creating reference at /home/s.wielhouwer/.cache/hts-ref/6a/ef/897c3d6ff0c78aff06ac189178dd failed: No space left on device in the log were produced by samtools. Interestingly, samtools does not return a non-zero exit code here, otherwise Clair3 would have caught it, since we use set -e in run_clair3.sh. It might take me a while to figure out how to handle the out-of-space situation better in Clair3. For now, the rule of thumb is to check each run's log file for any message saying No space.
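
For what it's worth, a rough sketch of one possible workaround (an assumption based on how htslib usually behaves, not something tested in this thread): point samtools' CRAM reference cache at a directory with enough space via the REF_CACHE environment variable, and forward it into the container the same way as TMPDIR.

# Sketch only: REF_CACHE tells htslib/samtools where to cache CRAM reference
# slices (the default is under $HOME/.cache/hts-ref, which is why the failing
# path above sits in the home directory). The paths below are just examples.
mkdir -p $PWD/hts-ref-cache
export REF_CACHE=$PWD/hts-ref-cache
# Then forward it into the container like TMPDIR, e.g. by adding another
# --env REF_CACHE=$PWD/hts-ref-cache to the singularity command.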

SergeWielhouwer commented 7 months ago

That's good to know; it's a pity that samtools doesn't throw those error codes for Clair3 to capture. If I automate Clair3 in a pipeline in the future, I will probably check for No space text in the logs using grep or similar and mark the output as incomplete/invalid once found.
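
Something along these lines (just a sketch; the log filename and path are assumptions and would need to match wherever the run log actually ends up):

# Sketch of a post-run check: fail the pipeline step if the Clair3 log
# mentions running out of space, so an incomplete merge_output.vcf.gz
# is not silently passed downstream. The log path is an assumed example.
log="HG002/variants_clair3/run_clair3.log"

# grep -q exits 0 on a match and 1 otherwise; match case-insensitively.
if grep -qi "no space" "$log"; then
    echo "ERROR: out-of-space messages found in $log; marking output as incomplete" >&2
    exit 1
fi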