Closed: SergeWielhouwer closed this issue 7 months ago
Give this a try. It might need some more polishing to get it running, but the gist is adding the option --env TMPDIR=$PWD/tmp
to singularity so that the environment variable is set inside the Singularity environment.
work_dir="/mnt/example/GM24385_R103_from_2020/giab_2023.05_SUP"
cd $work_dir
mkdir -p HG002/variants_clair3 $PWD/tmp
singularity run -B /mnt --containall --env TMPDIR=$PWD/tmp clair3_latest.sif /opt/bin/run_clair3.sh \
--bam_fn=PAO89685.pass.cram \
--ref_fn=GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --threads=64 \
--platform="ont" \
--model_path="r1041_e82_400bps_sup_v420" \
--output=/mnt/example/GM24385_R103_from_2020/giab_2023.05_SUP/HG002/variants_clair3
Thank you @aquaskyline, I will definitely try out the --env
option. I thought that --containall would already pass environment variables on to Singularity, but directly specifying this variable with --env
is likely the better approach.
Thanks again for your help. I ended up resolving the issue by also changing the home directory mount in Singularity, as this was the main culprit for the out-of-space issues.
singularity run -B /mnt --home $PWD/home:/home --env TMPDIR=$PWD/tmp clair3_latest.sif /opt/bin/run_clair3.sh \
--bam_fn=PAO89685.pass.cram \
--ref_fn=GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --threads=64 \
--platform="ont" \
--model_path="r1041_e82_400bps_sup_v420" \
--output=/mnt/example/GM24385_R103_from_2020/giab_2023.05_SUP/HG002/variants_clair3
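As a quick sanity check before launching a full run, the variable can be printed from inside the container (a sketch assuming the same clair3_latest.sif image used above; singularity exec and printenv are standard tools):

```shell
# Hypothetical sanity check: confirm that --env actually sets TMPDIR
# inside the container before starting the real Clair3 run.
mkdir -p "$PWD/tmp"
singularity exec --containall --env TMPDIR="$PWD/tmp" clair3_latest.sif \
    printenv TMPDIR
```

If --env is working, this prints the host-side $PWD/tmp path; with --containall alone (which cleans the host environment) the variable would be unset.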
Though I am a bit worried that Clair3 still wrote the final merge_output.vcf.gz file even though many loci could not be processed properly due to the out-of-space issues, causing a lot of variants to be missed (see the recall scores from the tool hap.py below). I am not quite sure what the exit code was for that run, as I cannot fetch it from our SLURM database, but the log states "[INFO] Finish calling, output file:...", which may indicate that Clair3 did not stop once it encountered the space issues. Is there a strict mode that stops immediately when these errors occur, or do I have to check the logs by hand or with a tool such as grep?
@SergeWielhouwer could you please send me your log file?
Of course, please find the log for the incomplete run in the following link: run_clair3.log
The errors like [E::cram_populate_ref] Creating reference at /home/s.wielhouwer/.cache/hts-ref/6a/ef/897c3d6ff0c78aff06ac189178dd failed: No space left on device
in the log were produced by samtools. Interestingly, samtools does not return a non-zero exit code here; otherwise Clair3 would capture it, because we set set -e
in run_clair3.sh. It might take me a while to figure out how to handle the out-of-space situation better in Clair3. For now, the rule of thumb is to check the log file of each run for any message saying No space.
That's good to know; it's a pity that samtools doesn't throw those error codes for Clair3 to capture. If I automate Clair3 in a pipeline in the future, I will probably check for the No space
text in the logs using grep or similar, and mark the output as incomplete/invalid once found.
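A minimal sketch of such a post-run check (the log filename and the exact error phrase are assumptions based on the messages quoted in this thread):

```shell
#!/bin/sh
# Hypothetical post-run validation: flag a Clair3 run as invalid if its
# log contains any out-of-space message, even when a final VCF was written.
log="run_clair3.log"

if grep -q "No space" "$log"; then
    echo "INVALID: '$log' contains out-of-space errors; treat output as incomplete" >&2
    exit 1
fi
echo "OK: no out-of-space errors found in '$log'"
```

In a SLURM pipeline this exit code can then be used to mark the job as failed, independent of whatever Clair3 itself returned.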
Hi,
I have been having issues running Clair3 v1.0.4 on an HG002 dataset from ONT. The tool writes quite a lot of intermediate files, such as vcf.gz files, to the TMP directory, which unfortunately doesn't have much space on our HPC cluster. This results in the final merge.vcf.gz being incomplete.
[INFO] 1/7 Call variants using pileup model
parallel: Error: Output is incomplete.
parallel: Error: Cannot append to buffer file in /tmp.
parallel: Error: Is the disk full?
parallel: Error: Change $TMPDIR with --tmpdir or use --compress.
Warning: unable to close filehandle properly: No space left on device during global destruction.
I have tried both passing the TMPDIR environment variable to Singularity for Clair3 to use, and directly passing the --tmpdir parameter to the command, which resulted in an error (I think the --tmpdir parameter comes from a submodule/script within Clair3?). Binding the /tmp directory to another directory in Singularity also didn't work as expected.
Could someone tell me which parameter or environment variable is required to write the temp files to a directory of choice?
Thanks!