Hi there,
I ran into an odd bug using CheckM today: a "no space left on device" error was not caught, and CheckM reported wrong output with exit code 0. I think this is a bit of an edge case, as we have run CheckM on thousands of genomes over the last few years and this is the first time I'm seeing it, but it did catch us off guard.
We run CheckM within a single rule of our Snakemake-managed workflow. Exit codes >0 are caught and propagated, so that the whole pipeline run is aborted (which we check to monitor technical failures). Today, a colleague using the results of the pipeline notified me that a couple of samples showed very low completeness scores (around 7 or 8), although all other QC metrics were excellent. It seems that p7_hmmfile.c ran into a "no space left on device" error for the /tmp directory, but that this was not caught by the code calling the command.
Locally, I filled up my storage space to try to replicate this error (several tests with 50-200 MB of storage left). The HMM error itself was not caught, and in some cases CheckM still produced an output file and exited with code 0. With too little storage space, other errors happened before this particular one, and with enough space the problem did not occur.
I'm not familiar with the CheckM source code, so I can't really tell where this should be caught. Happy to open a PR if you have some guidance or an idea where this happens.
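Until this is handled inside CheckM, a guard on the workflow side might help. The following is only a minimal sketch of what I have in mind, not something taken from the CheckM code. It makes several assumptions: the names `run_checked`, `has_free_space` and `find_error_markers` are made up for illustration, the 1 GiB threshold is arbitrary, and it assumes the "no space left on device" message actually reaches stdout or stderr of the process being wrapped, which I have not confirmed for every case.

```python
import shutil
import subprocess

# Assumed marker text (matched case-insensitively); the exact wording
# emitted by the underlying tools is an assumption, adjust as needed.
ERROR_MARKERS = ("no space left on device",)


def has_free_space(path, min_free_bytes):
    """Return True if the filesystem holding `path` has enough free bytes."""
    return shutil.disk_usage(path).free >= min_free_bytes


def find_error_markers(text, markers=ERROR_MARKERS):
    """Return the markers that occur in `text`, ignoring case."""
    lowered = text.lower()
    return [m for m in markers if m in lowered]


def run_checked(cmd, tmp_dir="/tmp", min_free_bytes=1024**3):
    """Run `cmd` and raise if it fails openly or silently.

    Raises RuntimeError when tmp_dir is low on space before the run,
    when the exit code is non-zero, or when a known error marker shows
    up in the captured output even though the exit code is 0.
    """
    if not has_free_space(tmp_dir, min_free_bytes):
        raise RuntimeError(f"less than {min_free_bytes} bytes free in {tmp_dir}")

    proc = subprocess.run(cmd, capture_output=True, text=True)
    hits = find_error_markers(proc.stdout + proc.stderr)
    if proc.returncode != 0 or hits:
        raise RuntimeError(
            f"command failed (exit code {proc.returncode}, markers: {hits})"
        )
    return proc
```

Calling `run_checked` with the same argument list the rule already uses would turn a silent failure of this kind into an exception, and therefore into a non-zero exit code that Snakemake can propagate. The free-space check only looks at the state before the run starts, so it reduces the risk rather than removing it.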
Please see here for the output of runs with limited storage space: 139mb_space.txt 89mb_space.txt