neoformit opened this issue 1 year ago
I am getting this as well and have been unable to resolve it.
I have seen that for some protein sequences (zinc fingers) the raw MSA size (before truncation/filtering) can exceed 100 GB. The out-of-memory error probably occurs when the sequences from an HHblits/Jackhmmer job are transferred from RAM to the temporary file. The memory usage probably grows very fast (depending on the RAM-to-disk transfer rate) and might be difficult to track with `watch df`. If the MSA size is the cause, then the RAM usage of the HHblits/Jackhmmer process should already be very large (up to 100 GB).
Related to https://github.com/deepmind/alphafold/issues/280.
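Since a fast RAM-to-disk spike can come and go between `watch df` refreshes, a tighter sampling loop can help catch it. A minimal sketch (the process name `hhblits`, the sample count, and the log path `msa_usage.log` are all assumptions; swap in `jackhmmer` as needed):

```shell
#!/bin/sh
# Sample the aligner's resident memory and /tmp free space once per
# second, appending to a log file; 5 samples here for illustration.
LOG=msa_usage.log
: > "$LOG"                       # truncate any previous log
i=0
while [ "$i" -lt 5 ]; do
  # Largest RSS (kB) among matching processes; empty if none is running
  rss_kb=$(ps -C hhblits -o rss= | sort -n | tail -1)
  # -P keeps df output on one line so awk can grab the Avail column
  tmp_free=$(df -hP /tmp | awk 'NR==2 {print $4}')
  echo "$(date +%T) rss_kb=${rss_kb:-0} tmp_free=$tmp_free" >> "$LOG"
  i=$((i + 1))
  sleep 1
done
```

Plotting `rss_kb` over time afterwards should show whether the aligner's memory climbs toward the host's limit just before the failure.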
We are getting a disk write error after Jackhmmer completes:
After thousands of successful AF2 runs on our infrastructure, this has only occurred with the following protein input (2273AA):
According to the Docker container, we have 86G of disk available at runtime for the `/tmp` directory. Furthermore, I don't see any sign of the disk filling up when I run `watch df -h` on the host. There is plenty of disk available on the root partition, where both `/var/lib/docker` and `/tmp` are located. It is possible that this could be a bug in AlphaFold. Any help would be greatly appreciated!
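If it is the container's writable layer (the 86G reported above) rather than the host disk that fills up, one workaround is to bind-mount a roomy host directory over the container's `/tmp`, so the temporary MSA files land on the larger host partition instead. A sketch, assuming a scratch directory under `$HOME` and the default `alphafold` image tag (both are assumptions; adjust to your setup):

```shell
# Hypothetical workaround: back the container's /tmp with a large
# host directory instead of the container's writable layer.
mkdir -p "$HOME/af2_tmp"
docker run --rm \
  -v "$HOME/af2_tmp:/tmp" \
  alphafold   # followed by your usual AlphaFold arguments
```

With this mount in place, `df -h /tmp` inside the container should report the host partition's free space rather than the layer limit.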