Closed cgirardot closed 3 years ago
Hi,
OSError: Unable to open file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')
That one says the file is already opened by some other process and is therefore locked.
The second error comes from the cooler library and says that one crucial data entry is missing. What probably happened: for some reason you now have a corrupted file. A first crash followed by this second error is a pattern I think I have seen before. The only solution is to throw away the file and use a non-corrupted copy.
The question is why (or rather how) you are accessing the same file in parallel. Opening any HDF5 file in parallel can cause these issues and can lead to corrupted files. To be clear: the problem is not that you process six matrices in parallel, but that, for unknown reasons, at least two of them are the same file and are opened by two parallel processes.
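One way to avoid two jobs opening the same file is to give each job its own copy of the input. The following is only a minimal sketch of that idea, not something HiCExplorer or cooler provides: `private_copy` is a hypothetical helper name, and the sketch assumes the input file is complete and not being written to while it is copied.

```python
import os
import shutil
import tempfile


def private_copy(src_path, workdir=None):
    """Copy src_path to a job-private temporary file and return its path.

    Hypothetical helper: assumes src_path is not being modified during the copy.
    """
    # Keep the original extension (.cool, .h5) so tools still detect the format.
    suffix = os.path.splitext(src_path)[1]
    fd, dst_path = tempfile.mkstemp(suffix=suffix, dir=workdir)
    os.close(fd)  # mkstemp returns an open descriptor; only the path is needed
    shutil.copyfile(src_path, dst_path)
    return dst_path
```

Each job would then run its tool on the returned path and delete the copy afterwards. This trades disk space and copy time for isolation, so it is mainly useful when the matrices are small enough to duplicate.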
Best,
Joachim
Thank you for your answer. It can indeed happen in a Galaxy workflow that 2 jobs using the same input would be launched in parallel. This never caused an issue when using h5 format but does with cool. This suggests to me that h5 supports concurrent access while cool needs a lock. Is this what you are suggesting? Did I get this right?
Both h5 and cool are HDF5-based formats. It surprises me a bit that only cool files cause issues and h5 files do not. If you want parallel access, you should implement a lock for both file types to be sure.
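A lock of this kind could look roughly like the sketch below. It is an illustration under assumptions, not part of HiCExplorer or cooler: `exclusive_lock` and the `.lock` sidecar file are hypothetical names, it relies on `fcntl.flock` and therefore only runs on Unix-like systems, and advisory locks only help if every job that touches the file uses the same wrapper. Behaviour of `flock` on network file systems such as NFS depends on the setup.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_lock(data_path, suffix=".lock"):
    """Hold an exclusive advisory lock on a sidecar file next to data_path."""
    lock_path = data_path + suffix
    # Open (or create) the sidecar lock file; the data file itself is untouched.
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Blocks until any other holder of the lock has released it.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield lock_path
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

A job would wrap its file access as `with exclusive_lock("matrix.cool"): ...` and open the matrix only inside the block, so that two jobs started on the same input run one after the other instead of colliding.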
Hi @joachimwolff, using the latest 3.6 version, I tried to change my WF (in Galaxy) to output `.cool` instead of `.h5` files (at the hicBuildMatrix step), reasoning that this would save space and be handier. When I ran my updated WF, I had many crashes, most with `hicCorrectMatrix` (diag & KR norm), with weird errors. I switched back to `.h5` and this fixed all issues, so I kind of ignored it... but now I am having the same issues with cooler files again, so there might be something wrong here.
The first error I have is:
I first assumed it was an NFS issue, but it happens too frequently for that to be the right explanation, and it never happens when I use h5.
If I re-run the same job, the second error is:
I should also mention that I am processing 6 matrices in parallel and 3 go through without issue. They were all produced from raw h5 matrices further `hicNormalized` as described in #704. Since the failure happens with only some of the matrices, I think my code is OK. Also, the matrices look OK in HiGlass.
Any idea what could be wrong?