When we implemented the mmap format, maf2hal didn't really work reliably with HDF5, so it was probably never tested with mmap.
When I add

    hid_t cparms_id = cparms.getId();
    herr_t ret = H5Pset_attr_phase_change(cparms_id, 0, 0);
    assert(ret >= 0);

above _dataSet = _file->createDataSet(_path, _dataType, _dataSpace, cparms); in hdf5ExternalArray::create(), I get the same error as before.

@jrvalverde I think your best bet for a workaround is adding --format mmap to the command that is creating your HAL file. This bypasses HDF5 entirely in favour of a custom format. It's much faster, but since it's not compressed, also much bigger. All HAL tools should work natively on it, and it won't be subject to this particular limitation (still can't guarantee it'll work, though).
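For reference, here is a minimal, self-contained version of that experiment outside of HAL; the file and dataset names are illustrative, not taken from the HAL source:

```cpp
#include <H5Cpp.h>
#include <cassert>

int main() {
    H5::H5File file("phase_change_test.h5", H5F_ACC_TRUNC);
    H5::DSetCreatPropList cparms;

    // Set both attribute phase-change thresholds (max_compact, min_dense)
    // to zero so attributes are stored densely from the start.
    hid_t cparms_id = cparms.getId();
    herr_t ret = H5Pset_attr_phase_change(cparms_id, 0, 0);
    assert(ret >= 0);

    // Equivalent of the createDataSet call quoted above.
    hsize_t dims[1] = {16};
    H5::DataSpace space(1, dims);
    H5::DataSet ds = file.createDataSet("/data", H5::PredType::NATIVE_INT64,
                                        space, cparms);
    return 0;
}
```

Since the error was unchanged, the 64kb ceiling presumably applies to the encoded datatype message itself rather than to how attributes are stored.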
Thanks, I will try that later. I do not really worry much about space (I think I have plenty), so if it works, that's great for me.
It'll just take some time: I've got to go get the COVID vaccine today, correct a paper, and answer a number of student requests, so today's plenty of work. I think I'll launch the run now and see if I can monitor the results later; that'll be the best. I'll let you know how it goes, and then later I'll also have a try at the hal code as well.
Thank you so very much, your help and support are excellent!
j
Thank you so very much. I have tried manually running halAppendCactusSubtree with the --format mmap option and it did complete, generating the HAL file; so I have modified the source code in cactus_progressive.py and in cactus_constructFromIntermediates.py to add the '--format' and 'mmap' strings to the start of the argument list, and am running cactus again. For now it seems to work; I'm leaving it running and will see what happened tomorrow.
If it works, I'll send you back a full list of all the changes I made to the source code to make it run.
Crossing my fingers,
j
Hello, I have encountered the same problem, could you please provide the script you modified? Thank you very much🥳! This is my email ahhhoh@163.com
You'd want to change this line
cactus_call(parameters=["halAppendCactusSubtree"] + args)
to
cactus_call(parameters=["halAppendCactusSubtree"] + args + ['--format', 'mmap'])
(making sure this happens in your virtualenv, or that you rerun pip install -U . after making the change). This should get around the child limit, at the cost of making a much bigger HAL file.
This is an issue (along with some other HDF5-related bottlenecks) that remains on our radar and that I hope to fix in the next few months. It's a big project, though.
Thank you very much! I modified the script as you said, but another error occurred:
[2021-12-08T19:40:38+0800] [MainThread] [W] [toil.fileStores.abstractFileStore] Failed job accessed files:
[2021-12-08T19:40:38+0800] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-JobFunctionWrappingJob/instance-mvvjisx5/file-d37c3cd725bb48e19fe0ad48af1c75fd/Anc0_experiment.xml' to path '/tmp/b6383c7845f0557386da6b815d4b0c5e/2f51/7e66/tmp5cw6ux__.tmp'
[2021-12-08T19:40:38+0800] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/no-job/file-f0bf8c828de04348822d21edae12c5ae/config.xml' to path '/tmp/b6383c7845f0557386da6b815d4b0c5e/2f51/7e66/tmp4bs5x19w.tmp'
[2021-12-08T19:40:38+0800] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-CactusConsolidated/instance-y685hzgx/file-b8a1aa658d564e79bf19966de73f6996/tmppad1_0gh.tmp' to path '/tmp/b6383c7845f0557386da6b815d4b0c5e/2f51/7e66/tmp192jqjq9.tmp'
[2021-12-08T19:40:38+0800] [MainThread] [W] [toil.fileStores.abstractFileStore] Downloaded file 'files/for-job/kind-CactusConsolidated/instance-y685hzgx/file-3d6b927ed7ca4532b42ae6c21fc1ac2b/tmpil1oarxx.tmp' to path '/tmp/b6383c7845f0557386da6b815d4b0c5e/2f51/7e66/tmpkvkxoaqn.tmp'
Traceback (most recent call last):
File "/public/zpmiao/software/cactus-bin-v2.0.4/venv/lib/python3.8/site-packages/toil/worker.py", line 393, in workerScript
job._runner(jobGraph=None, jobStore=jobStore, fileStore=fileStore, defer=defer)
File "/public/zpmiao/software/cactus-bin-v2.0.4/venv/lib/python3.8/site-packages/toil/job.py", line 2360, in _runner
returnValues = self._run(jobGraph=None, fileStore=fileStore)
File "/public/zpmiao/software/cactus-bin-v2.0.4/venv/lib/python3.8/site-packages/toil/job.py", line 2281, in _run
return self.run(fileStore)
File "/public/zpmiao/software/cactus-bin-v2.0.4/venv/lib/python3.8/site-packages/toil/job.py", line 2504, in run
rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
File "/public/zpmiao/software/cactus-bin-v2.0.4/venv/lib/python3.8/site-packages/cactus/progressive/cactus_progressive.py", line 335, in exportHal
cactus_call(parameters=["halSetMetadata", HALPath, "CACTUS_COMMIT", cactus_commit])
File "/public/zpmiao/software/cactus-bin-v2.0.4/venv/lib/python3.8/site-packages/cactus/shared/common.py", line 866, in cactus_call
raise RuntimeError("Command {} exited {}: {}".format(call, process.returncode, out))
RuntimeError: Command ['docker', 'run', '--interactive', '--net=host', '--log-driver=none', '-u', '1004:1005', '-v', '/tmp/b6383c7845f0557386da6b815d4b0c5e/2f51/7e66:/data', '--entrypoint', '/opt/cactus/wrapper.sh', '--name', '84221427-ebf1-4f92-871b-1d45928864cb', '--rm', 'quay.io/comparative-genomics-toolkit/cactus:eca7219f3943465b73f240dd86b5e8e228162144', 'halSetMetadata', 'tmp_alignment.hal', 'CACTUS_COMMIT', 'eca7219f3943465b73f240dd86b5e8e228162144'] exited 1: stdout=None, stderr=Running command catchsegv 'halSetMetadata' 'tmp_alignment.hal' 'CACTUS_COMMIT' 'eca7219f3943465b73f240dd86b5e8e228162144'
terminate called after throwing an instance of 'hal_exception'
what(): tmp_alignment.hal: file is marked as dirty, most likely an inconsistent state.
Aborted (core dumped)
[2021-12-08T19:40:38+0800] [MainThread] [E] [toil.worker] Exiting the worker because of a failed job on host localhost.localdomain
<=========
Oh, you may need to add --format mmap to all subsequent hal commands in the method or, even safer, just comment them out.
--format mmap only needs to be specified when the HAL file is created; subsequent commands recognize the format by examining the header of the file.
The above exception indicates that the file was not successfully closed during the last write operation. This is because we never put in any kind of locking in hal mmap, assuming only one accessor for writing.
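For what it's worth, the HDF5 on-disk signature is a documented 8-byte magic number, so you can check which flavour a .hal file is without any HAL tooling. A hedged sketch (I haven't verified the mmap magic bytes, so this only identifies HDF5 positively and assumes anything else is mmap):

```cpp
#include <cstdio>
#include <cstring>

int main(int argc, char** argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s file.hal\n", argv[0]);
        return 1;
    }
    FILE* f = fopen(argv[1], "rb");
    if (!f) {
        perror("fopen");
        return 1;
    }
    unsigned char sig[8] = {0};
    size_t n = fread(sig, 1, sizeof(sig), f);
    fclose(f);
    if (n != sizeof(sig)) {
        fprintf(stderr, "could not read header\n");
        return 1;
    }
    // Documented HDF5 superblock signature: \x89 H D F \r \n \x1a \n
    const unsigned char hdf5Sig[8] = {0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'};
    puts(memcmp(sig, hdf5Sig, sizeof(sig)) == 0 ? "HDF5-format HAL"
                                                : "not HDF5 (presumably mmap)");
    return 0;
}
```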
Thank you very much for your help! It worked! But when I use hal2maf --format mmap to convert it to a maf file, I get the same error; it seems that I can't convert the hal file to other formats. Is there another way to convert this hal file? Actually, what I really need is a maf or fasta file of the multiple whole-genome alignment.
hal exception caught: out.hal: file is marked as dirty, most likely an inconsistent state.
As you said, I only specified --format mmap when the hal file was created (I commented out the subsequent hal commands), and I successfully got an output hal file, but when I converted it to other formats, I got the same error. How should I solve it? I would very much appreciate it if you can help me!😄😄
The error means that the HAL file was being written and that the write didn't complete successfully. Since the mmap format is not transactional, there is no way to recover from it. I don't believe an HDF5 file can be recovered either if a write operation doesn't complete.
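To illustrate what "marked as dirty" means, here is a sketch of the usual dirty-bit pattern, with a hypothetical header offset (this is the general concept, not HAL's actual on-disk layout):

```cpp
#include <cstdio>

// Hypothetical offset of a one-byte dirty flag in the file header.
const long kDirtyOffset = 8;

// Set the flag and flush as soon as the file is opened for writing,
// so a crash mid-write leaves detectable evidence behind.
void markDirty(FILE* f) {
    fseek(f, kDirtyOffset, SEEK_SET);
    fputc(1, f);
    fflush(f);
}

// Only a clean close path ever clears the flag.
void markClean(FILE* f) {
    fseek(f, kDirtyOffset, SEEK_SET);
    fputc(0, f);
    fflush(f);
}

// A reader that finds the flag set knows the last writer never finished.
// Without a journal or transaction log, there is nothing to roll back to,
// which is why a dirty mmap HAL file is unrecoverable.
bool isDirty(FILE* f) {
    fseek(f, kDirtyOffset, SEEK_SET);
    return fgetc(f) == 1;
}
```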
Thank you, the last question😂. I would like to know if there is an intermediate alignment file in cactus-align before it is converted into a hal file, such as fasta or maf?
When calling this in hdf5Genome.cpp: the compound datatype returned by Hdf5BottomSegment::dataType grows with the number of children. Apparently the size of the datatype cannot exceed 64kb, which in practice seems to cap the number of children at about 545.
This has come up as an issue because someone was crazy enough to try the Cactus Pangenome Pipeline, which uses a star tree, on 1000s of bacterial genomes. As they mention, there seems to be hope for a workaround in the form of H5Pset_attr_phase_change(). There is some documentation on it (and another possible workaround) here. It seems simple enough to be worth a try, and since the number of children is (I think) always known a priori, the toggle can be used only when absolutely necessary, preserving backwards compatibility.
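To make the growth concrete, here is a hedged sketch of a per-child compound datatype (the two 64-bit fields per child are an assumption for illustration, not HAL's actual record layout). The type builds fine in memory; the 64kb ceiling bites when the encoded datatype message, which includes every member's name and metadata, has to fit in the dataset's object header at creation time:

```cpp
#include <H5Cpp.h>
#include <cstdio>
#include <string>

int main() {
    // Assumed per-child record: two 64-bit fields (illustrative only).
    const size_t perChild = 2 * sizeof(long long);
    const int numChildren = 600;  // deliberately past the reported ~545 cap

    H5::CompType dtype(numChildren * perChild);
    for (int i = 0; i < numChildren; ++i) {
        std::string idx = "childIndex" + std::to_string(i);
        std::string rev = "reverseFlag" + std::to_string(i);
        dtype.insertMember(idx, i * perChild, H5::PredType::NATIVE_INT64);
        dtype.insertMember(rev, i * perChild + 8, H5::PredType::NATIVE_INT64);
    }
    printf("in-memory compound size: %zu bytes\n", dtype.getSize());

    try {
        // Building the type succeeds; the failure surfaces here, when the
        // encoded datatype message is written into the object header.
        H5::H5File file("compound_limit_test.h5", H5F_ACC_TRUNC);
        hsize_t dims[1] = {1};
        H5::DataSpace space(1, dims);
        file.createDataSet("/segments", dtype, space);
        puts("dataset created");
    } catch (const H5::Exception& e) {
        fprintf(stderr, "creation failed: %s\n", e.getCDetailMsg());
    }
    return 0;
}
```

At around a hundred bytes of encoded metadata per child, 65536 bytes runs out in the mid-hundreds of children, which is consistent with the ~545 observed.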
The more immediate workaround is to use --format mmap, but I'm not sure how robust that will be (it produced a corrupt file with my small maf2hal test).