Open liangminliu opened 3 weeks ago
You can use cactus, which includes hdf5, as a model for making hdf5.
See the Dockerfile for an example of installing hdf5 from apt on Ubuntu:
https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/Dockerfile#L3-L4
See this script for installing everything from source:
https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/build-tools/makeBinRelease#L59-L66
Thank you for your suggestions and support. I encountered errors while using Cactus. Initially, I installed Cactus and successfully ran the tests to verify its functionality. Here is the test command I used:
cactus ./js ./examples/evolverMammals.txt ./evolverMammals.hal
However, when I attempted to run my own dataset on the LSF system with the following command:
bsub -n 40 \
-R "span[hosts=1]" \
-M 700G \
-o cactus_%J_output.log \
-e cactus_%J_error.log \
cactus ${JOB_STORE} ${INPUT_FILE} ${OUTPUT_FILE} \
--root "Spic" \
--logLevel INFO \
--workDir ${TEMP_DIR} \
--batchSystem lsf \
--maxCores 40 \
--defaultMemory 500G \
--defaultDisk 300G \
--retryCount 5 \
--statePollingWait 60 \
--statePollingTimeout 300 \
--clean onSuccess
I encountered an error related to the Toil batch system. The error log showed the following traceback:
Traceback (most recent call last):
...
[2024-10-23T17:54:35+0800] [MainThread] [D] [toil.deferred] Removing own state file /tmp/toilwf-97d11ed5481256a9bb166bf6ac5c8656/deferred/funckd2hdn12
[2024-10-23T17:54:35+0800] [MainThread] [D] [toil.batchSystems.abstractBatchSystem] Deleting workflow directory /tmp/toilwf-97d11ed5481256a9bb166bf6ac5c8656
[2024-10-23T17:54:35+0800] [MainThread] [D] [toil.common] ... finished shutting down the batch system in 0.2768988609313965 seconds.
Traceback (most recent call last):
File "/ds3200_1/users_root/liuliangmin/biosoft/cactus/venv-cactus-v2.9.2/lib/python3.11/site-packages/toil/batchSystems/abstractGridEngineBatchSystem.py", line 279, in run
while self._runStep():
^^^^^^^^^^^^^^^
...
toil.batchSystems.abstractGridEngineBatchSystem.AbstractGridEngineThreadException: Unexpected GridEngineThread failure
...
This error appears to be related to Toil. Given that I am using the LSF system, I ultimately decided to split my data into three groups and use LAST to align them into roast.maf files due to the large number of genomic sequences. My plan is to convert these roast.maf files to HAL format using maf2hal, and then merge them with halAppendSubtree.
If you have any alternative methods or suggestions for resolving the Cactus runtime issues, or perhaps a more efficient way to merge roast.maf files, I would greatly appreciate your insights.
Since I already have Cactus installed in the venv-cactus-v2.9.2 environment, should I proceed to install HAL directly in this environment as well?
I appreciate your assistance in resolving this matter!
The last time I tried maf2hal, it did not work, although I don't remember the failure. This may be a very problematic path.
Please post this to the toil list
https://github.com/DataBiosphere/toil/issues
This issue seems to be going in all sorts of directions. To resume: I don't think last->maf->maf2hal->halAppendSubtree is a viable pipeline, and I advise against spending time trying to get it to work.
Thank you all for your suggestions and support!
HAL Installation: I faced an error with hdf5 during the make install process (https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/build-tools/makeBinRelease#L59-L66), but I managed to resolve it using singularity exec cactus_v2.9.2.sif maf2hal, which worked well.
Toil Issues with LSF: After some research, I found the LSF issues quite complex (https://github.com/DataBiosphere/toil/issues). So I switched to using the SLURM system, which has allowed me to run Cactus successfully. However, I'm looking for ways to improve the speed of my runs.
Merging Roast.MAF Files: I understand there may be challenges regarding the LAST to MAF to HAL pipeline. Do you have alternative methods or suggestions for efficiently merging roast.maf files? I would greatly appreciate your insights.
Thank you again for your assistance!
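For anyone following the same route, the container workaround mentioned above can be sketched roughly as below. This is only an illustration: the image name comes from the comment, while the `run_in_container` helper, the `DRY_RUN` switch, and the file names `group1.roast.maf` and `group1.hal` are hypothetical placeholders. It assumes maf2hal takes an input MAF followed by an output HAL path; `maf2hal --help` inside the container is the place to confirm that.

```shell
#!/usr/bin/env bash
# Run a HAL tool from the Cactus container image instead of a local build.
# IMAGE, DRY_RUN, and the file names below are placeholders for illustration.
IMAGE="${IMAGE:-cactus_v2.9.2.sif}"

run_in_container() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "singularity exec $IMAGE $*"
    else
        singularity exec "$IMAGE" "$@"
    fi
}

# Preview the maf2hal call for one (hypothetical) input file.
DRY_RUN=1 run_in_container maf2hal group1.roast.maf group1.hal
```

Dropping `DRY_RUN=1` runs the tool for real, which requires Singularity and the image to be available on the node.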
Hi,
I am experiencing an issue while building HAL according to the instructions provided in the README. Initially, I used HDF5 version 1.14.5, but I encountered errors. Therefore, I switched to HDF5 version 1.10.1, as specified in the README. I have also successfully installed and configured the necessary dependencies, including SonLib, CLAPACK, and PhyloP. However, I am encountering an error during the make step after setting export ENABLE_PHYLOP=1.
Error Log:
Environment:
Steps to Reproduce:
export ENABLE_PHYLOP=1
make
But the build fails with the errors related to missing methods (openGroup and createGroup) in H5::PortableH5Location.
It appears that the methods openGroup and createGroup are not recognized as members of H5::H5Location. As specified in the environment setup, I am using HDF5 version 1.10.1.
Question:
Any help would be greatly appreciated.
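Since two HDF5 versions (1.14.5 and 1.10.1) were installed along the way, one thing that may be worth ruling out is which version's headers the build actually sees. The sketch below is not a diagnosis of the error above. It only reads the `H5_VERS_MAJOR`, `H5_VERS_MINOR`, and `H5_VERS_RELEASE` macros from a given `H5public.h`. The function name `hdf5_header_version` and the install prefix in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Print the HDF5 version declared in a given H5public.h header.
# Returns non-zero if any of the three version macros is missing.
hdf5_header_version() {
    local header="$1"
    awk '
        $1 == "#define" && $2 == "H5_VERS_MAJOR"   { major = $3 }
        $1 == "#define" && $2 == "H5_VERS_MINOR"   { minor = $3 }
        $1 == "#define" && $2 == "H5_VERS_RELEASE" { release = $3 }
        END {
            if (major == "" || minor == "" || release == "") exit 1
            print major "." minor "." release
        }
    ' "$header"
}

# Usage (hypothetical install prefix; substitute the include directory
# that your HAL build is configured to use):
#   hdf5_header_version "$HOME/hdf5-1.10.1/include/H5public.h"
```

If the printed version differs from the one you expect, the compiler is probably picking up a different HDF5 installation than intended.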