ComparativeGenomicsToolkit / hal

Hierarchical Alignment Format

Build Error with HAL Using HDF5 #309

liangminliu opened this issue 3 weeks ago

liangminliu commented 3 weeks ago

Hi,

I am having trouble building HAL by following the instructions in the README. I first tried HDF5 1.14.5 but got errors, so I switched to HDF5 1.10.1, the version specified in the README. I have also installed and configured the required dependencies, including SonLib, CLAPACK, and PhyloP. However, make fails after I set export ENABLE_PHYLOP=1.

Error Log:

h5c++ -prefix=~/biosoft/hdf5-hdf5_1.10.1 -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -O3 -g -Wall -funroll-loops -DNDEBUG -I/ds3200_1/users_root/liuliangmin/biosoft/sonLib/lib -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++11 -Wno-sign-compare -I../api/inc -Iimpl -Iinc -I../liftover/inc -Ihdf5_impl -Immap_impl -c hdf5_impl/hdf5Genome.cpp -o ../objs/api/hdf5_impl/hdf5Genome.o
hdf5_impl/hdf5Genome.cpp: In constructor ‘hal::Hdf5Genome::Hdf5Genome(const string&, hal::Hdf5Alignment*, H5::PortableH5Location*, const H5::DSetCreatPropList&, bool)’:
hdf5_impl/hdf5Genome.cpp:49:28: error: ‘H5::PortableH5Location’ {aka ‘class H5::H5Location’} has no member named ‘openGroup’
         _group = h5Parent->openGroup(name);
                            ^~~~~~~~~
hdf5_impl/hdf5Genome.cpp:51:28: error: ‘H5::PortableH5Location’ {aka ‘class H5::H5Location’} has no member named ‘createGroup’; did you mean ‘createAttribute’?
         _group = h5Parent->createGroup(name);
                            ^~~~~~~~~~~
                            createAttribute
make[1]: *** [../rules.mk:19: ../objs/api/hdf5_impl/hdf5Genome.o] Error 1
make[1]: Leaving directory '/ds3200_1/users_root/liuliangmin/biosoft/hal/api'
make: *** [Makefile:13: api.libs] Error 2

Environment:

Steps to Reproduce:

  1. Install the required dependencies (HDF5 1.10.1, SonLib, CLAPACK, PhyloP).
  2. Set up the environment according to the README.
  3. Set export ENABLE_PHYLOP=1.
  4. Run make.

Result: the build fails with errors about missing methods (openGroup and createGroup) in H5::PortableH5Location.

It appears that openGroup and createGroup are not recognized as members of H5::H5Location, even though I am using HDF5 1.10.1 as specified in the environment setup.
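The compiler resolves PortableH5Location to H5::H5Location and finds no openGroup or createGroup there, which suggests the HDF5 C++ headers being compiled against may not be the ones HAL expects. One way to confirm which version h5c++ actually sees is to read the version macros (H5_VERS_MAJOR, H5_VERS_MINOR, H5_VERS_RELEASE) from its H5public.h. A minimal Python sketch follows; the include/H5public.h location under the prefix is an assumption about a standard HDF5 install layout, and hdf5_header_version and check_prefix are hypothetical helper names.

```python
import re
from pathlib import Path


def hdf5_header_version(header_text: str) -> tuple[int, int, int]:
    """Return (major, minor, release) parsed from the text of an HDF5 H5public.h.

    H5public.h defines the version as preprocessor macros, e.g.
        #define H5_VERS_MAJOR   1
    """
    parts = []
    for name in ("H5_VERS_MAJOR", "H5_VERS_MINOR", "H5_VERS_RELEASE"):
        # \s+ after the name ensures e.g. H5_VERS_RELEASE does not match a longer macro name
        m = re.search(rf"^\s*#\s*define\s+{name}\s+(\d+)", header_text, re.MULTILINE)
        if m is None:
            raise ValueError(f"{name} not found; is this really H5public.h?")
        parts.append(int(m.group(1)))
    return (parts[0], parts[1], parts[2])


def check_prefix(prefix: str) -> tuple[int, int, int]:
    """Read the HDF5 version from <prefix>/include/H5public.h.

    Assumption: a standard install layout; a source tree may differ.
    """
    header = Path(prefix).expanduser() / "include" / "H5public.h"
    return hdf5_header_version(header.read_text())
```

For example, check_prefix("~/biosoft/hdf5-hdf5_1.10.1") should return (1, 10, 1) if that prefix really holds 1.10.1 headers; a different tuple, or a missing file, would point at a mismatched or incomplete install.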

Question:

Any help would be greatly appreciated.

glennhickey commented 3 weeks ago

You can use Cactus, which bundles HDF5, as a model for building HDF5.

See the Dockerfile for an example of installing HDF5 from apt on Ubuntu:

https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/Dockerfile#L3-L4

See this script for installing everything from source:

https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/build-tools/makeBinRelease#L59-L66

liangminliu commented 3 weeks ago

Thank you for your suggestions and support. I have run into errors with Cactus, though. I first installed Cactus and successfully ran the test example to verify that it works, using this command:

cactus ./js ./examples/evolverMammals.txt ./evolverMammals.hal

However, when I attempted to run my own dataset on the LSF system with the following command:

bsub -n 40 \
     -R "span[hosts=1]" \
     -M 700G \
     -o cactus_%J_output.log \
     -e cactus_%J_error.log \
     cactus ${JOB_STORE} ${INPUT_FILE} ${OUTPUT_FILE} \
     --root "Spic" \
     --logLevel INFO \
     --workDir ${TEMP_DIR} \
     --batchSystem lsf \
     --maxCores 40 \
     --defaultMemory 500G \
     --defaultDisk 300G \
     --retryCount 5 \
     --statePollingWait 60 \
     --statePollingTimeout 300 \
     --clean onSuccess

I encountered an error related to the Toil batch system. The error log showed the following traceback:

Traceback (most recent call last):
  ...
[2024-10-23T17:54:35+0800] [MainThread] [D] [toil.deferred] Removing own state file /tmp/toilwf-97d11ed5481256a9bb166bf6ac5c8656/deferred/funckd2hdn12
[2024-10-23T17:54:35+0800] [MainThread] [D] [toil.batchSystems.abstractBatchSystem] Deleting workflow directory /tmp/toilwf-97d11ed5481256a9bb166bf6ac5c8656
[2024-10-23T17:54:35+0800] [MainThread] [D] [toil.common] ... finished shutting down the batch system in 0.2768988609313965 seconds.
Traceback (most recent call last):
  File "/ds3200_1/users_root/liuliangmin/biosoft/cactus/venv-cactus-v2.9.2/lib/python3.11/site-packages/toil/batchSystems/abstractGridEngineBatchSystem.py", line 279, in run
    while self._runStep():
          ^^^^^^^^^^^^^^^
  ...
toil.batchSystems.abstractGridEngineBatchSystem.AbstractGridEngineThreadException: Unexpected GridEngineThread failure
  ...
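The Toil debug output is long, and the informative lines are easy to miss when posting it upstream. A minimal Python sketch for filtering it, assuming every log line has the [time] [thread] [level] [logger] message shape seen above; only the D level letter appears in this excerpt, so treating other letters as non-debug is an assumption, and parse_toil_log and non_debug are hypothetical names.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Matches lines shaped like the ones in the log above, e.g.
# [2024-10-23T17:54:35+0800] [MainThread] [D] [toil.deferred] Removing own state file ...
LINE_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[(?P<thread>[^\]]+)\] "
    r"\[(?P<level>[A-Z])\] \[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)


class LogRecord(NamedTuple):
    time: str
    thread: str
    level: str
    logger: str
    msg: str


def parse_toil_log(lines: Iterable[str]) -> Iterator[LogRecord]:
    """Yield structured records; non-matching lines (e.g. traceback text) are skipped."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:
            yield LogRecord(**m.groupdict())


def non_debug(records: Iterable[LogRecord],
              quiet_levels: frozenset = frozenset({"D"})) -> list[LogRecord]:
    """Drop debug-level records so warnings and errors stand out."""
    return [r for r in records if r.level not in quiet_levels]
```

Usage, with a file name following the -e cactus_%J_error.log pattern from the bsub command (the job ID is illustrative): open the log and print r.logger and r.msg for each r in non_debug(parse_toil_log(fh)).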

This error appears to be related to Toil. Because I am on LSF, and because I have a large number of genome sequences, I ultimately split my data into three groups and used LAST to align each group into a roast.maf file. My plan is to convert these roast.maf files to HAL format with maf2hal and then merge them with halAppendSubtree.

If you have any alternative methods or suggestions for resolving the Cactus runtime issues, or perhaps a more efficient way to merge roast.maf files, I would greatly appreciate your insights.

Since I already have Cactus installed in the venv-cactus-v2.9.2 environment, should I proceed to install HAL directly in this environment as well?

I appreciate your assistance in resolving this matter!

diekhans commented 3 weeks ago

The last time I tried maf2hal, it did not work, although I don't remember the failure. This may be a very problematic path.

Please post this to the Toil issue tracker:

https://github.com/DataBiosphere/toil/issues


glennhickey commented 3 weeks ago

This issue seems to be going in all sorts of directions. To summarize:

liangminliu commented 3 weeks ago

Thank you all for your suggestions and support!

  1. HAL Installation: I ran into an HDF5 error during make install while following the makeBinRelease script (https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/build-tools/makeBinRelease#L59-L66), but I worked around it by running maf2hal from the Cactus container (singularity exec cactus_v2.9.2.sif maf2hal), which worked well.

  2. Toil Issues with LSF: After some research, I found the LSF issues to be quite complex (https://github.com/DataBiosphere/toil/issues), so I switched to SLURM, which has let me run Cactus successfully. However, I'm looking for ways to speed up my runs.

  3. Merging roast.maf files: I understand there may be problems with the LAST-to-MAF-to-HAL pipeline. Do you have alternative methods or suggestions for efficiently merging roast.maf files? I would greatly appreciate your insights.

Thank you again for your assistance!