chanzuckerberg / shasta

[MOVED] Moved to paoloshasta/shasta. De novo assembly from Oxford Nanopore reads
Other
270 stars 57 forks source link

Do not hardcode BLAS and Lapack libraries. Find them! #291

Closed ccoulombe closed 1 year ago

ccoulombe commented 2 years ago

Blas and Lapack can have different provider (OpenBLAS, MKL, Flexiblas). Do not hardcode the libraries name but rather find them. Same for libpng, find it.

Also uses python3-config from PATH and not hardcoded path.

paoloczi commented 2 years ago

Thank you for this.

Regarding using cmake to find the BLAS and Lapack libraries: @ocaisa already made the same suggestion in the past, but I discovered that, because of a known bug in cmake, it only works for the dynamic executable, and not for the static executable. See #268 for some discussion on that.

Your PR proposes this change only for the dynamic executable, so it has no issues. But for ease in maintenance I hesitate to use two different methods to specify the libraries, one for the static executable and one for the dynamic executable, and I would like to hear your thoughts about that. Especially because the dynamic executable is not built by default, is not included in Shasta releases, and is rarely useful because it is less portable (it only works on the same Ubuntu version on which it was built, and cannot be used on other Linux distributions).

The ideal solution would be to get cmake to find the libraries in the same way for the static executable, but I was not able to do this in the past when I tried.

I agree with not hardcoding the path to python3-config.

Also, keep in mind Shasta will very soon only build on Ubuntu 22.04, because I am starting to use C++20 features not available before gcc 11. This should have no effect on your PR, however.

ccoulombe commented 2 years ago

I do not know the reasons why to build the static by default (I suspect distribution?), I mainly see the inverse, dynamic by default and static activated on demand, when I build software. But static and dynamic linking are two different kind of build. I'd suggest enabling by default the dynamic. This does not hinder distribution but allow packages managers and systems administrators to build the dynamic library as when you'd want a static build, you can activate it.

1) The FindBLAS module has a BLA_STATIC option : https://cmake.org/cmake/help/latest/module/FindBLAS.html You should also be able to use FindPNG module for static vs dynamic, or rely on find_library with the hardcoded library name: https://cmake.org/cmake/help/latest/module/FindPNG.html

2) Since you already have two kind of CMakeLists.txt, they can diverge a little since they serve different purpose :

The ideal solution would be to get cmake to find the libraries in the same way for the static executable, but I was not able to do this in the past when I tried.

Not all vendor/installation have static blas/lapack. MKL has and it works with cmake 3.20 and up, but my installation of OpenBLAS does not.

-- Looking for sgemm_
-- Looking for sgemm_ - found
-- Found BLAS: -Wl,--start-group;/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/lib/intel64_lin/libmkl_intel_lp64.a;/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/lib/intel64_lin/libmkl_intel_thread.a;/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/lib/intel64_lin/libmkl_core.a;-Wl,--end-group;/cvmfs/restricted.computecanada.ca/easybuild/software/2020/Core/intel/2020.1.217/compilers_and_libraries_2020.1.217/linux/compiler/lib/intel64_lin/libiomp5.a;-lpthread;-lm;-ldl  
-- Looking for cheev_
-- Looking for cheev_ - found
-- Found LAPACK: -Wl,--start-group;/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/lib/intel64_lin/libmkl_intel_lp64.a;/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/lib/intel64_lin/libmkl_intel_thread.a;/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/lib/intel64_lin/libmkl_core.a;-Wl,--end-group;/cvmfs/restricted.computecanada.ca/easybuild/software/2020/Core/intel/2020.1.217/compilers_and_libraries_2020.1.217/linux/compiler/lib/intel64_lin/libiomp5.a;-lpthread;-lm;-ldl;-lpthread;-lm;-ldl  
-- Configuring done

Both these options allow flexibility on other systems, that will prefer to build the dynamic library (esp. when building from source).

I could refactor a little bit more this PR if you'd consider option 1.

paoloczi commented 2 years ago

Thank you for your thoughts. I plan to keep the static executable as the main release artifact, because of the ease with which it can be used on Linux distributions other than Ubuntu - it does not require installation because it contains all of its dependencies. The dynamic executable certainly has advantages, but it requires an implicit assumption that everybody is on Ubuntu - and the same version on which Shasta was built. Most Shasta users don't build the code themselves - instead, they download the static executable from a Shasta release. And people who desire to build the dynamic executable for whatever reason can easily do so via -DBUILD_DYNAMIC_EXECUTABLE=ON when running cmake.

I would be interested in option 1, but at the time I tried BLA_STATIC and I was not able to get it to work. I found several mentions of this problem on the Internet, including this one which describes a known cmake bug. This was some time ago, and it is possible that this problem was fixed in the meantime. So if you find a way to get it to work in the same way for the static and dynamic executables, I would certainly want to switch to that.

Option 2 complicates maintenance a bit, and so I would prefer to avoid it.

Also, because of the various dependencies, I don't attempt to keep the Shasta build portable. I only make sure that it runs on the latest Ubuntu LTS release, currently 22.04. By the way, I just made this morning some changes that break the build on Ubuntu 20.04, and Ubuntu 22.04 is the only build platform I currently support. On the other hand, the static executable runs on all current 64 bit Linux distributions without requiring any installation of dependencies. This has value for usability.

I do realize that not all BLAS/Lapack implementation come with a static version. Fortunately, the Ubuntu packages to include the static libraries, and so there is no problem building on an Ubuntu system with the system-provided BLAS/Lapack. This could imply some performance penalty, but these libraries are not used in any performance critical portion of Shasta code.

ccoulombe commented 2 years ago

This issue is very old. Using a recent CMake works per my tests (when static libs are available). I'll refactor a little bit the PR to include the static part.

Building from source should be the way forward as it provide an optimized build. Even if you prefer to support Ubuntu, everyone should/will be able to build from source with the correct requisites.

paoloczi commented 2 years ago

Thank you @ccoulombe ! If you can change your PR so that it works both for the static and dynamic executable, I will be happy to merge it. Please check that it works on Ubuntu 22.04 if you can, or otherwise I can do that part by cloning your repository. The latest code on GitHub no longer builds on Ubuntu 20.04 because I started to use new C++20 functionality.

I think it will be hard, but perhaps not impossible, to allow building Shasta outside of Ubuntu, due to some of the prerequisites. If you are able to provide a "universal" version of shasta/scripts/InstallPrerequisites-Ubuntu.sh that works on more than just Ubuntu, I would be happy to include it.

For all practical purposes, for a long-running executable such as Shasta the static executable provides equivalent optimization to the dynamic executable. Even though the static executable can be quite a bit larger, load time is entirely negligible for a typical assembly.

ccoulombe commented 2 years ago

Done, now the static executable find a static BLAS if available.

paoloczi commented 2 years ago

Nice, well done, thank you!

Before merging it, we need to test that it works on Ubuntu 22.04 and with the latest code on GitHub, both on Intel and ARM hardware. Please let me know if you had a chance to do this - otherwise I am happy to do that myself, and I should be able to do it within a couple of days.

paoloczi commented 2 years ago

To simplify my testing, it would be great if you could pull into your fork the latest code from chanzuckerberg/shasta. You fork is currently 3 commits behind, and between last Friday and today I made some changes that could potentially interact with your work.

paoloczi commented 2 years ago

Unfortunately my latest commit introduced the need for manual merging, but it should be easy enough. I just removed the cpu_features library from the command line, which is no longer needed because of changes in the spoa library.

paoloczi commented 2 years ago

This fork is behind and merging this PR would require a manual merge which I want to avoid, so I tried making the same changes on the latest code. Things worked fine for the dynamic (shared) library and executable, but for not the static ones. The following error occurred when invoking cmake:

CMake Error at /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find LAPACK (missing: LAPACK_LIBRARIES)
Call Stack (most recent call first):
  /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake-3.22/Modules/FindLAPACK.cmake:744 (find_package_handle_standard_args)
  staticExecutable/CMakeLists.txt:51 (find_package)

This is similar to what I got some time ago when I tried making a similar change - somehow the BLA_STATIC option is still not working. I am on Ubuntu 22.04 with the default Blas and Lapack libraries installed per shasta/scripts/InstallPrerequisites-Ubuntu.sh. This is the only platform I currently support for building Shasta, and so I will need to give up on this unless you are able to make suggestions.

You said these changes work for you. Can you describe the platform you tested them on (Linux distribution and Blas/Lapack versions)?

paoloczi commented 1 year ago

Since no solution has emerged for the static libraries, I am closing this PR.