MRtrix3 / mrtrix3

MRtrix3 provides a set of tools to perform various advanced diffusion MRI analyses, including constrained spherical deconvolution (CSD), probabilistic tractography, track-density imaging, and apparent fibre density
http://www.mrtrix.org
Mozilla Public License 2.0
291 stars 179 forks source link

Improve handling of external dependencies #2859

Closed daljit46 closed 1 month ago

daljit46 commented 6 months ago

Aims to address #2584. Eigen and Json for Modern C++ are now fetched using CMake's FetchContent. Nifti headers are moved to a new thirdparty/nifti directory.

Seems to work fine locally, but there is a small downside: the configure log is quite a bit more verbose than before because of Eigen (at least the first it is downloaded). Not sure if there is a way to improve on that.

daljit46 commented 6 months ago

On the configuration issue, unfortunately Eigen 3.4.0 doesn't make use of CMake's best practices so it's wrongly assuming that it's being built as a top level project. This has been fixed in a subsequent commit. Perphaps we could add a small patch based on that commit to make this work, but not sure.

Furthermore, CI tests seem to be failing probably because we are using CMake 3.16, but I'm hoping that we can tweak things to build succesfully (hopefully without hitting a dealbreaker).

github-actions[bot] commented 6 months ago

clang-tidy review says "All clean, LGTM! :+1:"

daljit46 commented 6 months ago

Ok, so I've patched eigen removing unncessary configuration messages. This seems to work as expected. I have one extra thought that I'd like to get some feedback on. By default FetchContent downloads dependencies in the build directory. It's quite common for developers to start a clean build. There are a few ways to achieve this with CMake:

cmake -B build --fresh # It reconfigures the project deleting any CMakeCache.txt
cmake --build build --target clean # cleans any build artifacts
cmake --build build --clean-first # Same as previous line but triggers a build

From our discussions, it seems that a rm -rf build seems to be preferred amongst us to the options above(?). However, if we delete the build direcory then FetchContent will be forced to refetch the dependencies from the internet, elongating the configuration time. One way to get around this is to direct FetchContent to download the dependencies to a separate directory (e.g. _dependencies) and add this to .gitignore. This would allow devs to delete the build directory without having to download the dependencies again.

@MRtrix3/mrtrix3-devs any thoughts?

github-actions[bot] commented 6 months ago

clang-tidy review says "All clean, LGTM! :+1:"

github-actions[bot] commented 6 months ago

clang-tidy review says "All clean, LGTM! :+1:"

daljit46 commented 6 months ago

Another relevant issue is that if we decide to pull third party source code from a server, our Github tarballs no longer will contain the full source code required to compile once downloaded, but only after the CMake configuration phase. This can be an issue for certain usecases like packaging workflows by Linux distros.

To avoid this, @jdtournier suggested that we should instead look into git subtrees, which would effectively integrate thirdparty dependencies into our source code (the third party code will effectively be part of our repo instead of being just a link like in the case of git submodules). This gets around the issue just mentioned above, however there are some things that I think could be problematic if we use this approach. Most importantly, because git subtree incorporates third party code into our repo (including their histories), the use of git subtrees will inevitably result in a significant increase of our repo's size. For example, the repo for eigen weighs almost ~100MB when compressed (uncompressed size is about ~130MB). nlohmann::json's repo also exceeds 100MB. This seems a little excessive to me since all we really need is just a header file. Of course, FetchContent will require the download of third party repos too, but it would only do so at configure time thus effectively having no effect on the size of the repo.

@MRtrix3/mrtrix3-devs I'm not really sure how to proceed here, but let me know if anyone has any thoughts.

Lestropie commented 6 months ago

The other thought that came up during last meeting (wrote in minutes, didn't get to here yet) was to give cmake the ability to configure external dependencies for the build process in one of two ways:

  1. If a relevant environment variable is specified at configure time, providing a filesystem path to a location where that external dependency can be found, then it will pull the source code from that external location into the build directory. This would allow offline compilation, provided that the mechanism used to obtain source code prior to attempting offline compilation performs the requisite acquisition of the source code for those external dependencies. This could be eg.:

    1. Downloading and embedding the specified versions of those external dependencies within our own GitHub tarballs;
    2. Someone like a distro maintainer expressing in their MRtrix3 package the fact that additional dependencies need to be pulled and the MRtrix3 cmake needs to be instructed where to find them.
  2. If that environment variable is not specified, then something like FetchContent will instead be used.

daljit46 commented 2 months ago

I have now rebased this and made some changes to address the issues above:

@MRtrix3/mrtrix3-devs Feedback welcome.

github-actions[bot] commented 2 months ago

clang-tidy review says "All clean, LGTM! :+1:"

daljit46 commented 2 months ago

Made some more changes. Now we only download release artefacts for Eigen instead of cloning the Gitlab repo (which is much faster). Additionally, I got rid of the custom patch to silence Eigen's configuration output and instead create a CMake INTERFACE library manually. Finally, the base location of FetchContent has been globally set to PROJECT_SOURCE_DIR/_fetchcontent.

github-actions[bot] commented 2 months ago

clang-tidy review says "All clean, LGTM! :+1:"

github-actions[bot] commented 2 months ago

clang-tidy review says "All clean, LGTM! :+1:"

daljit46 commented 2 months ago

As we discussed in the meeting yesterday, I have retracted the change to avoid pull in dependencies in a custom source folder inside the source directory. So dependencies will be downloaded in the build directory. This shouldn't be an issue because download times should be quite short. @jdtournier @Lestropie unless you have any objections, I will enable auto-merge on this PR.

github-actions[bot] commented 2 months ago

clang-tidy review says "All clean, LGTM! :+1:"

github-actions[bot] commented 1 month ago

clang-tidy review says "All clean, LGTM! :+1:"

github-actions[bot] commented 1 month ago

clang-tidy review says "All clean, LGTM! :+1:"

daljit46 commented 1 month ago

@MRtrix3/mrtrix3-devs unless anyone has objections, I will now merge this.

Lestropie commented 1 month ago
Lestropie commented 1 month ago

Another factor that comes to mind here is whether nifti1.h and nifti2.h should be obtained from some other source. If there isn't a robust online source for those files, we could create a new repository under our organisation that just stores those two files so that they can be pulled. This would (I think) prevent the current situation where the thirdparty/ directory contains a combination of content from the main repo and pulled content. It's not a clear beneficial alternative so might not be worth the effort, but thought I'd state nevertheless.

daljit46 commented 1 month ago

It would have been preferable for https://github.com/MRtrix3/mrtrix3/commit/df08ea1e786c324063d826457959e5138a476582 to be split into two commits, with removal of the third-party files authored by @MRtrixBot, for the sake of contribution statistics tracking.

I'm not sure what the permissions are on dev, but if we're allowed to rebase then this should be fairly straightforward to do I think. The only file that was deleted was json.h. Better do it ASAP though, before other PRs merge into dev.

we could create a new repository under our organisation that just stores those two files so that they can be pulled.

Makes sense to me.

daljit46 commented 1 month ago

@Lestropie @jdtournier I have created a new branch fix_pr2859 that splits the commit for removing core/file/json.h and attributes authorship to MRtrixBot. I don't know how we want to proceed, but we could do something like git checkout dev && git reset origin/fix_pr2859 --hard && git push --force. This would rewrite the history up from the last commit before this branch's commits.

Lestropie commented 1 month ago

Thanks for investing the energy. I did a visual comparison between #2859 and a hypothetical merge of fix_pr2859 to dev. Unfortunately f4b37e04f1b32bbdd501910fbecc0ad2b0de1d6b has the same problem in that it removes half.hpp. If you're able to refactor that one as well, then I can re-check, and then temporarily release the restrictions on dev and do a force push.

daljit46 commented 1 month ago

Oh yes I missed that. I rebased again and now put the deletion of core/file/json.h and core/half.hpp into a single commit at the top.

Lestropie commented 1 month ago

Thanks. Force-pushed and branch protections re-established. I also cherry-picked ee8def69aea629d23e3aca5dba5e758f1ff5d7db from #2974 as d5916580358c289dea0e6ae9af0502993f20dd85 before pushing.