SciCompMod / memilio

Modular spatio-temporal models for epidemic and pandemic simulations
https://scicompmod.github.io/memilio/
Apache License 2.0

Provide minimal boost version with limited functionality #992

Closed lenaploetzke closed 3 weeks ago

lenaploetzke commented 3 months ago

Feature description

Provide a minimal boost version. Functionality that requires more of boost than this minimal version provides should not be built. This currently affects some functions in the file epidemiology/state_age_function.h.
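One way to keep such functionality out of the build is a CMake switch that sets a compile definition. This is only a minimal sketch: the option `MEMILIO_USE_MINIMAL_BOOST`, the interface target `memilio_boost_config`, and the macro `MEMILIO_MINIMAL_BOOST` are hypothetical names that do not come from the repository.

```cmake
cmake_minimum_required(VERSION 3.14)
project(minimal_boost_switch LANGUAGES CXX)

# Hypothetical option: ON when only the reduced boost is available.
option(MEMILIO_USE_MINIMAL_BOOST "Build against a reduced set of boost libraries" OFF)

# Hypothetical interface target that carries the setting to consumers.
add_library(memilio_boost_config INTERFACE)

if(MEMILIO_USE_MINIMAL_BOOST)
    # Headers could wrap functions that need more of boost in
    #   #ifndef MEMILIO_MINIMAL_BOOST ... #endif
    target_compile_definitions(memilio_boost_config INTERFACE MEMILIO_MINIMAL_BOOST)
endif()
```

Targets that link against `memilio_boost_config` would then see the macro only in the minimal configuration.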

Discussion: Where should the minimal boost version be stored? Ideally not as .tar.gz in the repository as before.

reneSchm commented 3 months ago

Maybe we can use FetchContent instead of a zip file: https://stackoverflow.com/questions/72913306/how-to-use-boost-libraries-directly-from-github-using-cmake-fetchcontent-or-any

This could work for the full download as well, by just changing the included libraries. If FetchContent is fast enough, I would prefer it over providing a binary archive. The zip archive obscures the dependency, and as far as I can tell from a quick search, the prevailing advice on storing dependencies in git(hub) is not to do it.

Anyway, GitHub does have a solution for large file storage (Git LFS), which we could use for storing an archive.

Further, there is a cache action to reuse dependencies across runs, but I am not sure whether it applies to us, since we do not use a dependency manager like npm.

lenaploetzke commented 3 months ago

> Maybe we can use FetchContent instead of a zip file: https://stackoverflow.com/questions/72913306/how-to-use-boost-libraries-directly-from-github-using-cmake-fetchcontent-or-any
>
> This could work for the full download as well, by just changing the included libraries. If FetchContent is fast enough, I would prefer it over providing a binary archive. The zip archive obscures the dependency, and as far as I can tell from a quick search, the prevailing advice on storing dependencies in git(hub) is not to do it.

FetchContent is used in #983 and takes quite a long time.

reneSchm commented 3 months ago

I think that version downloads all of boost. In the stackoverflow link they seem to download only some of the boost libraries. It might be worth a try.
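For reference, the approach from the stackoverflow link can be sketched roughly as follows. `BOOST_INCLUDE_LIBRARIES` is a variable read by boost's own CMake build. The library list below is a placeholder, since the libraries MEmilio actually needs are not listed in this thread, and the version value is an assumption.

```cmake
cmake_minimum_required(VERSION 3.14)
project(boost_subset_sketch LANGUAGES CXX)

include(FetchContent)

set(MEMILIO_BOOST_VERSION "1.85.0") # assumed value for illustration

# Placeholder list; replace with the libraries that are really required.
set(BOOST_INCLUDE_LIBRARIES filesystem math)
set(BOOST_ENABLE_CMAKE ON)

FetchContent_Declare(boost
    GIT_REPOSITORY https://github.com/boostorg/boost.git
    GIT_TAG boost-${MEMILIO_BOOST_VERSION}
    GIT_SHALLOW TRUE)
FetchContent_MakeAvailable(boost)
```

As far as I understand, this limits which libraries are configured and built, but the clone itself may still fetch every submodule, so the download time would have to be measured.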

mknaranja commented 3 months ago

@reneSchm

dabele commented 1 month ago

I think it would be worth exploring caching boost. There are some open questions though, see https://github.com/SciCompMod/memilio/pull/994#issuecomment-2139102260

In short: boost is huge, and cache space is limited, so unless the cached version can be used for all builds, there is not enough space.

Some projects offer dependencies on a mirror. This would have to be a public server, probably hosted at DLR; we would have to talk to IT about that. We could put a minimal boost in different versions there. If we had a server for that, we might also be able to use it as a remote ccache repository.

reneSchm commented 1 month ago

> In short: boost is huge, and cache space is limited, so unless the cached version can be used for all builds, there is not enough space.

As I see it, the biggest problem of the CI regarding build time comes from downloading all of boost for every build. Could we reduce that time by caching a .tar.gz (or .zip) of that download? We could then extract it before the generation step and point cmake to it. The archive is platform independent, and b2 (which will figure out the platform stuff) should run fast enough that we can keep that step in the build.
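Pointing cmake to an already extracted copy could be done with FetchContent's documented override variable `FETCHCONTENT_SOURCE_DIR_<uppercaseName>`. This is a minimal sketch; the version value and the path in the comment are placeholders, and how the CI restores and extracts the archive is not shown.

```cmake
cmake_minimum_required(VERSION 3.14)
project(boost_cached_source_sketch LANGUAGES CXX)

include(FetchContent)

set(MEMILIO_BOOST_VERSION "1.85.0") # assumed value for illustration

FetchContent_Declare(boost
    GIT_REPOSITORY https://github.com/boostorg/boost.git
    GIT_TAG boost-${MEMILIO_BOOST_VERSION})

# If the CI extracted a cached archive beforehand, configure with
#   cmake -DFETCHCONTENT_SOURCE_DIR_BOOST=/path/to/extracted/boost ...
# (placeholder path). FetchContent then skips the download and uses that directory.
# Note: newer CMake versions deprecate this single-argument Populate call.
FetchContent_GetProperties(boost)
if(NOT boost_POPULATED)
    FetchContent_Populate(boost)
endif()

message(STATUS "boost sources: ${boost_SOURCE_DIR}")
```

Without the override variable, the same file falls back to the normal download, so local builds would not need the cache.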

dabele commented 1 month ago

> As I see it, the biggest problem of the CI regarding build time comes from downloading all of boost for every build. Could we reduce that time by caching a .tar.gz (or .zip) of that download?

If that's the biggest issue, that should be easy to solve with a cache. Are we currently cloning a git repo or downloading an archive? Downloading a tar would probably be easier and more efficient, both because it's faster to download one file than thousands with git, and because extracting the archive to get the source code is faster than compressing it for caching.

reneSchm commented 1 month ago

We are downloading a tagged release:

FetchContent_Declare(boost
    GIT_REPOSITORY https://github.com/boostorg/boost.git
    GIT_TAG boost-${MEMILIO_BOOST_VERSION})

I am pretty sure this just downloads the repo, but FetchContent can use URLs, so we could use the download from the boost homepage: "https://archives.boost.io/release/1.85.0/source/boost_1_85_0.tar.gz"

dabele commented 1 month ago

I just tried FetchContent with URL. It is quite fast, much faster than with a repository. The source code in the archive also has correct include paths, so bootstrapping doesn't seem to be necessary. I would guess it's also less traffic for github. Is there some drawback to downloading the tar or did we just miss that before? I think we don't need to worry about the cache at all then.
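For illustration, a URL-based declaration might look like the sketch below. It uses the archive URL quoted above. The target name `boost_headers` is hypothetical, and the sketch only covers header-only use, under the assumption that the archive's top-level folder is stripped on extraction so that the headers sit under `${boost_SOURCE_DIR}/boost`.

```cmake
cmake_minimum_required(VERSION 3.14)
project(boost_url_sketch LANGUAGES CXX)

include(FetchContent)

FetchContent_Declare(boost
    URL https://archives.boost.io/release/1.85.0/source/boost_1_85_0.tar.gz)

# Only download and extract; do not add the archive as a CMake subdirectory.
# Note: newer CMake versions deprecate this single-argument Populate call.
FetchContent_GetProperties(boost)
if(NOT boost_POPULATED)
    FetchContent_Populate(boost)
endif()

# Hypothetical header-only target that exposes the extracted headers.
add_library(boost_headers INTERFACE)
target_include_directories(boost_headers SYSTEM INTERFACE "${boost_SOURCE_DIR}")
```

Adding a `URL_HASH` to the declaration would additionally let CMake verify the download, but the hash value would have to be looked up for the chosen release.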

reneSchm commented 1 month ago

We might have missed it, but can we still use the version number with URL?

dabele commented 1 month ago

I found this commit (https://github.com/SciCompMod/memilio/commit/52d0303db27b3fc406435be1d305c938b8c6e435) where we switched from archive to repository for Eigen3. But we are still using an archive for jsoncpp. So maybe the problem with Eigen3 only affects gitlab, or it was temporary.

> can we still use the version number with URL?

The archive URL follows a naming scheme (see https://github.com/boostorg/boost/tags) that includes the version number, and downloads are available for older tags as well, so that should work.
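Deriving the URL from the version variable could look roughly like this. The sketch assumes `MEMILIO_BOOST_VERSION` has the form "1.85.0" and that the URL scheme of the boost_1_85_0 archive quoted earlier also holds for other releases, which is not verified here. The helper variable `MEMILIO_BOOST_VERSION_UNDERSCORE` is a hypothetical name.

```cmake
cmake_minimum_required(VERSION 3.14)
project(boost_versioned_url_sketch LANGUAGES CXX)

include(FetchContent)

set(MEMILIO_BOOST_VERSION "1.85.0") # assumed value for illustration

# The file name uses underscores instead of dots: "1.85.0" becomes "1_85_0".
string(REPLACE "." "_" MEMILIO_BOOST_VERSION_UNDERSCORE "${MEMILIO_BOOST_VERSION}")

FetchContent_Declare(boost
    URL https://archives.boost.io/release/${MEMILIO_BOOST_VERSION}/source/boost_${MEMILIO_BOOST_VERSION_UNDERSCORE}.tar.gz)
```

Changing the boost version would then still be a one-line edit, as with the `GIT_TAG` variant.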

//Edit: here is the issue regarding eigen3 archive download: https://gitlab.dlr.de/hpc-against-corona/epidemiology/-/issues/467 Not much more info there.