bandframework / Taweret

Python package for Bayesian Model Mixing
https://bandframework.github.io/Taweret/
MIT License

Taweret/OpenBTMixing Building & Distribution #83

Open jared321 opened 2 months ago

jared321 commented 2 months ago

The simplest way for users to install Taweret would be to run pip install Taweret and have that command install openbtmixing automatically from PyPI. However, we need to determine whether such a scheme is feasible given openbtmixing's dependence on MPI. As part of this, we can try to enumerate all the ways in which users can install and use both packages.

Possible Requirements

ominusliticus commented 2 months ago

Let's begin by noting that Taweret is intended to be a framework that standardizes APIs for Bayesian inference software (developed by the nuclear physics community). Such a framework should not depend on an implementation thereof. With this reasoning, I would like to strike the ability to blanket-install Taweret and have it take care of all its implementations' requirements.

Assuming openbtmixing does gain widespread adoption, we would, at a minimum, want to support compute clusters installing OpenBT and openbtmixing via Spack in whatever optimization configurations they can concoct.

Build OpenBT as standalone

For this reason, it is, in my opinion, a little inappropriate to discuss the guts of OpenBT in a Taweret issue. Still, this is the best place to keep track of our ideas for now.

The work flow that I envision is as follows:

  1. Require the user to build OpenBT from scratch
    1. Users should have their compiler and MPI implementation of choice
    2. Clear build instructions should be developed, including best practices, e.g., manipulating OpenMPI install locations
    3. Unit tests for the matrix of compilers and operating systems, ideally covering all conceivable combinations, should be implemented
  2. Require the user to install openbtmixing separately, with the appropriate prompt from Taweret should the module not be installed
    1. This is how it is done in packages like bilby
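The second step of the workflow above (prompting the user when the module is missing) could look roughly like the following. This is a minimal sketch and not Taweret's actual code: the helper names require_optional and load_openbt_backend and the wording of the hint are hypothetical, and only the module name openbtmixing comes from this thread.

```python
import importlib
import importlib.util


def require_optional(module_name, install_hint):
    """Import an optional backend, or raise an ImportError explaining how to get it.

    Hypothetical helper for illustration; intended for top-level module names.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"The optional backend '{module_name}' is not installed. {install_hint}"
        )
    return importlib.import_module(module_name)


def load_openbt_backend():
    """Load openbtmixing lazily, only when a model that needs it is requested."""
    # The hint text is an assumption; it should point at the real build docs.
    return require_optional(
        "openbtmixing",
        "Build OpenBT first, then install openbtmixing separately.",
    )
```

With this layout, installing the framework never pulls in the MPI-dependent package, and the error only appears when a user asks for a feature that needs it.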

openbtmixing and OpenBT as Taweret dependencies

Should people insist on the convenience of one-line installations, we could appeal to the pip install flag --no-binary :all:, which forces dependencies to be built from source. The build should be tailored to exactly one prescription for compiler and MPI implementation, chosen from the compilers available on systems like Ubuntu (personal computers) and CentOS (clusters).

jared321 commented 2 months ago

@ominusliticus Interesting. I've never heard of a package building in a dependency but refusing to install it. That does sound like something we might need. As long as the error message is helpful and points them to clear docs, it seems acceptable.

It seems like much of your workflow is in accord with my requirements. In other words, it could be a solution that satisfies the requirements. Do you agree?

I've realized that while we are trying to evolve the package distribution portion of OpenBT, as you point out, we are at the same time updating the overall Taweret software architecture for the case where OpenBTMixing is a dependency, so that the architecture is compatible with an acceptable build/distribution strategy. We've seen this in the work that John is doing, and my next question is related.

There is the "homebrew-style" architecture, where the OpenBT/C++ CLTs and the OpenBTMixing Python package are independent layers in the SW architecture. In that case, the Python package is a pure Python package. Alternatively, there is a "Python source build" architecture, where the OpenBTMixing Python package is a true wrapper of the OpenBT/C++ CLTs. I believe that this style implies that the package must have its own build system and that users would likely always have to build from source. Is one of these what you have in mind for your workflow?
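To make the "homebrew-style" option concrete: a pure Python package in that layout would only need to locate the separately installed tools at run time. The sketch below reads "CLTs" as command line tools, which is an assumption, and the helper name find_command_line_tool is hypothetical; no actual OpenBT executable name is assumed.

```python
import shutil


def find_command_line_tool(executable):
    """Return the full path of an executable found on PATH, or raise a helpful error.

    Hypothetical helper: the caller supplies the executable name, since the
    names of the tools that OpenBT installs are not stated in this thread.
    """
    path = shutil.which(executable)
    if path is None:
        raise FileNotFoundError(
            f"Could not find '{executable}' on PATH. Build and install the C++ "
            "command line tools first, then add their bin directory to PATH."
        )
    return path
```

The "Python source build" option would not need such a lookup, because the compiled code would ship inside the package itself.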

image

ominusliticus commented 2 months ago

In my own view of things, I am more inclined towards the figure on the right. The picture on the left is something we have to fit into the "fast-paced data science in python" pipeline. If we can enable this for users, we ought to.

Regarding the discussion of the relevance of OpenBT infrastructure for Taweret, I agree. That said, our findings and conclusions will probably be documented in the BAND software guidelines, which will be useful for future developers.

asemposki commented 2 months ago

In regard to Anaconda, here are the things I have found out so far:

Update: