bandframework / Taweret

Python package for Bayesian Model Mixing
https://bandframework.github.io/Taweret/
MIT License
8 stars 8 forks source link

Reevaluate package external dependencies #90

Closed jared321 closed 2 months ago

jared321 commented 6 months ago

The package currently declares many external Python package dependencies and indeed many packages are installed with Taweret. However, some of these are likely not true dependencies (i.e, not needed for a minimal, functional, correct installation).

For example, GitHub actions need to test coverage, but users don't. So, no need to install pytest-cov. Similarly, GH actions need to generate the sphinx documentation, but most users won't. If they do, I think that it's reasonable that they install all the sphinx packages manually. Need to figure out what to do for notebook-based examples.

ominusliticus commented 6 months ago

Yes, I have seen certain repositories that have test_requirements.txt files which specify the dependencies that need to be installed to run test. Is that the kind of solution you envision here?

jared321 commented 6 months ago

That might be one way to go. To be clear, I created this Issue to make sure we don't forget about this. I doubt that this is high-priority. Do you agree?

We can think about this in three ways: 1) What is needed for GitHub actions? 2) What is needed for developers? 3) What is needed for users?

In our next meeting, we can go over what I presently have done, which provides a single solution for (1) and (2) and potentially some users.

jared321 commented 3 months ago

I grepped each dependence in pyproject.toml in the package's *.py source files and removed all those that don't appear anywhere.

I suspect that some of the original dependencies were included originally so that each Taweret wheel had all dependencies needed for all tasks (e.g., building sphinx docs, running code coverage). However, users wouldn't generally run many of those and therefore, it makes sense to move them out. This is acceptable in terms of developers because the tox tasks know what extra packages are needed and automatically install those.

In the case that some removed packages were included as dependencies of Taweret's actual dependencies, I propose that we assume that each Taweret dependence correctly specifies its dependences so that pip should automatically install the full stack. If the assumption is valid, then maintaining the list of actual dependencies should be easier this way.

jared321 commented 3 months ago

@asemposki I've advance the work for this issue and have some questions.

The notebook-only dependency, seaborn, is now specified in the book tox tasks so that the GH jupyterbook actions have everything they need. Should the need to manually install seaborn be noted somewhere in the examples portion of the documentation? Or are the examples now really only meant for people to read in the online book?

bilby has many layers of requirements. At the moment, I am just assuming that tox, the actions, and users will use its mandatory dependencies https://git.ligo.org/lscsoft/bilby/-/blob/master/requirements.txt?ref_type=heads which includes some packages used by the notebooks (e.g., corner and pandas).

I didn't find ptemcee being imported anywhere in this repo. However, it appears to be needed by the examples through bilby. It is only one of many optional sampler dependencies https://git.ligo.org/lscsoft/bilby/-/blob/master/sampler_requirements.txt?ref_type=heads

How is Bilby integrated into Taweret? Is the use of ptemcee hardcoded or do users have the ability to specify a particular Bilby-supported sampler?

Based on your responses, how should we manage these sampler dependences? Should we include some or all of the samplers as mandatory dependences in pyproject.toml so that all users install them even if they don't use them? Should the docs tell users to manually install their samplers of choice or to just install all samplers as shown here? https://lscsoft.docs.ligo.org/bilby/samplers.html#installing-samplers In such a case, I would move ptemcee to the book tox tasks.

We'll have to think about this more after the review in terms of ensuring that we catch when a sampler you use is no longer offered in Bilby.

asemposki commented 3 months ago

@jared321 I envisioned the Jupyter Book as a place to link from so users could run those notebooks in their own code space or locally if they wish, so I would think that seaborn is then a local dependence. It could also be completely removed since it is only for colour palettes and plot styling, but I vote we leave it and make it a dependence when running notebooks locally (so we can tell users to install it for local work).

bilby is used by Taweret as the official sampler wrapper, so when you call it, you can specify the sampler type. Dan's notebooks show the various arguments the user can specify externally so that a different sampler, number of walkers, steps, output printing, etc., are chosen. So yes, the idea would be to allow users to use any of the samplers available through bilby, hence this is where ptemcee is included.

I'm not 100% sure which option is best---including all samplers from the start or downloading them as needed---but given that the bilby documentation claims that if a user tries to use one that is not installed, a "helpful message will be printed", that may be enough, meaning that we only use the default dependences of bilby and then users can install more if they want a different sampler. Probably for cleanliness and keeping things as light as possible, this would be the best option.

jared321 commented 2 months ago

@asemposki It sounds reasonable to me that we ask users to manage the optional samplers in their installation since this gives them more freedom and power to manage their needs. It also seems acceptable since using a different sampler requires that the user manually specify the use of that sampler - there should be no surprises.

If I understood correctly, then

Correct?

asemposki commented 2 months ago

Yes, and I will include in those notebooks a commented out !pip install seaborn and !pip install ptemcee so that they can see to install those there to run the examples. It seems reasonable to me to keep dependences to a minimum.

jared321 commented 2 months ago

This is now in main in preparation for the next release. Closed.