Reducing `install_requires` to minimum (& expand `extras_require`) and looser version ranges

gmgeorg commented 1 year ago

At quick glance it seems that the current setup.py file is fully exhaustive on all dependencies as an absolute requirement including very specific ranges for versions. If at all possible, it would greatly improve compatibility w/ existing Python repos (and more people being able to use it w/o having to resolve conflicts) if the install_requires was only specifiying absolutely required modules (e.g., plotting or optuna is not really required to use this great package) and the minimum version date (>=) needed, instead of the (approximate) ~= range.

Curious to learn more about whether this package has to be so specific/restrictive on the dependencies (e.g., suggest to use https://stackoverflow.com/questions/10572603/specifying-optional-dependencies-in-pypi-python-setup-py

StatMixedML commented 1 year ago

Thanks for your suggestion.

As of now, model.py among others also imports optuna and the plotting packages. So all the requirements in the setup.py are required. I go thorugh your provided list of references. Thanks

gmgeorg commented 1 year ago

Here is a better reference to suggest a pip install xgboostlss (minimum requirements) vs pip install xgboostlss[full] or sthg along those lines

https://stackoverflow.com/questions/6237946/optional-dependencies-in-distutils-pip

I looked through the code and yes it is currently the case that the implementation requires optuna, seaborn, matplotlib, pandas, shap (I think those are all the ones I found which are not critical for core functionality of XGBoostLSS.train() / .predict()). My suggestion was to decouple this in order to avoid the requirement. In particular, making plotting/tuning methods (!) of the core class unnecessarily ties them to the core functionality; if instead tuning and plotting would be function based that take an XGBoostLSS object as input, then you can decouple them.

This will also have the benefit that whenever you add nicer/better/more visualizations, users can just update the package and call the new functions on an existing object (loaded from disk for example), and don't have to re-initialize a new class, train / tune it again, just so they can use a new plotting funcationality.

Basically proposal would be sthg along the lines of

remove methods cv(self, ...), hyper_opt(self, ...) from XGBoostLSS object and make them functions cv(xgboostlss_object, ...) and hyper_opt(xgboostlss_object, ...) out of models.py into their own standalone tuning.py submodule. Then models.py does not require import optuna anymore; and if users don't want to use your tuning option, they are not required to import optuna anyway. tuning.py just wont work for them -- which is fine if they have other options.
move methods plot(self, ...), expectile_plot(self, ...) out of models.py into their own visualization.py module (again make them functions, not methods). Then in future if you add a new my_favorite_distribution_plot(xgboostlss_object, ...) function in that module, users can apply it to an existing (!) XGBoostLSS object w/o having to re-initialize/train/tune it again. Also here it would remove all the plotting (and even the pandas dependency AFAICT) from models.py.

This would accomplish the outcome that the core functionality in xgboostlss -- which AFAIU is training & predicting -- only depends on absolute minimum requirements; but any addon helper functions (like tuning or plotting) you provide are optional for installation/use. For handling the case when dependencies are not available you can then do sthg like here to let users know they need to install xgboostlss[full] to be able to access the tuning/plotting functionality.

fkiraly commented 8 months ago

I would agree with you, @gmgeorg.

Wider version ranges ensure usability, whereas frequent bumps of lower bounds prevent compatibility with a wider range of systems.

Generally, I would strongly advise the same, separating out the tuning from the raw model - like in skpro where you have probabilistic tuners separate from the "atomic" regression models.

I would be happy to help with a little refactoring of this package, possibly copying some of the more general visualization and tuning logic to skpro? If @StatMixedML is ok with this, would you be up for working on this together, @gmgeorg?

StatMixedML / XGBoostLSS

Reducing `install_requires` to minimum (& expand `extras_require`) and looser version ranges #56