cog-imperial / OMLT

Represent trained machine learning models as Pyomo optimization formulations
Other
281 stars 59 forks source link

Clean up package boilerplate #149

Closed lukasturcani closed 3 months ago

lukasturcani commented 5 months ago

This PR does a couple of things to clean up the boilerplate related to packaging OMLT, see sections below for detailed explanations of the changes.

Other comments

Using pyproject.toml

pyrpoject.toml is the simplest way to provide package metadata for a Python package. It is easy to read and also provides sections for configurating tools such as pytest, ruff and mypy all in one place. It works seamlessly with the modern Python ecosystem.

I set up pyproject.toml to automactically detect the version of the code from git tags. No need to duplicate version numbers across the repo. Just add a new tag and everything will be updated. In addition, when a new git tag is pushed to the GitHub repo, the new publish_release workflow will be triggered and a new PYPI version released. (See more on this below).

I also set it up so that the version is automatically added to a file called src/omlt/_version.py which holds the __version__ variable. this file is autogenerated and therefore added to .gitignore. The __version__ veriable is then re-exported in src/omlt/__init__.py so that our users have access to it.

I tried to perserve all the information stored in the setup.cfg and other deleted files -- let me know if there is something i missed!

Optional dependencies

The pyproject.toml file allows the creation of optional dependencies. For example, our users can install

pip install omlt[keras]
# or
pip install omlt[torch]
# or
pip install omlt[linear-tree,keras-gpu]

Ofc any combination of optional dependencies is valid too. This allows our users to install the dependencies specific to their use case. Note that:

The available optional dependencies are:

Our documentation probably needs to be updated to tell users they wanna install omlt with some combination of linear-tree, keras, keras-gpu, torch optional dependencies depending on what features of the package they are using

Quality checks with ruff, mypy and doctest

I've enabled ruff, mypy and doctest. Currently there are no doctests, but its good to have it set up so that it runs in case any are added in the future.

Both ruff and mypy are failing because there are a number of things which need to fixed. For both ruff and mypy I have disabled some checks which it would be good to enable eventually but are probably a fair amount of work to fix -- these have comments in pyproject.toml. The remaining failing checks are ones which I would reccomend fixing ASAP. There's two approaches, merge now and fix these errors later. Or keep a separate branch where these are incrementally fixed. Up to you to decide what you prefer.

I told ruff to check for google style docstrings. I think these are the best because they have good readbility and work the best with type hints in my opinion.

Using just instead of tox

https://github.com/casey/just is a simple command runner. It allows the developers to define and re-use common operations, for example I can define a check recipe and then run

just check

in my command line and it will run all the tests. The beauty of this is that just is extremely simple. If you read the file its basically a sequence of bash instructions for each recipe. This makes the recipes really transparent, and easy to understand, and works as code-as-documentation. Users can just read the recipe and run the commands one by one to get the same effect without having just installed. There is no magic which helps with debugging issues. It's also language agnostic. just comes as a small stand-alone binary, which makes it a very non-intrusive tool to have on your computer that does not need any dependencies.

The downside is that it does not provide automatic management for Python environments, which I belive tox does provide. The other side of this is that we allow developers to use their favorite tools for managing venvs rather than proscribing certain tools for this repo. (the difference with just being that it is essentially optional tool and also serving as documentation)

I may be overly opinionated on this one, so feel free to push back.

Cleaning up docs/conf.py

I removed a bunch of the commented out code. This makes it easier to see what the configuration is and also prevents the commented out options from becoming out of date when a new release of sphinx is made.

Moving pull_request_template.md

I moved this into the .github folder because it is GitHub configuration. Very optional, but makes more sense to me.

readthedocs automated action

this guide https://docs.readthedocs.io/en/stable/guides/pull-requests.html shows how to set it up. requires admin permissions on readthedocs -- can jump on a call to help with this

publishing with to PYPI with a git tag

for this an API key for PYPI needs to be created and added to the repos secrets -- can jump on a call to help with this

consider _internal package structure

One way to make it easier to manage private vs public code in a repository is to create an _internal folder where all the code goes. This way all code can be shared easily and moved between modules and its by default private, so changes to internal code does not break users. Public modules then just re-export code in the _internal submodules. You can see an example of this structure here https://github.com/lukasturcani/stk. Not a huge issue but I find it very helpful for managing what things are actually exposed to users the code-base grows.

Legal Acknowledgement\ By contributing to this software project, I agree my contributions are submitted under the BSD license. I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

codecov[bot] commented 5 months ago

Codecov Report

Attention: Patch coverage is 70.48346% with 116 lines in your changes missing coverage. Please review.

Project coverage is 90.36%. Comparing base (e605638) to head (ecaac06). Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
src/omlt/io/onnx_parser.py 60.43% 51 Missing and 4 partials :warning:
src/omlt/neuralnet/nn_formulation.py 50.00% 18 Missing and 3 partials :warning:
src/omlt/neuralnet/layer.py 54.54% 13 Missing and 2 partials :warning:
src/omlt/gbt/model.py 23.07% 10 Missing :warning:
src/omlt/gbt/gbt_formulation.py 45.45% 6 Missing :warning:
src/omlt/io/keras/keras_reader.py 66.66% 2 Missing and 1 partial :warning:
...c/omlt/io/torch_geometric/build_gnn_formulation.py 50.00% 2 Missing and 1 partial :warning:
src/omlt/formulation.py 33.33% 1 Missing and 1 partial :warning:
src/omlt/io/onnx.py 50.00% 0 Missing and 1 partial :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #149 +/- ## ========================================== - Coverage 92.55% 90.36% -2.20% ========================================== Files 32 32 Lines 1948 2034 +86 Branches 475 471 -4 ========================================== + Hits 1803 1838 +35 - Misses 74 123 +49 - Partials 71 73 +2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.


🚨 Try these New Features:

jalving commented 4 months ago

This all looks great to me. Converting to draft until @jezsadler merges some ruff and mypy fixes in.