cmu-delphi / delphi-epidata

An open API for epidemiological data.
https://cmu-delphi.github.io/delphi-epidata/
MIT License
100 stars 68 forks source link

Consider standardizing Python client packaging #1468

Open dshemetov opened 4 months ago

dshemetov commented 4 months ago

Background

The Python client delphi-epidata.py has a peculiar file layout:

packaging/
  pypi/
    delphi_epidata/
      __init__.py
    .gitignore
    CHANGELOG.md
    LICENSE
    README.md
    setup.py
delphi-epidata.py

I think it's structured this way to allow importing the client in this repo as a module and by outside users via PyPI as an installable package.

The first use case can be seen here:

from delphi.epidata.client.delphi_epidata import Epidata

The second use case requires a PyPI install pip install delphi-epidata and then

from delphi_epidata import Epidata

(Note that as it stands, the package isn't installable with the file layout above because delphi-epidata.py isn't in the right directory. When we prepare the package for release on PyPI, our CI first copies the delphi-epidata.py file into the packaging/pypi/delphi_epidata directory, which allows it to be bundled.)

This dual use is handy: the repo code always uses the most recent version of the client code, no installation necessary. And the packaged install is convenient for outside users.

However, with our recent changes here, we fell into a trap: we allowed repo-relative imports to sneak into delphi-epidata.py. Because all our tests rely on repo-relative imports this error snuck by our CI tests, which allowed the standalone client on PyPI to run into import errors.

Proposal

We should always install and import the client as a package. Repo code would need to declare ./src/client/packaging/pypi/ as a local dependency in our requirements files and replace the repo-relative imports with package imports. To make this work, there are two options:

The benefits of the first approach are simplicity, but it suffers from the drawback that other code (not in this repo) uses repo-relative imports; moving this file would break those imports. The benefit of the second approach is keeping that code functional, while the drawback is increased complexity. If the code in those repos relies on the same Docker images, then we'd need to think some more.

With this change in place we would:

A snag to consider: