euroargodev / argopy

A python library for Argo data beginners and experts
https://argopy.readthedocs.io
European Union Public License 1.2
186 stars 41 forks source link

PyPI source distribution does not include the test data #381

Open bcbnz opened 2 months ago

bcbnz commented 2 months ago

I was trying to build a distribution package from the 0.1.16 source tarball on PyPI. Running the tests gave the following error during startup:

argopy/tests/conftest.py:8: in <module>
    from mocked_http import mocked_httpserver
argopy/tests/helpers/mocked_http.py:101: in <module>
    class HTTPTestHandler(BaseHTTPRequestHandler):
argopy/tests/helpers/mocked_http.py:103: in HTTPTestHandler
    "": get_html_landing_page(),
argopy/tests/helpers/mocked_http.py:92: in get_html_landing_page
    html.append("<h1>Mocked HTTP server is up and running, serving %i files</h1>" % len(URI))
E   NameError: name 'URI' is not defined

Working back through the code, I found the underlying problem was that argopy/tests/test_data is not included in the PyPI source tarball. Switching to the tarball that GitHib provides from the 0.1.16 tag works as that includes the test data.

gmaze commented 2 months ago

Hi @bcbnz thanks for raising the issue and I'm glad you found a way to fix it

indeed we exclude the test data from the pypi distribution because it is about 420Mb in size, while the entire code only is about a few Mb, and we don't want to clutter the package with data that won't be used by most installers

if you have some indications with regards to:

thanks for your feedbacks !

bcbnz commented 2 months ago

Hi @gmaze

In my case this is for an Arch AUR package which allows Arch users to build a package which can then be installed system-wide by the system package manager. Standard practice for Python packages is to get the source code, build a wheel locally, test it and then generate an Arch package from it. Where possible, I like to use sources from PyPI because (a) they have a checksum of the uploaded file available and (b) GitHub has changed the way they generate archives in the past, which means the checksum changes even if the contents are identical, which breaks packaging. (Note that the current Arch system version of xarray is not compatible with NumPy 2.0 so I am advising users to use a virtual environment if that causes them problems -- for use in developing other projects this should be preferred anyway, but sometimes it is useful to have a system-wide install).

I can think of three options to avoid this issue:

  1. Include the test data in the source distribution (note I am not suggesting the PyPI wheel should include the test data, just the .tar.gz source). Yes, it makes it bigger (note the GitHub tarball is 144 MB so I guess the test data compresses well) but this will only impact people building a local copy -- pip will use the wheel from PyPI unless instructed otherwise, and other tools like conda rely on their own prebuilt packages which should also exclude this data.
  2. Make a download of the test data easily available somewhere. This could be done with each release by uploading a .tar.gz or similar with the data to the GitHub release page. A separate repository (which could be included into the main repository as a submodule) could be easier as it avoids the need to keep uploading new versions of the test data. As part of this, modify argopy/tests/helpers/mocked_http.py to check if the data is available, and if not raise a RuntimeError or similar with instructions of how to get the test data to make the problem clear (it took some investigation to figure out what was causing the error in my original report as it is not obvious).
  3. Remove the tests entirely from the source uploaded to PyPI. If people building from source want to test the package they generate, they will note the lack of tests and look for another source which includes them, such as the GitHub tarball or a direct clone of the repository. This is my least preferred option, but in my opinion this is better than having tests which don't run with a non-obvious error which requires the user to try and debug the problem.
gmaze commented 2 months ago

Ok I understand better the use case, thanks for the explanation and your propositions !