conda-forge / gmprocess-feedstock

A conda-smithy repository for gmprocess.
BSD 3-Clause "New" or "Revised" License

Missing package data #78

Closed: emthompson-usgs closed this issue 1 year ago

emthompson-usgs commented 1 year ago

Solution to issue cannot be found in the documentation.

Issue

Hi @ocefpaf, a user emailed me an error report, and the cause was clearly that a file in the data subpackage wasn't being found. Note that we recently switched from primarily using setup.py to pyproject.toml for setup options, and when we did, it didn't seem necessary to specify package data, but surely this is where something has gone wrong.

The most minimal way I found to recreate the error is with

$ gmrecords processing_steps

which raises an error because it can't find ~/miniconda3/envs/build/lib/python3.10/site-packages/gmprocess/data/config_production.yml.

I could not reproduce the error when installing from source or via pip, but I could reproduce it when installing from conda. I confirmed that the contents of src/gmprocess/data are missing from the install directory. Here's how to reproduce it:

$ conda create --name build pip gmprocess python=3.10
$ tree ~/miniconda3/envs/build/lib/python3.10/site-packages/gmprocess/data
/Users/emthompson/miniconda3/envs/build/lib/python3.10/site-packages/gmprocess/data
├── __init__.py
└── __pycache__
    └── __init__.cpython-310.pyc
1 directory, 2 files
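
The same check can be done from Python, independent of how gmprocess locates the file internally. This is a minimal sketch assuming Python 3.9+ (for `importlib.resources.files`); the helper name `missing_package_files` is hypothetical and not part of gmprocess.

```python
from importlib.resources import files


def missing_package_files(package, names):
    """Return the entries of `names` that are not present as files inside `package`.

    `package` is a dotted import name such as "gmprocess.data"; it must be
    importable, so its __init__.py has to exist.
    """
    root = files(package)
    return [name for name in names if not root.joinpath(name).is_file()]
```

In the broken conda environment above, `missing_package_files("gmprocess.data", ["config_production.yml"])` would be expected to return `["config_production.yml"]`, while a pip or source install should return an empty list.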

I'm guessing we need to specify something in pyproject.toml or in the feedstock recipe to indicate that we want the data files in this directory to be included. I'm hoping you can point me in the right direction. Thanks.

Installed packages

asgiref                   3.6.0              pyhd8ed1ab_0    conda-forge
atk-1.0                   2.38.0               h1d18e73_1    conda-forge
attrs                     22.2.0             pyh71513ae_0    conda-forge
backports.zoneinfo        0.2.1           py310h2ec42d9_7    conda-forge
beautifulsoup4            4.11.1             pyha770c72_0    conda-forge
blosc                     1.21.2               hebb52c4_0    conda-forge
boost-cpp                 1.78.0               h8b082ac_1    conda-forge
branca                    0.6.0              pyhd8ed1ab_0    conda-forge
brotli                    1.0.9                hb7f2c08_8    conda-forge
brotli-bin                1.0.9                hb7f2c08_8    conda-forge
brotlipy                  0.7.0           py310h90acd4f_1005    conda-forge
bs4                       4.11.1               hd8ed1ab_0    conda-forge
bzip2                     1.0.8                h0d85af4_4    conda-forge
c-ares                    1.18.1               h0d85af4_0    conda-forge
ca-certificates           2022.12.7            h033912b_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cairo                     1.16.0            h904041c_1014    conda-forge
certifi                   2022.12.7          pyhd8ed1ab_0    conda-forge
cffi                      1.15.1          py310ha78151a_3    conda-forge
cfitsio                   4.2.0                hd56cc12_0    conda-forge
charset-normalizer        2.1.1              pyhd8ed1ab_0    conda-forge
click                     8.1.3           unix_pyhd8ed1ab_2    conda-forge
click-plugins             1.1.1                      py_0    conda-forge
cligj                     0.7.2              pyhd8ed1ab_1    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
configobj                 5.0.6                      py_0    conda-forge
contextlib2               0.5.5                      py_2    conda-forge
contourpy                 1.0.6           py310ha23aa8a_0    conda-forge
cryptography              38.0.4          py310hdd0c95c_0    conda-forge
curl                      7.87.0               h6df9250_0    conda-forge
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
dill                      0.3.6              pyhd8ed1ab_1    conda-forge
django                    4.1.4              pyhd8ed1ab_0    conda-forge
docutils                  0.19            py310h2ec42d9_1    conda-forge
esi-core                  1.0.1           py310h936d966_1    conda-forge
esi-extern-openquake      1.0.2              pyhd8ed1ab_0    conda-forge
esi-utils-colors          1.0.3              pyhd8ed1ab_0    conda-forge
esi-utils-io              1.0.2              pyhd8ed1ab_0    conda-forge
esi-utils-rupture         1.0.2              pyhd8ed1ab_0    conda-forge
esi-utils-time            1.0.2              pyhd8ed1ab_0    conda-forge
esi-utils-vectors         1.0.2              pyhd8ed1ab_0    conda-forge
et_xmlfile                1.0.1                   py_1001    conda-forge
expat                     2.5.0                hf0c8a7f_0    conda-forge
fiona                     1.8.22          py310h3963e5c_5    conda-forge
folium                    0.14.0             pyhd8ed1ab_0    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 hab24e00_0    conda-forge
fontconfig                2.14.1               h5bb23bf_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.38.0          py310h90acd4f_1    conda-forge
freetype                  2.12.1               h3f81eb7_1    conda-forge
freexl                    1.0.6                hb7f2c08_1    conda-forge
fribidi                   1.0.10               hbcb3906_0    conda-forge
gdal                      3.6.1           py310h5abc6fc_1    conda-forge
gdk-pixbuf                2.42.8               h3648f77_1    conda-forge
geos                      3.11.1               hf0c8a7f_0    conda-forge
geotiff                   1.7.1                he29fd1c_4    conda-forge
gettext                   0.21.1               h8a4c099_0    conda-forge
giflib                    5.2.1                hbcb3906_2    conda-forge
gmprocess                 1.2.2              pyhd8ed1ab_1    conda-forge
graphite2                 1.3.13            h2e338ed_1001    conda-forge
graphviz                  6.0.2                hc51f7b9_0    conda-forge
greenlet                  2.0.1           py310h7a76584_0    conda-forge
gtk2                      2.24.33              h7c1209e_2    conda-forge
gts                       0.7.6                hccb3bdf_2    conda-forge
h5py                      3.7.0           nompi_py310h5555e59_102    conda-forge
harfbuzz                  6.0.0                h08f8713_0    conda-forge
hdf4                      4.2.15               h7aa5921_5    conda-forge
hdf5                      1.12.2          nompi_h48135f9_101    conda-forge
html5lib                  1.1                pyh9f0ad1d_0    conda-forge
icu                       70.1                 h96cf925_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
importlib-metadata        5.2.0              pyha770c72_0    conda-forge
isodate                   0.6.1              pyhd8ed1ab_0    conda-forge
jinja2                    3.1.2              pyhd8ed1ab_1    conda-forge
jpeg                      9e                   hac89ed1_2    conda-forge
json-c                    0.16                 h01d06f9_0    conda-forge
kealib                    1.5.0                h5c1f988_0    conda-forge
kiwisolver                1.4.4           py310ha23aa8a_1    conda-forge
krb5                      1.20.1               h049b76e_0    conda-forge
lcms2                     2.14                 h90f4b2a_0    conda-forge
lerc                      4.0.0                hb486fe8_0    conda-forge
libaec                    1.0.6                he49afe7_0    conda-forge
libblas                   3.9.0           16_osx64_openblas    conda-forge
libbrotlicommon           1.0.9                hb7f2c08_8    conda-forge
libbrotlidec              1.0.9                hb7f2c08_8    conda-forge
libbrotlienc              1.0.9                hb7f2c08_8    conda-forge
libcblas                  3.9.0           16_osx64_openblas    conda-forge
libcurl                   7.87.0               h6df9250_0    conda-forge
libcxx                    14.0.6               hccf4f1f_0    conda-forge
libdeflate                1.14                 hb7f2c08_0    conda-forge
libedit                   3.1.20191231         h0678c8f_2    conda-forge
libev                     4.33                 haf1e3a3_1    conda-forge
libffi                    3.4.2                h0d85af4_5    conda-forge
libgd                     2.3.3                h1e214de_3    conda-forge
libgdal                   3.6.1                hd928027_1    conda-forge
libgfortran               5.0.0           11_3_0_h97931a8_27    conda-forge
libgfortran5              11.3.0              h082f757_27    conda-forge
libglib                   2.74.1               h4c723e1_1    conda-forge
libiconv                  1.17                 hac89ed1_0    conda-forge
libkml                    1.3.0             haeb80ef_1015    conda-forge
liblapack                 3.9.0           16_osx64_openblas    conda-forge
libnetcdf                 4.8.1           nompi_hc61b76e_106    conda-forge
libnghttp2                1.47.0               h5aae05b_1    conda-forge
libopenblas               0.3.21          openmp_h429af6e_3    conda-forge
libpng                    1.6.39               ha978bb4_0    conda-forge
libpq                     15.1                 h3640bf0_2    conda-forge
librsvg                   2.54.4               h3d48ba6_0    conda-forge
librttopo                 1.1.0               h9461dca_12    conda-forge
libsodium                 1.0.18               hbcb3906_1    conda-forge
libspatialindex           1.9.3                he49afe7_4    conda-forge
libspatialite             5.0.1               hc1c2c66_22    conda-forge
libsqlite                 3.40.0               ha978bb4_0    conda-forge
libssh2                   1.10.0               h47af595_3    conda-forge
libtiff                   4.4.0                h6268bbc_5    conda-forge
libtool                   2.4.6             he49afe7_1008    conda-forge
libwebp                   1.2.4                hfa4350a_0    conda-forge
libwebp-base              1.2.4                h775f41a_0    conda-forge
libxcb                    1.13              h0d85af4_1004    conda-forge
libxml2                   2.10.3               hb9e07b5_0    conda-forge
libxslt                   1.1.37               h5d22bc9_0    conda-forge
libzip                    1.9.2                h6db710c_1    conda-forge
libzlib                   1.2.13               hfd90126_4    conda-forge
llvm-openmp               15.0.6               h61d9ccf_0    conda-forge
lxml                      4.9.2           py310h0b20c97_0    conda-forge
lz4-c                     1.9.3                he49afe7_1    conda-forge
markupsafe                2.1.1           py310h90acd4f_2    conda-forge
matplotlib-base           3.6.2           py310he725631_0    conda-forge
munch                     2.5.0                      py_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
ncurses                   6.3                  h96cf925_1    conda-forge
networkx                  2.8.8              pyhd8ed1ab_0    conda-forge
nspr                      4.35                 hea0b92c_0    conda-forge
nss                       3.78                 ha8197d3_0    conda-forge
numpy                     1.24.0          py310h1b7c290_0    conda-forge
obspy                     1.4.0           py310h936d966_0    conda-forge
openjpeg                  2.5.0                h5d0d7b0_1    conda-forge
openpyxl                  3.0.10          py310h90acd4f_2    conda-forge
openquake.engine          3.15.0             pyhd8ed1ab_0    conda-forge
openssl                   3.0.7                hfd90126_1    conda-forge
packaging                 22.0               pyhd8ed1ab_0    conda-forge
pandas                    1.5.2           py310hecf8f37_0    conda-forge
pango                     1.50.12              hbd9bf65_1    conda-forge
patsy                     0.5.3              pyhd8ed1ab_0    conda-forge
pcre2                     10.40                h1c4e4bc_0    conda-forge
pillow                    9.2.0           py310hffcf78b_3    conda-forge
pip                       22.3.1             pyhd8ed1ab_0    conda-forge
pixman                    0.40.0               hbcb3906_0    conda-forge
poppler                   22.12.0              hf2ff1a1_0    conda-forge
poppler-data              0.4.11               hd8ed1ab_0    conda-forge
postgresql                15.1                 hbea33b9_2    conda-forge
proj                      9.1.0                hcbd9701_0    conda-forge
prov                      2.0.0              pyhd3deb0d_0    conda-forge
ps2ff                     1.5.6              pyhd8ed1ab_0    conda-forge
psutil                    5.9.4           py310h90acd4f_0    conda-forge
pthread-stubs             0.4               hc929b4f_1001    conda-forge
pyasdf                    0.7.3                      py_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pydot                     1.4.2           py310h2ec42d9_3    conda-forge
pyopenssl                 22.1.0             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pyproj                    3.4.1           py310h8c678d5_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.10.8          he7542f4_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.10                    3_cp310    conda-forge
pytz                      2022.7             pyhd8ed1ab_0    conda-forge
pyzmq                     24.0.1          py310hf615a82_1    conda-forge
rdflib                    6.2.0              pyhd8ed1ab_0    conda-forge
readline                  8.1.2                h3899abd_0    conda-forge
requests                  2.28.1             pyhd8ed1ab_1    conda-forge
rtree                     1.0.1           py310had9ce37_1    conda-forge
ruamel.yaml               0.17.21         py310h90acd4f_2    conda-forge
ruamel.yaml.clib          0.2.7           py310h90acd4f_1    conda-forge
schema                    0.7.5              pyhd8ed1ab_0    conda-forge
scipy                     1.9.3           py310h240c617_2    conda-forge
setuptools                65.6.3             pyhd8ed1ab_0    conda-forge
setuptools-scm            7.1.0              pyhd8ed1ab_0    conda-forge
shapely                   2.0.0           py310h4e43f2a_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.9                h225ccf5_2    conda-forge
soupsieve                 2.3.2.post1        pyhd8ed1ab_0    conda-forge
sqlalchemy                1.4.45          py310h90acd4f_0    conda-forge
sqlite                    3.40.0               h9ae0607_0    conda-forge
sqlparse                  0.4.3              pyhd8ed1ab_0    conda-forge
statsmodels               0.13.5          py310h936d966_2    conda-forge
tiledb                    2.13.0               h8b9cbf0_1    conda-forge
tk                        8.6.12               h5dbffcc_0    conda-forge
toml                      0.10.2             pyhd8ed1ab_0    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
typing-extensions         4.4.0                hd8ed1ab_0    conda-forge
typing_extensions         4.4.0              pyha770c72_0    conda-forge
tzcode                    2022g                hb7f2c08_0    conda-forge
tzdata                    2022g                h191b570_0    conda-forge
unicodedata2              15.0.0          py310h90acd4f_0    conda-forge
urllib3                   1.26.13            pyhd8ed1ab_0    conda-forge
webencodings              0.5.1                      py_1    conda-forge
wheel                     0.38.4             pyhd8ed1ab_0    conda-forge
xerces-c                  3.2.4                h2007e90_1    conda-forge
xlrd                      2.0.1              pyhd8ed1ab_3    conda-forge
xorg-libxau               1.0.9                h35c211d_0    conda-forge
xorg-libxdmcp             1.1.3                h35c211d_0    conda-forge
xz                        5.2.6                h775f41a_0    conda-forge
zeromq                    4.3.4                he49afe7_1    conda-forge
zipp                      3.11.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               hfd90126_4    conda-forge
zstd                      1.5.2                hfa58983_4    conda-forge

Environment info

active environment : build
    active env location : /Users/emthompson/miniconda3/envs/build
            shell level : 2
       user config file : /Users/emthompson/.condarc
 populated config files : /Users/emthompson/.condarc
          conda version : 22.9.0
    conda-build version : 3.23.1
         python version : 3.9.13.final.0
       virtual packages : __osx=11.7.1=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /Users/emthompson/miniconda3  (writable)
      conda av data dir : /Users/emthompson/miniconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/osx-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/osx-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/osx-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /Users/emthompson/miniconda3/pkgs
                          /Users/emthompson/.conda/pkgs
       envs directories : /Users/emthompson/miniconda3/envs
                          /Users/emthompson/.conda/envs
               platform : osx-64
             user-agent : conda/22.9.0 requests/2.28.1 CPython/3.9.13 Darwin/20.6.0 OSX/11.7.1
                UID:GID : 1312689:346589396
             netrc file : None
           offline mode : False

ocefpaf commented 1 year ago

The source is not available on PyPI and this package is being built using: https://code.usgs.gov/ghsc/esi/groundmotion-processing/-/archive/v1.2.2/groundmotion-processing-v1.2.2.tar.gz

That is a ginormous download, BTW: 125 MB! The wheel on PyPI is ~25 MB, which is already quite big but still much smaller than this source. It would be nice if a source distribution were published on PyPI alongside the wheel, and if it were a bit smaller, containing just the files required to build the package. That said, the data files are in the source, but they are not making it into the final sdist used to build the conda package. If you run:

python -m build --sdist . --outdir dist

with the configuration in your pyproject.toml, you'll get an empty data directory:

gmprocess-1.2.2/src/gmprocess/data/
gmprocess-1.2.2/src/gmprocess/data/__init__.py

Maybe more files are missing, but I'm not familiar enough with the package to know. You are using setuptools, so you can fix that by following https://setuptools.pypa.io/en/stable/userguide/datafiles.html [1].

Note that you are not building your wheel from the standard configuration in your pyproject.toml! If you were using build and the metadata there, you would get an empty data directory too. (You can try that by downloading the source for that version and running python -m build --wheel . --outdir dist)

TL;DR it is a problem upstream and you can fix it with [1] and/or adding a MANIFEST.in file.
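
To confirm a fix before uploading, the sdist inspection described above can be scripted (for example in CI) so that it covers both the sdist and the wheel. This is a minimal stdlib sketch; `data_files_in_dist` is a hypothetical helper, and the default `gmprocess/data` subdirectory is taken from this issue.

```python
import tarfile
import zipfile
from pathlib import PurePosixPath


def data_files_in_dist(dist_path, subdir="gmprocess/data"):
    """List regular files under `subdir` inside an sdist (.tar.gz) or a wheel (.whl).

    Sdist members carry a "<name>-<version>/src/" style prefix, so the match
    looks for `subdir` as a contiguous run of path parts anywhere in the name.
    """
    dist_path = str(dist_path)
    if dist_path.endswith(".whl"):
        with zipfile.ZipFile(dist_path) as zf:
            names = [n for n in zf.namelist() if not n.endswith("/")]
    elif dist_path.endswith((".tar.gz", ".tgz")):
        with tarfile.open(dist_path, "r:gz") as tf:
            names = [m.name for m in tf.getmembers() if m.isfile()]
    else:
        raise ValueError(f"unrecognized distribution format: {dist_path}")

    wanted = PurePosixPath(subdir).parts
    hits = []
    for name in names:
        parts = PurePosixPath(name).parts
        # Require at least one path part (the file name) after `subdir`.
        for i in range(len(parts) - len(wanted)):
            if parts[i:i + len(wanted)] == wanted:
                hits.append(name)
                break
    return sorted(hits)
```

An sdist for which this returns only `.../gmprocess/data/__init__.py` reproduces the problem described in this comment.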

emthompson-usgs commented 1 year ago

Thanks for looking at this. The source distribution gets rejected by PyPI because of its size. The issue is that it includes the test data, whereas the wheel does not. The only way I can think of to fix the size would be to put the test data somewhere else.

I had been building the wheel with

python -m build

So I'm not sure why the data files were included in my wheel (which led me to think everything was okay in this regard):

$ unzip -l dist/gmprocess-1.2.3.dev0-py3-none-any.whl | grep gmprocess/data
      616  08-15-2022 15:45   gmprocess/data/CESMD_NGA_ids.csv
    41885  08-15-2022 15:45   gmprocess/data/GDMSstations.json
  1122213  08-15-2022 15:45   gmprocess/data/NGA_West2_SiteDatabase_V032.csv
        0  08-15-2022 15:45   gmprocess/data/__init__.py
    18992  12-22-2022 21:23   gmprocess/data/config_production.yml
    25699  11-05-2022 22:57   gmprocess/data/config_test.yml
<snip>

My prior reading of the setuptools page that you linked to made me think that when using pyproject.toml, data files were included by default and didn't require additional specification. It sounds like I'll have to re-read it more carefully.
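
One detail worth checking when re-reading that page: with `pyproject.toml`-based configuration, setuptools (61 and later) turns `include-package-data` on by default, but that option only adds data files that the sdist already contains, i.e. files listed in `MANIFEST.in` or supplied by a revision-control plugin such as setuptools-scm. Explicit `[tool.setuptools.package-data]` globs are the other route the page describes. The hypothetical helper below is a sketch that reports both settings from a `pyproject.toml` string; it assumes Python 3.11's `tomllib` or the `tomli` backport.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same loads() API for older Pythons


def package_data_settings(pyproject_text):
    """Report the setuptools options that decide which data files get packaged.

    Assumes setuptools >= 61, where include-package-data defaults to True
    for projects configured through a [project] table in pyproject.toml.
    """
    config = tomllib.loads(pyproject_text)
    setuptools_cfg = config.get("tool", {}).get("setuptools", {})
    default_include = "project" in config
    return {
        "include-package-data": setuptools_cfg.get("include-package-data", default_include),
        "package-data": setuptools_cfg.get("package-data", {}),
    }
```

A `pyproject.toml` with a `[project]` table and no `[tool.setuptools]` section reports `include-package-data` as True and `package-data` as empty, which means the data files reach the wheel only if the sdist already carries them.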

emthompson-usgs commented 1 year ago

Quick update: I ran the same command you did to get the source distribution:

python -m build --sdist . --outdir dist

But I don't get an empty data directory:

$ tar -tvf dist/gmprocess-1.2.3.dev0.tar.gz  | grep gmprocess/data
drwxr-xr-x  0 emthompson 176539137      0 Dec 23 12:06 gmprocess-1.2.3.dev0/src/gmprocess/data/
-rw-r--r--  0 emthompson 176539137    616 Aug 15 09:45 gmprocess-1.2.3.dev0/src/gmprocess/data/CESMD_NGA_ids.csv
-rw-r--r--  0 emthompson 176539137  41885 Aug 15 09:45 gmprocess-1.2.3.dev0/src/gmprocess/data/GDMSstations.json
-rw-r--r--  0 emthompson 176539137 1122213 Aug 15 09:45 gmprocess-1.2.3.dev0/src/gmprocess/data/NGA_West2_SiteDatabase_V032.csv
-rw-r--r--  0 emthompson 176539137       0 Aug 15 09:45 gmprocess-1.2.3.dev0/src/gmprocess/data/__init__.py
drwxr-xr-x  0 emthompson 176539137       0 Dec 23 12:06 gmprocess-1.2.3.dev0/src/gmprocess/data/asdf/
<snip>

So I'm wondering why there is this difference in behavior in my install. Could it be the version of setuptools or build? Here's what I have:

build                     0.9.0                    pypi_0    pypi
setuptools                65.6.3             pyhd8ed1ab_0    conda-forge
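
Collecting these versions is easy to script. One caveat, hedged because this thread does not confirm it is the cause: `python -m build` installs the `[build-system] requires` packages into an isolated environment by default, so the setuptools shown in the active environment may not be the one that actually builds the distribution (`--no-isolation` builds with the active environment instead). Another difference worth ruling out is where the build ran: if the project relies on setuptools-scm (it appears in the package listing above, but that is an assumption about the build setup), file selection comes from git, so a build from a directory without git metadata, such as an archive download, may include fewer files. The helper name below is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def build_tool_versions(names=("build", "setuptools", "setuptools-scm", "wheel")):
    """Return {distribution name: installed version, or None if not installed}."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report
```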

emthompson-usgs commented 1 year ago

Also, I will work on making the source distribution smaller. Some of these data files definitely don't need to be there, and I can also exclude the tests and docs directories, which should shave off a ton of space.
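
To decide what to exclude, it helps to see which top-level directories dominate the source tree. This is a minimal sketch; `size_by_top_level` is a hypothetical helper, and it skips `.git` by default because that directory never ends up in an sdist. Directories found to be large can then be dropped from the sdist, for example with `prune tests` or `prune docs` lines in `MANIFEST.in`, if the project decides they are not needed to build the package.

```python
from pathlib import Path


def size_by_top_level(root, skip=(".git",)):
    """Sum file sizes (bytes) under each top-level entry of a source tree, largest first."""
    root = Path(root)
    totals = {}
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # Ignore skipped directories, directories themselves, and symlinks.
        if rel.parts[0] in skip or not path.is_file() or path.is_symlink():
            continue
        totals[rel.parts[0]] = totals.get(rel.parts[0], 0) + path.stat().st_size
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```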

emthompson-usgs commented 1 year ago

For the latest release (1.2.3), the PyPI source and wheel distributions are much smaller (~5.6 MB), so PyPI no longer rejects the source distribution. It occurs to me now that the source URL in the recipe/meta.yaml file points to the code.usgs.gov tar.gz of the source, which is still large because it is simply a tarball of the repo, not the output of python -m build. So I am thinking I'll change the URL to point to the pypi-hosted source distribution.
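
Switching the recipe means updating the `url` and `sha256` fields under `source:` in `meta.yaml`. The sketch below builds a PyPI source URL following the `https://pypi.io/packages/source/<first letter>/<name>/...` pattern that many conda-forge recipes use, and hashes the downloaded sdist. It assumes the sdist file is named `<name>-<version>.tar.gz`, which holds for gmprocess but not for every project; both helper names are hypothetical.

```python
import hashlib


def pypi_sdist_url(name, version):
    """Conventional PyPI source URL of the kind used in many conda-forge recipes."""
    return f"https://pypi.io/packages/source/{name[0]}/{name}/{name}-{version}.tar.gz"


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in chunks, for the recipe's sha256 field."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `pypi_sdist_url("gmprocess", "1.2.3")` gives `https://pypi.io/packages/source/g/gmprocess/gmprocess-1.2.3.tar.gz`; the hash should be computed from the file actually downloaded from that URL.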

ocefpaf commented 1 year ago

put the test data somewhere else.

Yep, most projects serve the data on GitHub and have a script to download it at test time [1]. Another strategy, if possible, is to auto-generate the test data.

[1] one pattern I like is to use pooch to fetch it. See https://github.com/Unidata/MetPy/blob/6f62696b9a1bb338a32ad0a8b801087941d5cc43/src/metpy/cbook.py#L32 for an example.
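
pooch implements this pattern (for example `pooch.retrieve(url, known_hash=...)`), along with registries and retries. The sketch below is a stdlib-only illustration of the same idea, not pooch's API: download a test file once into a cache directory, verify its SHA-256, and reuse the cached copy afterwards. `fetch_test_file` is a hypothetical name, and any URL passed to it would be wherever the project chooses to host its test data.

```python
import hashlib
import os
import urllib.parse
import urllib.request
from pathlib import Path


def _sha256(path):
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def fetch_test_file(url, sha256, cache_dir):
    """Download `url` into `cache_dir` once, verify its SHA-256, and return the local path.

    A cached copy whose hash matches is reused without downloading again.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    filename = os.path.basename(urllib.parse.unquote(urllib.parse.urlparse(url).path))
    target = cache_dir / filename
    if target.is_file() and _sha256(target) == sha256:
        return target
    partial = cache_dir / (filename + ".part")
    urllib.request.urlretrieve(url, partial)
    if _sha256(partial) != sha256:
        partial.unlink()
        raise ValueError(f"SHA-256 mismatch for {url}")
    partial.replace(target)  # only a verified download becomes the cached file
    return target
```

Pinning the hash in the test suite means a changed or corrupted upstream file fails loudly instead of silently altering test results.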

So I am thinking I'll change the URL to point to the pypi-hosted source distribution.

:+1:

emthompson-usgs commented 1 year ago

Solved with this update. Thanks @ocefpaf!