conda-forge / esmpy-feedstock

A conda-smithy repository for esmpy.
BSD 3-Clause "New" or "Revised" License
6 stars 15 forks source link

Remove test data from esmpy package #79

Closed xylar closed 1 year ago

xylar commented 1 year ago

Checklist

conda-forge-webservices[bot] commented 1 year ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

xylar commented 1 year ago

@conda-forge-admin, please rerender

github-actions[bot] commented 1 year ago

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/esmpy-feedstock/actions/runs/6034824133.

xylar commented 1 year ago

I'll continue with this once #81 is resolved.

xylar commented 1 year ago

This worked great in a local test build! The package is now only 2 MB in size.

xylar commented 1 year ago

If no one has concerns, please merge as soon as CI has passed.

xylar commented 1 year ago

I think I'm just going to take the chance and merge this. I think it works and the data isn't something we want to be shipping. Hopefully, you all agree.

zklaus commented 1 year ago

I thought there might be a risk that there was non-test data in the directory, but I confirmed that this is not the case (currently). I also thought that the directory might need to be present even if we remove its contents, but I confirmed that the download routines create the directory if it is missing.

zklaus commented 1 year ago

Thanks, @xylar and @mx-moth for reporting it. Great improvement of the package.

xylar commented 1 year ago

I would still prefer to use the option of ${ESMPY_DATA_DIR} mentioned in #78 but I couldn't get that to work, and I don't know if that's just because I don't understand it or because it's really broken. I'll revisit when I hear back from the ESMF folks.

billsacks commented 1 year ago

I spent some time looking into this today.

First, I agree that the change in this PR (removing the data directory) is a fine workaround for now: I agree that it looks like this is just used by tests and examples, so shouldn't be needed in the esmpy installation.

ESMPY_DATA_DIR is used in some of our nightly testing to avoid potentially long download times when we can instead reuse already-existing data. As you found, the current assumption in the code is that, if ESMPY_DATA_DIR is defined, then it is already populated (e.g., by doing a git clone of our test data repo) and so we should save time by not trying to download data. This is a feature (though one that may not be widely useful) rather than a bug, but I also agree that your use case is worth supporting.

My proposal is to add a new environment variable (I suggest ESMPY_DATA_NEW_DIR, but I'm open to suggestions) that has the behavior that I think you expect: If this new variable is defined (and ESMPY_DATA_DIR is not defined) then that location will be used for the data download rather than the default location. From looking through the code, I think this should be straightforward to implement, so I'll move ahead with that if it sounds good to you.

xylar commented 1 year ago

@billsacks, I very much like your suggested solution and variable name. That sounds great to me!

billsacks commented 1 year ago

I have implemented this in the latest ESMF develop (https://github.com/esmf-org/esmf/commit/cb4b2065ee06180ffac9314d92036778a1132aab). It will be available in the ESMF 8.6 release due out this fall.

xylar commented 1 year ago

Excellent, thank you @billsacks!