fmaussion / salem

Add geolocalised subsetting, masking, and plotting operations to xarray
http://salem.readthedocs.io

Do not download sample data #196

Open mullenkamp opened 3 years ago

mullenkamp commented 3 years ago

Hi,

I was wondering if there is an easy way to turn off the automatic downloading of the sample data at initial installation and import? I'm running a regular process in docker and I don't want to download 50 MB every time it runs. I've been looking through the code, but I'm really struggling to find the spot where download_demo_files actually gets called.

Thanks, Mike

fmaussion commented 3 years ago

Hi, thanks for the feedback.

The download happens at first import of salem.graphics, not at installation.

Unfortunately, in most cases salem needs some of these files to run (for example, the country borders used for maps are in there, among other files). Arguably, we could separate the files that salem really needs from the files that it needs for tests only. But this is going to require some thinking on our side.

There is one thing you can do, though: download the salem files during the docker build process (this is what we do for our Binder images, for example: https://github.com/OGGM/r2d/blob/master/binder/download_cache.py). But of course your docker image is going to be larger.
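As a rough illustration of that workaround, here is a minimal sketch of a script that could run once during the image build. It relies only on the statement above that the download is triggered by the first import of salem.graphics. The function name `warm_cache` is made up for this example, and this is not the content of the linked download_cache.py.

```python
import importlib
import sys


def warm_cache(module_name: str = "salem.graphics") -> bool:
    """Import `module_name` once so that import-time downloads happen now.

    `warm_cache` is a hypothetical helper, not part of salem. It returns
    True if the import succeeded and False if the module could not be
    imported. Any other error (for example a network failure during the
    download) is left to propagate so that the build fails visibly.
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        print(f"could not import {module_name}: {exc}", file=sys.stderr)
        return False
    return True
```

A build step could then call `warm_cache()` and exit with a non-zero status when it returns False, so the files end up in an image layer instead of being fetched on every container start. This assumes the build and the runtime use the same user and home directory, so that both see the same cache location.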

@TimoRoth this is something we need to discuss

mullenkamp commented 3 years ago

Yeah, I understand. I figured that would be the case. Thanks for the reply. Loading the salem files during the docker build process is a good workaround. Just out of curiosity, I went through the salem modules to see what is required to run open_wrf_dataset. It seems to be most of the sio and wrftools modules and some of gis and utils.

fmaussion commented 3 years ago

Yes, if one does not plot anything, the test files are probably unnecessary.

okhoma commented 3 years ago

I too would prefer not to have extra files downloaded that I will not use during my processing. It would be great if the plotting code and samples were separated from the backend stuff, so that the files are not downloaded when I do not need them. Or, at least, there could be an environment variable to turn the download off, at the expense of an error if the files are needed but are not there.
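To make the environment-variable idea concrete, here is a minimal sketch of how such an opt-out could behave. Everything in it is hypothetical: the variable name `SALEM_NO_DOWNLOAD`, the function `get_data_file`, and the `download` callback are invented for illustration and are not part of salem.

```python
import os
from pathlib import Path
from typing import Callable, Mapping

# Hypothetical variable name, chosen only for this sketch.
NO_DOWNLOAD_VAR = "SALEM_NO_DOWNLOAD"


def get_data_file(
    name: str,
    cache_dir: Path,
    download: Callable[[Path], None],
    env: Mapping[str, str] = os.environ,
) -> Path:
    """Return the path to `name` in `cache_dir`, downloading only if allowed.

    `download` is a caller-supplied function that fills `cache_dir`.
    If the opt-out variable is set and the file is missing, raise an
    error instead of downloading.
    """
    path = Path(cache_dir) / name
    if path.exists():
        return path  # already cached, nothing to fetch
    if env.get(NO_DOWNLOAD_VAR, "").lower() in {"1", "true", "yes"}:
        raise FileNotFoundError(
            f"{name} is not cached and {NO_DOWNLOAD_VAR} is set"
        )
    download(Path(cache_dir))
    if not path.exists():
        raise FileNotFoundError(f"{name} was not found after download")
    return path
```

With this shape, code paths that never ask for a data file never trigger a download, and a process that sets the variable gets a clear error only when a missing file is actually requested.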