Closed teutoburg closed 2 months ago
Attention: Patch coverage is 18.75000%
with 13 lines
in your changes missing coverage. Please review.
Project coverage is 75.58%. Comparing base (
6b2a07c
) to head (e5fc68c
).
Files | Patch % | Lines |
---|---|---|
scopesim/server/example_data_utils.py | 18.75% | 13 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I suppose the idea behind pooch is to put the hash in the code (and thereby in the git commit hash etc.), so you know what file you are going to get for reproducibility (and perhaps security).
If we don't care about reproducibility or security (and perhaps we don't) but only data integrity, then we don't need this extra hash but can instead rely on the checksums that could/should be in the header of the FITS files (e.g. DATAMD5
,CHECKSUM
,DATASUM
).
So I don't see a good reason for having the hashes in a separate file. It doesn't really cause any problems either, so whatever.
But using pooch is obviously better than what we had, since it actually works, so don't pay too much attention to my comment about the hashes.
It seems the download function isn't actually covered by our tests. We could add a test file on the server and cache it to a temp dir in a test (indeed a webtest
), that way it would be downloaded each time...
Should we nevertheless add pooch
to the dependencies in the pyproject.toml
file?
I should read before I type
Goes for me as well it seems. Pooch already was in our pyproject.toml
dependencies, probably for the raccoon face thing. Oops.
Anyway, after reading through their changelog, I think it's worth upgrading to the latest version, so I did.
Oops, now the METIS notebooks cannot find the downloaded files anymore: https://github.com/AstarVienna/ScopeSim_Data/actions/runs/10233670841/job/28312246596
FileNotFoundError Traceback (most recent call last)
Cell In[6], line 1
----> 1 input_hdul = fits.open("HL_Tau_prep_for_Scopesim.fits")
Apparently the notebooks expects the downloaded file to be present in the current working directory.
Should the notebooks be fixed and use the returned paths, or should download function place copies/symlinks in the current working directory?
I think we should update the notebooks.
I should read before I type
Goes for me as well it seems. Pooch already was in our
pyproject.toml
dependencies, probably for the raccoon face thing. Oops.Anyway, after reading through their changelog, I think it's worth upgrading to the latest version, so I did.
Did this perhaps break the windows update-to-latest-dependencies test? https://github.com/AstarVienna/ScopeSim/actions/runs/10233695289/job/28316098713
Did this perhaps break the windows update-to-latest-dependencies test? https://github.com/AstarVienna/ScopeSim/actions/runs/10233695289/job/28316098713
Running poetry update
locally on Win11 Py3.11 works fine (from current main
).
However, on my Win10 machine that I recently upgraded from Py3.9 to 3.12, I get weird errors:
poetry install
(from lock file) fails around pyerfa
. Both on main
and on 6b2a07c
, before the recent changes. Seems to be a problem on that machine caused by the 3.9->3.12 update...poetry update
solves the pyerfa
, but I get an error from matplotlib instead. Attempting to just update this pyerfa, it now fails on synphot. I think I broke my installation here 😑 However, on my Win10 machine that I recently upgraded from Py3.9 to 3.12, I get weird errors:
- Simple
poetry install
(from lock file) fails aroundpyerfa
. Both onmain
and on6b2a07c
, before the recent changes. Seems to be a problem on that machine caused by the 3.9->3.12 update...- Running
poetry update
solves thepyerfa
, but I get an error from matplotlib instead. Attempting to just update this pyerfa, it now fails on synphot. I think I broke my installation here 😑
This should be solved by #450
~TLDR~ Summary
The astropy-based download and cache functionality previously used here led to many failed runs in ScopeSim_Data. This should hopefully improve now with
pooch
, which among other applications is specifically advertised for use on "example data". It doesn't really add an additional dependency, because we're already using pooch as part ofspextra
, and (at least in testing) also through SciPy (the raccoon face image).On the
download_example_data
functionThere is an argument to be made about removing the optional
url
argument, because we can change the server URL through the cmds anyway (hopefully we don't have to...) and it's unlikely anyone would ever use this function to download different data from somewhere else. But I'm also fine with keeping it.I changed the default cache location to
~.astar/scopesim/
, in line with e.g.skycalc_ipy
. Caching should also work better now withpooch
, see below.On hashes and caches
Paraphrasing from their documentation,
pooch
uses hashes to determine if the requested file on the server is different from the one stored in the cache and thus whether it needs to be downloaded. In all setups that are not "quick and dirty script mode", it refuses to download files without a known hash provided. In theory,pooch
also supports (and encourages the use of) versioning, but I think that only works when the data is stored on a remote repo, e.g. GitHub, which we're explicitly avoiding because of file sizes involved.Storing the hashes in a separate file is described in their docs, although I suppose we're still well below having a "large number of files", so we might as well put the hashes in a dict. I don't think it'll make a practical difference. If we modify or add files in the future, we can simply change or add the corresponding hash and everything should work just fine.
Now I'm not sure how the caching behavior affects ScopeSim_Data, but we'll see...