Closed marc-white closed 1 week ago
Attention: Patch coverage is 98.83721%
with 1 line
in your changes missing coverage. Please review.
Project coverage is 97.75%. Comparing base (
36b8140
) to head (6cd9393
). Report is 1 commits behind head on main.
Files with missing lines | Patch % | Lines |
---|---|---|
src/access_nri_intake/cli.py | 97.50% | 1 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
OK. What's the best way for us to test this on GADI?
OK. What's the best way for us to test this on GADI?
The way I've done this in the past (e.g. to test #175 updates):
build_all.sh
script, and modify it to point to a dummy directory, a subset of the data sources, etc.The main thing to check would be to copy the existing catalog folders (e.g. v0.1.3) to the dummt directory, create symlinks to them using the date catalog format, and make sure the build process picks those up as previous versions and sets the catalog min
version correctly.
Here is my first try.
I have installed a local version in my tm70 directory
catalog-build --build_base_path /g/data/xp65/admin/intake /g/data/xp65/admin/access-nri-intake-catalog/config/cmip6.yaml
2024-11-07 17:00:28,933 - access_nri_intake.cli - INFO - Adding 'cmip6_fs38' to metacatalog '/g/data/xp65/admin/intake/v2024-11-07/metacatalog.csv'
2024-11-07 17:00:35,879 - access_nri_intake.cli - INFO - Adding 'cmip6_oi10' to metacatalog '/g/data/xp65/admin/intake/v2024-11-07/metacatalog.csv'
Traceback (most recent call last):
File "/g/data/tm70/rb5533/miniforge3/envs/intake/bin/catalog-build", line 8, in <module>
sys.exit(build())
^^^^^^^
File "/g/data/tm70/rb5533/PROJECTS/access-nri-intake-catalog/src/access_nri_intake/cli.py", line 249, in build
yaml_old["sources"]["access_nri"]["parameters"]["version"]["min"]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
KeyError: 'min'
Is it because the cmip6
config has more than one source?
I have just tried with only one source in cmip6.yml
same pb
Do you have an existing catalog.yaml
in the build directory? If so, it's going to look for a min
value in that, and it appears to be bombing out because it can't find one. The assumption I made is that the first catalog build in the 'live' directory on xp65
(and, therefore, the first build in your dummy area) will be the first catalog.yaml
to appear there, so it will inspect the present directories to compute a min
and max
version instead.
I have tried with an empty folder and I still get the issue...
OK I'll take a look and see what's happening...
Found the problem - the code is building the catalog in one place, but it's looking for the existing catalog (and, it seems, building the catalog.yaml back out to) the live location referred to by the CATALOG_LOCATION
variable. I need to have a quick think about how to rectify that.
@rbeucher OK, check out the latest code version and try again. You'll need to update your test version of the build_all.sh
script to include the new --catalog_base_path
option (see the canonical build_all.sh
in the repo).
The issue was that, while changing the build_base_path
option in the build script lets you stick the catalog data in a different place, the catalog yaml was always being forced into either live location on xp65
, or into your home area if you have a local catalog.yaml
set up. I didn't pick this up because I mocked get_catalog_fp()
in most of the tests, so I never saw the effect.
I've now added an extra option to the build()
function call, as well as some additional supporting changes, that let you set where the catalog.yaml lives. In production, this will be the same place as the catalog data, hence why the canonical build_all.sh
script sets --build_base_path
and --catalog_base_path
to be the same.
Running a build test right now - will comment if I have any issues
@rbeucher It looks like you might have overwritten a default catalog location or similar on Friday? I'm currently getting the following error - are you able to take a look & let me know if the file in the error message is one you recognise?
>>> import intake
>>> intake.cat.access_nri
/g/data/hh5/public/apps/miniconda3/envs/analysis3-24.07/lib/python3.10/importlib/__init__.py:126](https://are.nci.org.au/g/data/hh5/public/apps/miniconda3/envs/analysis3-24.07/lib/python3.10/importlib/__init__.py#line=125): RuntimeWarning: Unable to access a default catalog location. Calling intake.cat.access_nri will not work.
return _bootstrap._gcd_import(name[level:], package, level)
access_nri:
args: {}
description: ''
driver: intake.catalog.base.Catalog
metadata: {}
>>> from access_nri_intake.utils import get_catalog_fp
>>> get_catalog_fp()
'/g/data/xp65/public/apps/access-nri-intake-catalog/catalog.yaml'
>>> intake.open_catalog('/g/data/xp65/public/apps/access-nri-intake-catalog/catalog.yaml').access_nri
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Cell In[9], line 1
----> 1 intake.open_catalog('[/g/data/xp65/public/apps/access-nri-intake-catalog/catalog.yaml](https://are.nci.org.au/g/data/xp65/public/apps/access-nri-intake-catalog/catalog.yaml)').access_nri
...
FileNotFoundError: [Errno 2] No such file or directory: '[/g/data/tm70/rb5533/intake_tests/v0.1.4](https://are.nci.org.au/g/data/tm70/rb5533/intake_tests/v0.1.4)+13.g869f576.dirty[/metacatalog.csv](https://are.nci.org.au/metacatalog.csv)'
I've run a catalog build with just CMIP5 enabled which built without any issues - I'm just running into this slightly weird error trying to import it.
Yes. Sorry, my bad. It looks like I have mess up with the file
No worries. I'm looking into it more closely now - probably something we're going to want to have some guard rails against.
I've restored the file to default to v0.1.3 for now.
@charles-turner-1 I think you might have to hack the code to open your existing catalog (or, better yet, put it into ~/.access_nri_intake_catalog/
) - the code only knows to look for a catalog to open at either CATALOG_LOCATION = "/g/data/xp65/public/apps/access-nri-intake-catalog/catalog.yaml"
or USER_CATALOG_LOCATION = Path.home() / ".access_nri_intake_catalog/catalog.yaml"
.
We should probably look at adding a path
option to intake.cat.access_nri
to see if we can make that more flexible.
Sorry @marc-white, just saw your comment - the default catalog was pointing at a nonexistent file. My catalog in .access_nri_intake_catalog/catalog.yaml
worked just fine.
I've approved but I think before merging we want to update docs?
Yes, a docs update is definitely required. I can get started on that today.
I've added the necessary documentation, take a peek...
Other than the two minor typos, documentation changes look correct to me.
I updated
cli.py::build
if update: ...
clause on my local machine to make it a bit more readable to check the docs - are you happy for me to push the changes to the branch?
Yep push the changes
@marc-white Can you confirm the shorthand variable names I used in 5c80be3 aren't misleading? Specifically driver_new
, etc..
Other than that I think this is good to go.
@marc-white Can you confirm the shorthand variable names I used in 5c80be3 aren't misleading? Specifically
driver_new
, etc..
Looks reasonable to me!
I'm going to get onto Gadi now and create symlinks for the existing catalog versions, so they get picked up the first time we generate a 'new-style' catalog.
Closes #191 .
This PR significantly alters the way the catalog generated by
access-nri-intake-catalog
is handled, stored, and versioned.Catalog storage location
The live catalog will now be stored on
gdata/xp65
(access_nri_intake.CATALOG_LOCATION
), rather than being shipped with the package. This is because all of the catalog data lives on Gadi anyway, so there doesn't seem to be much point to being able to see the catalog definition YAML, but not be able to access any data.However, to support power users and local developers, there is the option to place a
catalog.yaml
file at a defined location in the user's home directory (access_nri_intake.USER_CATALOG_LOCATION
). This catalog will automatically take precedence over the 'live' catalog on Gadi, if it exists. I thought about adding utility functions that would allow users to create this directory, put acatalog.yaml
in it, etc.; however, given that it's a power user move, I decided it's easier and safer to have the user do that themselves manually.Versioning
Catalog versions will now be date-based, e.g., a catalog built today will be, by default,
v2024-11-07
. Attempts to set a version number that doesn't conform to this pattern will raise an exception.catalog.yaml
will now contain amin
andmax
version. This is to cover the possibility that the catalog structure may change, and a particular version ofcatalog.yaml
may not be compatible with certain catalog versions. I've confirmed that doing the usual<
and>
operations on our version strings has the expected output.Under this versioning schema, there isn't really a need to have a symlinked
latest
version of the catalog (and also because we can update the live catalog onxp65
at will, without doing a code release).Building the catalog
Because
catalog.yaml
will be placed onxp65
now, there is no need to make a new code release for every catalog build.As mentioned above, a catalog built today will take today's date as the default version, although the user can override it.
The build process is now intelligent to the presence of older versions of
catalog.yaml
, and to directories that look like older catalogs:catalog.yaml
exists, one will be created. If the data directory contains folders that look like catalog versions (i.e. vYYYY-MM-DD), then the code will use those to construct themin
andmax
version boundaries (i.e., it will assume that the newcatalog.yaml
is good for describing those existing sources).catalog.yaml
exists, and there is no structural change to the catalog, then the existingcatalog.yaml
will be updated with:catalog.yaml
).catalog.yaml
will be created, with min version = max version = current/new version. The oldcatalog.yaml
will be moved aside tocatalog-<old min version>-<old max version>.yaml
. These catalogs are nominally not accessible, unless the user hacks theaccess_nri_intake.CATALOG_LOCATION
variable. For now, given how infrequently we'll be making the sort of changes that will trigger this scenario, I'm fine with that.Documentation
To be updated once we're happy with the core structure of this update. The updates required will be:
catalog.yaml
.Testing
All of the above should have at least one unit test.