Open k-a-webb opened 3 months ago
Noting that my proposed solution fails when keys are intermixed with text, e.g., `Tier{tier}`:
This works though:
import re
from pathlib import Path


def _path2facets(path: Path, drs: str) -> dict[str, str]:
    """Extract facets from a path using a DRS like '{facet1}/{facet2}'."""
    keys = []
    for key in re.findall(r'{(.*?)}[^-]', f'{drs} '):
        key = key.split('.')[0]  # Remove trailing .lower and .upper
        keys.append(key)

    # Record, for each key, the index (counted from the end of the path)
    # of the DRS element that contains it.
    key_indices = {}
    for key in keys:
        for i, part in enumerate(drs.split('/')[::-1]):
            if "}-{" in part and "}-{" not in key:
                continue
            if '{{{}}}'.format(key) in part:
                key_indices[key] = -(i + 1)
    # Check that all keys were found
    assert all(key in key_indices for key in keys), "Error: Missing keys"

    dirpath = path.parents[0]
    facets = {}
    for key in key_indices:
        index = key_indices[key]
        if '{' not in key:  # deal with multi-key parts below
            facets[key] = dirpath.parts[index]

    if len(facets) != len(keys):
        # Extract hyphen separated facet: {facet1}-{facet2},
        # where facet1 is already known.
        for key in keys:
            if key not in facets:
                facet1, facet2 = key.split("}-{")
                value = dirpath.parts[key_indices[key]]
                facets[facet2] = value.replace(f'{facets[facet1]}-', '')
    return facets
hi @k-a-webb and many thanks for raising this! Correct me if I'm wrong, but you are trying to map
{user}/{subdir}/{runid}/data
to a path like
/space/hall5/sitestore/eccc/crd/ccrn/users/rvs001/canesm_runs/sv-canam-001/data
but that'll never work since `{user}` cannot be `/space/hall5/sitestore/eccc/crd/ccrn/users/rvs001` - the number of placeholders needs to match the number of elements delimited by the path delimiter `/`.
Hi!
> the number of placeholders needs to match the number of elements delimited by the path delimiter /

I can see how this is necessary for `_path2facets` after investigating the code, but I did not find instructions in the documentation that it was a requirement when setting up `config-developer.yml`, although I might have missed it!
The code I suggested does not require every element in the path to be a placeholder. This is helpful when there is a set organization to your data (which in this case includes many fixed subdirectory names), but you want to limit the number of parameters required to define a dataset.
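One alternative that tolerates fixed directory names, and text mixed with placeholders such as `Tier{tier}`, is to turn the DRS into a regular expression and match it against the end of the directory path. This is only a sketch of that idea, not ESMValCore code: `drs_to_regex`, `path2facets_regex` and the file name `file.nc` are hypothetical, and hyphen-joined placeholders like `{a}-{b}` stay ambiguous when the values themselves contain hyphens.

```python
import re
from pathlib import Path


def drs_to_regex(drs: str) -> re.Pattern:
    """Build a regex from a DRS template (hypothetical helper)."""
    pattern = ""
    seen = set()
    # re.split with a capturing group alternates fixed text and placeholder names.
    for i, chunk in enumerate(re.split(r"\{(.*?)\}", drs)):
        if i % 2 == 0:
            pattern += re.escape(chunk)  # fixed text, e.g. 'Tier' or '/data'
        else:
            name = chunk.split(".")[0]  # drop trailing .lower / .upper
            if name in seen:
                pattern += f"(?P={name})"  # repeated key must match the same value
            else:
                seen.add(name)
                pattern += f"(?P<{name}>[^/]+)"  # a value never spans a '/'
    # Anchor at the end of the directory path; anything may precede the DRS.
    return re.compile(rf"(?:^|/){pattern}$")


def path2facets_regex(path: Path, drs: str) -> dict[str, str]:
    """Extract facets from the directory part of path (hypothetical helper)."""
    match = drs_to_regex(drs).search(path.parent.as_posix())
    if match is None:
        raise ValueError(f"{path} does not match DRS '{drs}'")
    return match.groupdict()


path = Path(
    "/space/hall5/sitestore/eccc/crd/ccrn/users/"
    "rvs001/canesm_runs/sv-canam-001/data/file.nc"  # file name is made up
)
facets = path2facets_regex(path, "{user}/{subdir}/{runid}/data")
print(facets)
```

Because each placeholder is matched by name rather than by position, fixed elements such as `data` do not shift the pairing.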
@k-a-webb You're welcome to make a pull request to improve this, it looks like you already have some good ideas.
You could probably set `rootpath` to `/space/hall5/sitestore/eccc/crd/ccrn/users` to avoid the issue mentioned in https://github.com/ESMValGroup/ESMValCore/issues/2502#issuecomment-2269379129.
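As a quick check of that suggestion, using only `pathlib` and the paths quoted in this thread: once the suggested `rootpath` is stripped, the remaining directory has as many elements as the DRS. This illustrates the element count only. It makes no claim about how ESMValCore resolves `rootpath` internally, and the fixed `data` element is still subject to the indexing problem described in this issue.

```python
from pathlib import PurePosixPath

rootpath = PurePosixPath("/space/hall5/sitestore/eccc/crd/ccrn/users")
data_dir = PurePosixPath(
    "/space/hall5/sitestore/eccc/crd/ccrn/users/"
    "rvs001/canesm_runs/sv-canam-001/data"
)
drs = "{user}/{subdir}/{runid}/data"

# Elements left over after removing rootpath, and elements in the DRS.
relative_parts = data_dir.relative_to(rootpath).parts
drs_parts = tuple(drs.split("/"))

print(relative_parts)
print(len(relative_parts) == len(drs_parts))
```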
When `drs` is a path with both configurable (e.g., `{user}`) and not configurable (e.g., `nc_output`) keys, `_path2facets` indexing of keys and paths fails.
Example:
Gives the following error:
It incorrectly pairs keys to facets:
and then fails when the indexing does not match.
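A minimal sketch of this kind of mis-pairing, assuming facets are taken from the last N directories of the path for N placeholders. `pair_positionally` is a hypothetical stand-in for illustration, not the ESMValCore function, and the file name `file.nc` is made up; the DRS and directory are the ones discussed in this thread.

```python
import re
from pathlib import Path


def pair_positionally(path: Path, drs: str) -> dict[str, str]:
    """Pair the N placeholders in drs with the last N directories of path."""
    keys = re.findall(r"\{(.*?)\}", drs)
    values = path.parent.parts[-len(keys):]
    return dict(zip(keys, values))


path = Path(
    "/space/hall5/sitestore/eccc/crd/ccrn/users/"
    "rvs001/canesm_runs/sv-canam-001/data/file.nc"
)
mispaired = pair_positionally(path, "{user}/{subdir}/{runid}/data")
print(mispaired)
```

The fixed `data` element takes up one of the three slots, so every key is shifted by one directory: `user` receives `canesm_runs` and `runid` receives `data`.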
A solution is to record the index of `path.split`:
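A simplified sketch of index recording, under the assumption that each facet occupies a whole path element: walk the DRS elements from the end, and for every element that is a single placeholder, read the directory at the same offset from the end of the path. `path2facets_by_index` and the file name `file.nc` are hypothetical. Elements that mix text with a placeholder, or join two placeholders with a hyphen, are skipped here.

```python
from pathlib import Path


def path2facets_by_index(path: Path, drs: str) -> dict[str, str]:
    """Look up each placeholder at its own offset from the end of the path."""
    dir_parts = path.parent.parts
    facets = {}
    for offset, part in enumerate(reversed(drs.split("/")), start=1):
        # Only whole-element placeholders such as '{user}' are handled.
        is_single = (
            part.startswith("{")
            and part.endswith("}")
            and "}" not in part[1:-1]
        )
        if is_single:
            key = part[1:-1].split(".")[0]  # drop trailing .lower / .upper
            facets[key] = dir_parts[-offset]
    return facets


path = Path(
    "/space/hall5/sitestore/eccc/crd/ccrn/users/"
    "rvs001/canesm_runs/sv-canam-001/data/file.nc"
)
indexed = path2facets_by_index(path, "{user}/{subdir}/{runid}/data")
print(indexed)
```

Fixed elements such as `data` still count towards the offset, so they no longer displace the placeholders around them.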