[ ] This PR addresses an already opened issue (for bug fixes / features)
This PR fixes #xyz
[x] (If applicable) Documentation has been added / updated (for bug fixes / features).
[x] (If applicable) Tests have been added.
[x] This PR does not seem to break the templates.
[x] HISTORY.rst has been updated (with summary of main changes).
[x] Link to issue (:issue:number) and pull request (:pull:number) has been added.
What kind of change does this PR introduce?
Two different things, both of which I noticed while updating the catalogs.
Stricter build_path
When I migrated from miranda, I relaxed the "structure" code because it felt too restrictive and I wanted to simplify the logic. However, this wasn't a good idea. When moving the ESPO-R5-E5L indicators, I forgot to include the "experiment" field somewhere and used build_path to copy the files. As a result, one scenario overwrote the other and I lost half of the data.
This PR changes things a bit. The main change: all facets, except those marked optional, are required, and build_path will FAIL if any is missing.
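To illustrate the failure mode described above, here is a minimal sketch of a strict path builder. It is not xscen's implementation: `build_path_strict_sketch`, the list-of-levels schema and the facet values are hypothetical. The point is that a missing facet raises an error instead of being silently dropped, which is what allowed two scenarios to resolve to the same path.

```python
from pathlib import PurePosixPath


def build_path_strict_sketch(facets: dict, levels: list) -> PurePosixPath:
    """Hypothetical strict builder: every level in ``levels`` is a required facet."""
    missing = [level for level in levels if facets.get(level) is None]
    if missing:
        # Refuse to build a path rather than drop the level: dropping it could
        # make two different datasets resolve to the same destination.
        raise ValueError(f"Missing required facets: {missing}")
    return PurePosixPath(*(str(facets[level]) for level in levels))


levels = ["source", "experiment", "variable"]  # hypothetical schema
print(build_path_strict_sketch(
    {"source": "mysource", "experiment": "ssp245", "variable": "tas"}, levels
))
```

Called without "experiment", the sketch raises a `ValueError` naming the missing facet instead of returning `mysource/tas`.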
And:
New way to specify a folder level in the schema: "()" (with parentheses). A facet written this way is marked as optional, and if it is missing from the data, the level is skipped.
Removal of the "option: " structure from the folder schema. It was only used to insert "[bias_adjust_project version]" when the former was non-null. Instead, the schemas are duplicated: one for the "raw" case and one for the "biasadjusted" case (and similarly for derived data).
The previous point allowed me to rewrite _get_needed_fields without the funky magic it needed before.
Removal of the "strict" keyword: build_path is always strict. The previous strict=True was overly strict because of caveats in _get_needed_fields. Those are now fixed, so strict=False shouldn't be needed.
Some syntax in the yml file changed.
Passing a dataframe/catalog will now also add a "new_path_type" column to the output, so one can make sure all entries have been constructed from the same schema.
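The optional-level syntax and the simplified field collection can be sketched as follows. This assumes a schema given as a plain list of level names in which `(facet)` marks an optional level; `parse_level`, `get_needed_fields_sketch` and `build_path_sketch` are hypothetical stand-ins, not the actual `_get_needed_fields` or `build_path`. Without conditional `option:` branches, collecting the needed fields reduces to a single pass over the levels.

```python
from pathlib import PurePosixPath


def parse_level(level: str) -> tuple:
    """Return (facet name, is_optional); '(facet)' marks an optional level."""
    if level.startswith("(") and level.endswith(")"):
        return level[1:-1], True
    return level, False


def get_needed_fields_sketch(levels: list) -> tuple:
    """Split the facets used by a schema into required and optional sets."""
    required, optional = set(), set()
    for level in levels:
        name, is_optional = parse_level(level)
        (optional if is_optional else required).add(name)
    return required, optional


def build_path_sketch(facets: dict, levels: list) -> PurePosixPath:
    """Fail on missing required facets; skip optional levels whose facet is missing."""
    required, _ = get_needed_fields_sketch(levels)
    missing = sorted(name for name in required if facets.get(name) is None)
    if missing:
        raise ValueError(f"Missing required facets: {missing}")
    parts = []
    for level in levels:
        name, _ = parse_level(level)
        if facets.get(name) is not None:
            parts.append(str(facets[name]))
    return PurePosixPath(*parts)


levels = ["source", "experiment", "(member)", "variable"]  # hypothetical schema
print(build_path_sketch({"source": "mysource", "experiment": "ssp245", "variable": "tas"}, levels))
```

Here "member" is absent, so its level is skipped and the sketch prints `mysource/ssp245/tas`; dropping "experiment" instead would raise.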
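Using one flat schema per case (raw vs. bias-adjusted) and reporting which one each entry used might look like the sketch below. The schema contents, the selection rule (a non-null `bias_adjust_project` means bias-adjusted data) and `build_paths_sketch` are assumptions for illustration; the PR only states that the schemas are duplicated per case and that a `new_path_type` column is added.

```python
import pandas as pd

# Hypothetical schemas: one flat list of folder levels per case, no conditionals.
SCHEMAS = {
    "raw": ["source", "experiment", "variable"],
    "biasadjusted": ["bias_adjust_project", "version", "source", "experiment", "variable"],
}


def pick_schema(rec: dict) -> str:
    """Hypothetical rule: bias-adjusted entries have a non-null bias_adjust_project."""
    return "raw" if pd.isna(rec.get("bias_adjust_project")) else "biasadjusted"


def build_paths_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of ``df`` with 'new_path' and 'new_path_type' columns."""
    paths, types = [], []
    for rec in df.to_dict("records"):
        schema = pick_schema(rec)
        types.append(schema)
        paths.append("/".join(str(rec[facet]) for facet in SCHEMAS[schema]))
    out = df.copy()
    out["new_path"] = paths
    out["new_path_type"] = types
    return out


catalog = pd.DataFrame([
    {"source": "mysource", "experiment": "ssp245", "variable": "tas",
     "bias_adjust_project": None, "version": None},
    {"source": "mysource", "experiment": "ssp245", "variable": "tas",
     "bias_adjust_project": "myproject", "version": "v1"},
])
result = build_paths_sketch(catalog)
print(result["new_path_type"].unique())
```

Checking `result["new_path_type"].nunique() == 1` before copying files is one way to confirm that every entry was built from the same schema; in this mixed example it is not.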
Better end_of_period
When I updated to pandas 2, I modified date_parser, which changed how "end_of_period" was handled.
```python
date_parser('2020', end_of_period=True)
# Initial xscen: "2020-12-31 23:00"
# Current xscen: "2020-12-31 00:00:00"
# This PR: "2020-12-31 23:59:59"
```
Thus, when checking temporal coverage, the error due to the time of day of the period end is reduced.
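The new behaviour amounts to taking the last second of the period. A standard-library sketch of that idea, assuming period strings of the form `YYYY`, `YYYY-MM` or `YYYY-MM-DD` (`end_of_period_sketch` is hypothetical, not `date_parser` itself), steps to the start of the next period and subtracts one second:

```python
from datetime import datetime, timedelta


def end_of_period_sketch(date: str) -> datetime:
    """Hypothetical: last second of the year, month or day given as a string."""
    parts = [int(p) for p in date.split("-")]
    if len(parts) == 1:  # "YYYY": the next period starts on January 1st of next year
        next_start = datetime(parts[0] + 1, 1, 1)
    elif len(parts) == 2:  # "YYYY-MM": roll over to January when the month is 12
        year, month = parts
        next_start = datetime(year + month // 12, month % 12 + 1, 1)
    elif len(parts) == 3:  # "YYYY-MM-DD": the next day
        next_start = datetime(*parts) + timedelta(days=1)
    else:
        raise ValueError(f"Unsupported period string: {date!r}")
    return next_start - timedelta(seconds=1)


print(end_of_period_sketch("2020"))  # 2020-12-31 23:59:59
```

Subtracting from the next period's start avoids special-casing month lengths and leap years.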
Does this PR introduce a breaking change?
build_path is now always strict.
Other information:
Do you agree?
I could re-implement strict=False if needed. It would mark all fields as optional on the fly. The default would stay True.
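If `strict=False` were brought back as described, marking every field as optional on the fly could be as simple as wrapping each level in the `()` syntax. A minimal sketch, assuming a schema is a list of level names; `apply_strictness` is hypothetical:

```python
def apply_strictness(levels: list, strict: bool = True) -> list:
    """Hypothetical: with strict=False, mark every schema level as optional."""
    if strict:
        return list(levels)
    # Wrap each level in parentheses unless it is already marked optional.
    return [
        level if level.startswith("(") and level.endswith(")") else f"({level})"
        for level in levels
    ]


print(apply_strictness(["source", "(member)", "variable"], strict=False))
```

Keeping `strict=True` as the default preserves the fail-loudly behaviour unless a caller opts out explicitly.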