Ouranosinc / xscen

A climate change scenario-building analysis framework.
https://xscen.readthedocs.io/
Apache License 2.0

Refactor catalog utils (again) #205

Closed (aulemahal closed 1 year ago)

aulemahal commented 1 year ago

Pull Request Checklist:

What kind of change does this PR introduce?

Does this PR introduce a breaking change?

Yes.

Other information:

Example for the new patterns:

OLD: "{processing_level}/{mip_era}/{activity}/{domain}/{institution}/{source}/{experiment}/{member}/{frequency}/{_variable}/{?var}_{?freq}_{?src}_{?exp}_{?memb}_{*}_{date_start}-{date_end}.nc"

NEW: "{processing_level}/{mip_era}/{activity}/{domain}/{institution}/{source}/{experiment}/{member}/{frequency}/{variable:_}/{*}_{DATES}.nc"

The "variable" field may contain underscores, so it uses the "_" format specifier. There is no longer any need to spell out every part of the filename when only the last part is needed. Here, the special "DATES" field catches either a single date or a pair of bounds.
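To make the semantics of the new pattern concrete, here is a minimal sketch of how such a pattern could be matched against a path. It is not the code in this PR: `pattern_to_regex` is a hypothetical helper, the date syntax (4 to 8 digits, optionally `start-end`) is an assumption, and the example path and its values are made up for illustration.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a simplified path pattern into a regex (illustrative only)."""
    parts = []
    pos = 0
    for match in re.finditer(r"\{([^{}]*)\}", pattern):
        # Literal text between fields is matched verbatim.
        parts.append(re.escape(pattern[pos:match.start()]))
        field = match.group(1)
        if field == "*":
            # Wildcard: anything within one path element, not captured.
            parts.append(r"[^/]*?")
        elif field == "DATES":
            # Assumed date syntax: 4 to 8 digits, optionally "start-end".
            parts.append(r"(?P<date_start>\d{4,8})(?:-(?P<date_end>\d{4,8}))?")
        elif field.endswith(":_"):
            # "_" specifier: underscores allowed, path separators are not.
            parts.append(rf"(?P<{field[:-2]}>[^/]+)")
        else:
            # Default field: a single word without "_" or "/".
            parts.append(rf"(?P<{field}>[^/_]+)")
        pos = match.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))


NEW_PATTERN = (
    "{processing_level}/{mip_era}/{activity}/{domain}/{institution}/{source}/"
    "{experiment}/{member}/{frequency}/{variable:_}/{*}_{DATES}.nc"
)
regex = pattern_to_regex(NEW_PATTERN)

# Hypothetical path; the variable folder contains an underscore.
path = (
    "raw/CMIP6/ScenarioMIP/global/NCC/NorESM2-MM/ssp245/r1i1p1f1/day/"
    "tasmax_p90/tasmax_p90_day_20150101-20201231.nc"
)
fields = regex.fullmatch(path).groupdict()
print(fields["variable"], fields["date_start"], fields["date_end"])
```

In this sketch the wildcard absorbs everything in the filename before the dates, which is why the individual `{?var}_{?freq}_...` placeholders of the old pattern are no longer needed.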

TODO:

Code complexity

Lol. I thought this PR would simplify the parse_directory system.

The problem is the MRCC5. Or at least, it is the enormous size of this database and the fact that it is scattered across slow-to-read disks. It is the only reason for all these complexities:

I wanted to make clear that the complexity of this code is NOT only because I like to optimize things. The last time I ran the MRCC5 catalog creation (with this code), it took 7 hours. Just imagine without the optimizations. (And that was with a full disk missing.)

RondeauG commented 1 year ago

If you could use this PR to also address #152, that would be great! I wrote the issue, but it originally came from @mccrayc, so you can check with him for the details.

mccrayc commented 1 year ago

Looks good to me! The new patterns seem much more intuitive and clean.

aulemahal commented 1 year ago

The last commit made a few changes. I tested the new code with ouranos_data_catalogs and it surfaced a few bugs:

And I guess I now need to add tests!

aulemahal commented 1 year ago

+12% coverage :muscle: