ACCESS-NRI / access-nri-intake-catalog

Tools and configuration info used to manage ACCESS-NRI's intake catalogue
https://access-nri-intake-catalog.rtfd.io
Apache License 2.0
8 stars 1 forks source link

Add ACCESS-OM3 builder #152

Closed dougiesquire closed 7 months ago

dougiesquire commented 7 months ago

This PR adds a builder class for ACCESS-OM3 data. Closes #151

codecov[bot] commented 7 months ago

Codecov Report

Attention: Patch coverage is 84.00000% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 96.64%. Comparing base (c6c4449) to head (1f613a7).

Files Patch % Lines
src/access_nri_intake/source/builders.py 76.47% 4 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #152 +/- ## ========================================== - Coverage 97.20% 96.64% -0.57% ========================================== Files 9 9 Lines 573 596 +23 ========================================== + Hits 557 576 +19 - Misses 16 20 +4 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

dougiesquire commented 7 months ago

Thanks for agreeing to review @anton-seaice. Don't worry about all the new files in tests/data - these are just stripped-back real OM3 output that are used in some of the tests. The tests themselves should make it clear where/how these are used.

anton-seaice commented 7 months ago

Screenshot 2024-03-01 at 1 21 24 pm

In my WW data in the datastore, there is no "start_date" & "end_date" ? Screenshot 2024-03-01 at 1 22 29 pm

dougiesquire commented 7 months ago

In my WW data in the datastore, there is no "start_date" & "end_date" ?

Yeah, the WW3 data contains no time bounds info, so when there is only one time step per file there's no way to determine what the frequency is and it assumes fixed frequency. I could still add strings for the start and end dates in this case? In which case we'd have something like:

frequency start_date end_date
fx 1958-01-02, 00:00:00 1958-01-02, 00:00:00
anton-seaice commented 7 months ago

Yeah, the WW3 data contains no time bounds info, so when there is only one time step per file there's no way to determine what the frequency is and it assumes fixed frequency. I could still add strings for the start and end dates in this case?...

Does it make more sense to add this to the WW3 output?

dougiesquire commented 7 months ago

Does it make more sense to add this to the WW3 output?

It would definitely be better to have this in the output, but that requires some automated postprocessing (unless it can be configured in WW3?). These aren't the first files to be missing time bounds and they won't be the last so some sensible default behaviour is probably still needed

anton-seaice commented 7 months ago

Does it make more sense to add this to the WW3 output?

It would definitely be better to have this in the output, but that requires some automated postprocessing (unless it can be configured in WW3?). These aren't the first files to be missing time bounds and they won't be the last so some sensible default behaviour is probably still needed

It doesn't look like its configurable. A code change shouldn't be too hard though. We will need to dome something for time frequency eventually because at the moment there is no way to know if the output is daily or monthly (or something else), except for looking in nuopc.runconfig

Adding some sort of start and end date handling should improve searching.

dougiesquire commented 7 months ago

Adding some sort of start and end date handling should improve searching.

With https://github.com/ACCESS-NRI/access-nri-intake-catalog/pull/152/commits/1f613a79ec6baced0bfcb34088d6ef58f52e81a2, we now parse the start and end dates from the WW3 files (but they are the same...)

anton-seaice commented 7 months ago

This seems to work now. We've added a different annoyance, which is the mom static file now produces a 'start_date' and 'end_date' in the catalog. But that is due to a quirk of the source file, not this builder :)

anton-seaice commented 7 months ago

I raised https://github.com/COSIMA/access-om3/issues/110 to consider the missing time_bounds in WW output

dougiesquire commented 7 months ago

This seems to work now. We've added a different annoyance, which is the mom static file now produces a 'start_date' and 'end_date' in the catalog. But that is due to a quirk of the source file, not this builder :)

Yeah, there isn't really anything we can do about data that's missing metadata or saved in a strange way. I think adding the start and end date whenever there is a time axis is the most defensible approach, otherwise we might be throwing away potentially useful info

dougiesquire commented 7 months ago

Thanks for a thorough review @anton-seaice