SciTools / iris

A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
https://scitools-iris.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Common agreement on loading CF non-compliant NetCDF files #5165

Open trexfeathers opened 1 year ago

trexfeathers commented 1 year ago

Iris needs a public statement on how it handles NetCDF files that deviate from the CF conventions. This will have multiple benefits:

Writing this statement will involve making some difficult decisions. A working group is tackling this now: @tkknight, @bjlittle, @lbdreyer, @pp-mo, @trexfeathers, @stephenworsley, @ESadek-MO, @scottrobinson02, @HGWright

Factors at play

Items affected

(please edit if you know of others)

trexfeathers commented 1 year ago

Summary from working group conversations

2023-02-02, 2023-02-14, 2023-03-22

Note that this issue is not intended as a debate, which is why it is not posted as a discussion. The conversations below took place in real time, with a group deliberately sized to aid decision making.

Outcome - our ideal implementation

When loading NetCDF files, Iris will load all CF-compliant elements. A container of non-compliant variables and attributes will be attached to the Cube(s).
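As a rough illustration of the idea only (this is not Iris code, and `LoadResult`, `is_cf_compliant` and `partition_variables` are hypothetical names invented for this sketch), the "load what is compliant, keep the rest in a container" behaviour could look something like the following. The compliance check is a deliberately trivial stand-in; real CF validation is far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class LoadResult:
    """Hypothetical load result: compliant parts plus a container for the rest."""

    compliant: dict = field(default_factory=dict)
    non_compliant: dict = field(default_factory=dict)


def is_cf_compliant(attributes: dict) -> bool:
    # Stand-in check (an assumption for this sketch, not a real CF rule set):
    # a variable passes if it declares `units` as a string.
    return isinstance(attributes.get("units"), str)


def partition_variables(variables: dict) -> LoadResult:
    """Split {name: attributes} into compliant and non-compliant groups."""
    result = LoadResult()
    for name, attributes in variables.items():
        if is_cf_compliant(attributes):
            result.compliant[name] = attributes
        else:
            result.non_compliant[name] = attributes
    return result


# Made-up example content standing in for the variables of a NetCDF file.
file_variables = {
    "air_temperature": {"units": "K", "standard_name": "air_temperature"},
    "mystery_field": {"unit": "furlongs"},  # misspelt attribute, so no `units`
}
result = partition_variables(file_variables)
print(sorted(result.compliant))
print(sorted(result.non_compliant))
```

Nothing is discarded here: the non-compliant entries stay available to the user, which is the point of attaching a container rather than silently skipping them.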

Encourage users:

If this causes you problems, please reach out to us to see if we can collaborate on a solution.

Implementation considerations

Working group summary comments

Discussion topics

Encouraging compliance in the community

Files changing from acceptable to unacceptable

Ease of massaging files to be compliant

User experience (UX)

Iris' place in the world

Ease of software development

Preferred approaches

Determined via voting.

  1. Iris only loads the CF-compliant parts of a file, skipping the non-compliant parts (maybe raising a warning?).
  2. Iris allows the user to configure how it will interpret a malformed file.

edmundhenley-mo commented 4 months ago

Oooh just discovered this issue via DragonTaming board @trexfeathers.

Sounds like you've got a fair bit of input from working group already; please shout though if useful to have more, as this is a particularly painful area for space weather - and we've got a good amount of requirements (ionosphere and lower) in the iris-o-sphere of traditional geographic lat/lon coords!

More context on why CF non-compliance an issue for space weather

Highly interested: space weather is not represented in CF conventions, so data wrangling is a key issue for us. There's a few times where I've consciously decided not to go with iris due to anticipating "ugh, lots of pain handling I/O at boundaries due to data being inherently non-CF-compliant"

In retrospect, often this decision was bad:

* I've ended up writing (and then having to support!) custom code - e.g. pseudo-geo-aware dataclasses & methods for ionospheric data - which ends up being a poorer version of iris.
* I'd have been better served going for the real deal, and biting the boundaries-pain bullet.

Self-interestedly v happy to give more input if useful - help you help me!

trexfeathers commented 1 month ago

My personal proposal, after some loose discussion with @bjlittle and @pp-mo:

This should allow loading to continue under as many circumstances as possible, and provide the user with recourse to fix up problem objects post-loading. It should be reasonably simple to scour through the loading code to find likely places for try-except. This feature need not be limited to CF-parsing in NetCDF, although that is presumably the source of most of the problems.
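To make the try-except idea concrete, here is a minimal sketch in plain Python. It is not taken from the Iris codebase: `LoadProblem` and `build_all` are hypothetical names, and `float` merely stands in for whatever builder turns a raw file object into an Iris object.

```python
from dataclasses import dataclass


@dataclass
class LoadProblem:
    """Hypothetical record of one object that could not be built during loading."""

    name: str
    raw: object
    error: Exception


def build_all(raw_objects: dict, builder):
    """Build every object, recording failures instead of aborting the whole load."""
    built, problems = {}, []
    for name, raw in raw_objects.items():
        try:
            built[name] = builder(raw)
        except Exception as exc:  # deliberately broad: loading should carry on
            problems.append(LoadProblem(name=name, raw=raw, error=exc))
    return built, problems


# `float` stands in for a real builder; "n/a" plays the malformed object.
built, problems = build_all({"a": "1.5", "b": "n/a"}, float)

# Post-loading recourse: inspect what failed, then repair it by hand.
for problem in problems:
    print(f"{problem.name}: {problem.error!r} (raw value kept: {problem.raw!r})")
```

Keeping the raw object alongside the exception is what gives the user something to fix up afterwards, rather than just a log message.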

stephenworsley commented 1 month ago

From @SciTools/peloton : consider an option to fail/warn fast.
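One way such an option might be shaped, sketched under assumptions: the mode names `"capture"`, `"warn"` and `"raise"` and the `report_problem` helper are invented here for illustration and are not an existing Iris setting.

```python
import warnings

VALID_MODES = ("capture", "warn", "raise")


def report_problem(message, mode="capture", captured=None):
    """Handle one loading problem according to a hypothetical user-chosen mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if mode == "raise":
        # Fail fast: stop at the first problem.
        raise ValueError(message)
    if mode == "warn":
        # Warn fast: tell the user immediately, but keep loading.
        warnings.warn(message, UserWarning, stacklevel=2)
    if captured is not None:
        # In both non-raising modes, record the problem for later inspection.
        captured.append(message)


log = []
report_problem("variable 'x' has no units", mode="capture", captured=log)
print(log)
```

With a switch like this, a user who prefers strictness could select `"raise"`, while the default would still let loading continue.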