Original Redmine Comment Author Name: James (James) Original Date: 2021-10-05T15:35:57Z
Must not reduce compatibility compared with netcdf2. Must still work with a recent GDAL version and hence off-the-shelf tools like QGIS.
Should probably target CF 1.8.
Need to add WKT geometries, for one. That would allow us to represent feature groups properly, as well as other, more complex geometries. It would be nice to have one blob, not many. It might also be nice to include all statistics, but that is probably not straightforward for an array-formatted blob, so nice-to-have, not essential. (A sketch of one possible geometry layout follows.)
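For illustration only, here is a minimal sketch (using the netCDF4 Python library rather than our Java writer) of what carrying one WKT string per feature in a single blob could look like. The dimension, variable and attribute names are invented, and how or whether WKT fits alongside CF 1.8's geometry conventions is not settled here.

```python
import numpy as np
from netCDF4 import Dataset

# Hypothetical layout: one file ("blob") holding statistics for all features,
# with a WKT string per feature so that feature groups and other complex
# geometries can be represented. Names are illustrative only.
with Dataset("statistics_blob.nc", mode="w", format="NETCDF4") as nc:
    nc.Conventions = "CF-1.8"  # target convention, per the wishlist above
    nc.createDimension("feature", 3)

    # Variable-length strings require the netCDF-4 (HDF5) data model.
    wkt = nc.createVariable("geometry_wkt", str, ("feature",))
    wkt.long_name = "feature geometry as OGC well-known text"

    wkt[0] = "POINT (-77.0365 38.8977)"
    wkt[1] = "LINESTRING (-77.04 38.90, -77.02 38.88)"
    wkt[2] = "MULTIPOINT ((-77.1 38.9), (-77.0 38.8))"  # e.g. a feature group

    stat = nc.createVariable("mean_error", "f8", ("feature",))
    stat.units = "m3 s-1"
    stat[:] = np.array([0.12, -0.03, 0.40])
```

The point of the sketch is only that a single blob can hold both the geometries and the statistics they qualify; the real layout would need to be checked against what GDAL/QGIS actually recognize.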
Original Redmine Comment Author Name: James (James) Original Date: 2021-10-05T15:38:43Z
Anyway, add yer wishlist here.
One of the nice things with netcdf compared to csv2 is that it's a lot less verbose, so I think it has an ongoing user base. It might make more sense to use csv2 in data-frame-shaped applications, but netcdf is a nicer format in many ways for geospatial applications.
Perhaps, one day, we'll have one format that rules them all (edit: user-facing, I mean; we already have our canonical format), but I doubt it (because there is a proliferation of geospatial and time-series formats more generally; this is not a wres thing). Perhaps netcdf3 could be a further step along the way, though.
Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2021-10-05T16:00:51Z
- Single blob
- Geographic interoperability with recent GDAL (and therefore other tools)
- Accurate and precise modeling
- Recent-ish CF-conventions adherence
- Less cruft
- More metadata
Those in order. In other words, if there is a conflict between CF-conventions and interop, interop takes priority.
Edit: I reversed the order of modeling and CF conventions, split "less cruft" into its own item.
Original Redmine Comment Author Name: James (James) Original Date: 2021-10-05T16:22:54Z
Jesse wrote:
- Single blob
- Geographic interoperability with recent GDAL (and therefore other tools)
- Accurate and precise modeling
- Recent-ish CF-conventions adherence
- Less cruft
- More metadata
Those in order. In other words, if there is a conflict between CF-conventions and interop, interop takes priority.
Edit: I reversed the order of modeling and CF conventions, split "less cruft" into its own item.
Sounds good to me. The reason for data standards/conventions is, in any case, to increase interop, so if the CF convention falls short in some way, always side with improved interop for our user base.
Original Redmine Comment Author Name: James (James) Original Date: 2021-10-05T16:24:20Z
(#97121, in terms of item 6, more metadata.)
Original Redmine Comment Author Name: James (James) Original Date: 2021-10-05T16:26:09Z
Another thing that would be really nice to fix (but it might be hard, I forget; edit: so I'm not sure whether this is bound up in the format, and hence in scope, or in the tools, and hence out of scope) is the delayed structure identification. It is a massive pain for our pipeline to bring forward the structure identification to before statistics write time (versus incrementing the structure as statistics arrive). Edit: that is to say, it makes netcdf a special snowflake among the statistics formats, which is never good. (See the sketch below for the declare-then-fill pattern this imposes.)
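To make the pain concrete, here is a rough netCDF4-python sketch of the declare-then-fill pattern: the structure has to be identified up front, and only the values can arrive incrementally. Dimension and variable names are invented for illustration.

```python
import numpy as np
from netCDF4 import Dataset

# Dimensions, variables and attributes must be declared before any statistics
# are written; after that, only values can be filled in incrementally.
with Dataset("statistics_blob.nc", mode="w", format="NETCDF4") as nc:
    nc.createDimension("feature", 100)        # size must be known up front
    nc.createDimension("time_window", None)   # unlimited: can grow later

    stat = nc.createVariable("mean_error", "f8", ("time_window", "feature"))

    # Statistics can now be appended one time window at a time...
    for i, values in enumerate([np.zeros(100), np.ones(100)]):
        stat[i, :] = values
    # ...but a newly discovered feature or statistic at this point would mean
    # a new variable or a rewrite, hence the up-front structure identification.
```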
Original Redmine Comment Author Name: James (James) Original Date: 2021-10-06T11:43:23Z
Not a feature, but:
We can use an in-memory filesystem for testing this. There are examples for other format writers, like csv2: essentially, write the file to an in-memory filesystem, then read some or all of it back and make assertions against expectations. It would be nice not to rely on reading (especially for netcdf, which cannot be read with a JDK one-liner the way csv2 can), but there is no way around it as a means of establishing what was written. (A small sketch of the pattern follows.)
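Not our actual test harness, but a tiny sketch of the write-then-read-back pattern using netCDF4-python's diskless mode (an in-memory dataset rather than an in-memory filesystem); whether this maps cleanly onto our Java test setup is an open question, and the variable names are illustrative.

```python
import numpy as np
from netCDF4 import Dataset

# Write the blob without touching disk, then read part of it back and assert
# against expectations. Diskless support depends on the underlying netcdf-c
# build; names are illustrative only.
with Dataset("expected_blob.nc", mode="w", diskless=True) as nc:
    nc.createDimension("feature", 2)
    var = nc.createVariable("mean_error", "f8", ("feature",))
    var[:] = np.array([0.1, -0.2])

    # Read back from the same in-memory dataset and make assertions.
    assert nc.variables["mean_error"].shape == (2,)
    assert abs(float(nc.variables["mean_error"][0]) - 0.1) < 1e-12
```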
Original Redmine Comment Author Name: James (James) Original Date: 2022-04-26T11:03:01Z
Variable naming is another area for improvement. In netcdf/netcdf2, we qualify the variable names with metadata, which leads to friction when adding newly qualified slices of statistics. The attributes of a variable should fully qualify the statistics within it. A more general naming convention should be adopted for the variables, avoiding the threshold and other qualifying information in the name, and perhaps even the metric name, although the metric name may help a human user who is visually filtering slices in a GIS or other visualization tool to find the one they want. (A before/after sketch follows.)
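A hypothetical before/after sketch of the naming change being suggested (again netCDF4-python, with invented names and attributes): the qualification moves from the variable name into attributes.

```python
from netCDF4 import Dataset

with Dataset("naming_sketch.nc", mode="w", format="NETCDF4") as nc:
    nc.createDimension("feature", 10)

    # Current style (roughly): metadata baked into the variable name, so each
    # newly qualified slice of statistics forces a new, awkward name.
    old = nc.createVariable("MEAN_ERROR_FLOW_GT_100_CMS", "f8", ("feature",))

    # Suggested style: a general name, fully qualified by its attributes.
    new = nc.createVariable("mean_error_1", "f8", ("feature",))
    new.metric_name = "MEAN ERROR"       # illustrative attribute names
    new.threshold = "flow > 100 CMS"
    new.units = "m3 s-1"
```

The trade-off flagged above is visible here: the generic name is friendlier to the writer, while the qualified name is easier to eyeball in a GIS layer list.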
Original Redmine Comment Author Name: James (James) Original Date: 2022-04-26T11:07:23Z
edit: oops, wrong thread, ignore.
-On building, there's a small number of unit test failures to deal with...-
-For the system tests, scenario003 will fail on assertions, since the graphics titles are now additionally qualified with the ensemble average type, where applicable, and scenario003 is an ensemble evaluation with all valid metrics and graphics benchmarks. I don't anticipate other failures.-
Author Name: James (James) Original Redmine Issue: 97121, https://vlab.noaa.gov/redmine/issues/97121 Original Date: 2021-10-05
Given a @netcdf2@ format that has some weaknesses (because it attempted to straddle various competing objectives at the time)
When I consider how to improve it
Then I want to consider a @netcdf3@ format
Specific enhancements to be listed.
Redmine related issue(s): 103076