NCAS-CMS / cfdm

A Python reference implementation of the CF data model
http://ncas-cms.github.io/cfdm
MIT License
28 stars 11 forks source link

Allow data to be omitted from netCDF files during `cfdm.write` #222

Closed davidhassell closed 2 years ago

davidhassell commented 2 years ago

Fixes #221

davidhassell commented 2 years ago

OK - I've finished fiddling - good to review, now, thanks.

codecov[bot] commented 2 years ago

Codecov Report

Merging #222 (e298008) into master (46cf194) will increase coverage by 0.01%. The diff coverage is 100.00%.

:exclamation: Current head e298008 differs from pull request most recent head f580ea5. Consider uploading reports for the commit f580ea5 to get more accurate results

@@            Coverage Diff             @@
##           master     #222      +/-   ##
==========================================
+ Coverage   87.66%   87.67%   +0.01%     
==========================================
  Files         123      123              
  Lines       12631    12641      +10     
==========================================
+ Hits        11072    11082      +10     
  Misses       1559     1559              
Flag Coverage Δ
unittests 87.67% <100.00%> (+0.01%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
cfdm/read_write/write.py 100.00% <ø> (ø)
cfdm/cfdmimplementation.py 83.05% <100.00%> (+0.07%) :arrow_up:
cfdm/read_write/netcdf/netcdfwrite.py 86.96% <100.00%> (+0.08%) :arrow_up:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

davidhassell commented 2 years ago

where I am also wondering why the first few Data points remain for the bounds and interior ring, i.e. why they don't change and show as [[--, ..., --]] like the other data that gets omitted?

https://github.com/NCAS-CMS/cfdm/pull/222/commits/e0d61cb40e13c05cba91e701455b45b830c903d9 passes on the omission to the interior ring

davidhassell commented 2 years ago

Data(time(1)) = [0]?

This one's an unrelated bug. If we look at the actual array we get:

>>> t = g.construct('time')
>>> t.data
<Data(3): [--, 0, --] gregorian>   # Wrong
>>> t.data.array 
masked_array(data=[--, --, --],    # Right
             mask=[ True,  True,  True],
       fill_value=1e+20,
            dtype=float64)

So the underlying data is correct, but the str (which is called by the repr representation is wrong. Let's raise another issue for that.

davidhassell commented 2 years ago

example field 7: the _FillValue = -1073741824.0 changes slightly to _FillValue = -1073741800.0, is that expected and/or harmless?

Unexpected, but harmless for now! It happens through a write/read cycle at v1.10.0.0, too, so isn't related to omit_data. Might be a rounding concern (I also looked at some double precision cases). One for another issue, too.

davidhassell commented 2 years ago

example field 6: the Auxiliary coordinate: ncvar%z construct gets re-labelled as Auxiliary coordinate: altitude,

Again, unrelated (thankfully). If you inspect f and g after import cfdm; f = cfdm.example_field(6); cfdm.write(f, 'delme1.nc'); cfdm.write(f, 'delme2.nc', omit_data='all'); f = cfdm.read('delme1.nc')[0]; g = cfdm.read('delme2.nc')[0], where f is read from disk, rather than created ab initio, all is fine.

I'll have a look in the example field definition to see if it's obvious why "altitude" doesn't appear in the first place ...

sadielbartholomew commented 2 years ago

Thanks David for drilling down to investigate all of the facets. Always good to catch a bug even if it isn't immediately related :smile:

So as far as I can tell from your recent comments, https://github.com/NCAS-CMS/cfdm/pull/222#issuecomment-1295175407 is the only issue directly related to this PR, and you have put in a fix, is that right? In which case I just need to re-review with the new commit and then we're good to go?

davidhassell commented 2 years ago

example field 6: the Auxiliary coordinate: ncvar%z construct gets re-labelled as Auxiliary coordinate: altitude,

This turns out to be a feature! It's to do with the unusual case where a coordinate construct has bounds but no data. In this case we shouldn't be assigning an ncvar to the coordinates (which is what example 6 was doing). This commit https://github.com/NCAS-CMS/cfdm/pull/222/commits/3ae74fe0916df87256843ae1ea2017f9b2b54fd5 makes things consistent.

davidhassell commented 2 years ago

Can we also do a quick dump on the re-read field g

I've added to the test:

        # Check that a dump works
        g.dump(display=False)

Is that what you meant?

sadielbartholomew commented 2 years ago

Is that what you meant?

Yes, precisely, just test that it doesn't fail miserably due to some facet of missing data.

davidhassell commented 2 years ago

Great - are we good to go on this? (notwithstanding the two new issues that have been spawned :))

davidhassell commented 2 years ago

(sorry - I missed https://github.com/NCAS-CMS/cfdm/pull/222#issuecomment-1295236660, but I think we've ended up in the same place!)

sadielbartholomew commented 2 years ago

(sorry - I missed https://github.com/NCAS-CMS/cfdm/pull/222#issuecomment-1295236660, but I think we've ended up in the same place!)

No worries! I also missed your comments afterwards so my apologies too.

I'm just taking a final look and will let you know in a moment.

davidhassell commented 2 years ago

Thanks, Sadie - Happy for you to raise new issues :) (but just ping me otherwise)