MHKiT-Software / MHKiT-Python

MHKiT-Python provides the marine renewable energy (MRE) community tools for data processing, visualization, quality control, resource assessment, and device performance.
https://mhkit-software.github.io/MHKiT/
BSD 3-Clause "New" or "Revised" License

NetCDF4 #181

Closed ssolson closed 1 year ago

ssolson commented 1 year ago

This PR enables MHKiT to work with the latest version of NetCDF4 (#177).

To fully pass the tests on Python 3.7, 3.8, and 3.9 we use a conda environment (Windows with Python 3.7 was the only combination that could not find the HDF5 path). I have added separate tests for pip on Python 3.8 and 3.9. This expands our tests to check behavior when users install with either pip or conda. This addition is in line with the testing consultation on MHKiT summarized in #183.

Separately, netCDF4 1.6.0 introduces a new `compression` kwarg. For Dolfyn I have removed the `compression` key in the io test; that test will be fixed in #186.

ssolson commented 1 year ago

Hey James, I have been looking into this more and it appears that it is actually the Dolfyn xarray `.to_netcdf` method that is breaking the UNIX tests for MHKiT, which you can check out on #181. Based on our last conversation/review of Dolfyn, is this PR ready to go?

@ssolson Apparently recreating the test file fixes that issue #181

Originally posted by @jmcvey3 in https://github.com/MHKiT-Software/MHKiT-Python/issues/169#issuecomment-1194779336

So @jmcvey3, here you are saying we need to recreate the netcdf files which are being read in? The UNIX tests here are failing on a `to_netcdf` call that Dolfyn makes using xarray.

see here: https://github.com/MHKiT-Software/MHKiT-Python/runs/7504333747?check_suite_focus=true

======================================================================
ERROR: test_save (mhkit.tests.dolfyn.test_read_io.io_testcase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/MHKiT-Python/MHKiT-Python/mhkit/tests/dolfyn/test_read_io.py", line 19, in test_save
    save_netcdf(ds, 'test_save', compression=True)
  File "/home/runner/work/MHKiT-Python/MHKiT-Python/mhkit/tests/dolfyn/base.py", line 54, in save_netcdf
    io.save(data, rfnm(name), *args, **kwargs)
  File "/home/runner/work/MHKiT-Python/MHKiT-Python/mhkit/dolfyn/io/api.py", line 145, in save
    ds.to_netcdf(filename, format=format, engine=engine, **kwargs)
  File "/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/xarray/core/dataset.py", line 1912, in to_netcdf
    invalid_netcdf=invalid_netcdf,
  File "/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/xarray/backends/api.py", line 1073, in to_netcdf
    dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
  File "/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/xarray/backends/api.py", line 1119, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File "/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/xarray/backends/common.py", line 266, in store
    variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
  File "/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/xarray/backends/common.py", line 304, in set_variables
    name, v, check, unlimited_dims=unlimited_dims
  File "/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 497, in prepare_variable
    fill_value=fill_value,
  File "src/netCDF4/_netCDF4.pyx", line 2838, in netCDF4._netCDF4.Dataset.createVariable
  File "src/netCDF4/_netCDF4.pyx", line 4003, in netCDF4._netCDF4.Variable.__init__
  File "src/netCDF4/_netCDF4.pyx", line 1965, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: Filter error: bad id or parameters or duplicate filter
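
The `Filter error` above typically means the requested compression filter is not available in the linked netCDF/HDF5 build. As an illustration only (not MHKiT's actual code; the helper name is hypothetical), one defensive pattern is to strip compression-related keys from the per-variable encoding and retry the save uncompressed:

```python
# Hypothetical sketch: remove compression-related options from an xarray-style
# per-variable encoding dict so a failed compressed save can be retried
# uncompressed. Key names are drawn from the netCDF4 1.6.0 changelog.
COMPRESSION_KEYS = {
    "zlib", "compression", "complevel", "shuffle",
    "blosc_shuffle", "szip_mask", "szip_pixels_per_block",
}

def strip_compression(encoding):
    """Return a copy of `encoding` with all compression options removed."""
    return {
        var: {k: v for k, v in opts.items() if k not in COMPRESSION_KEYS}
        for var, opts in encoding.items()
    }
```

With a helper like this, a save wrapper could catch the `RuntimeError` and fall back to writing with the stripped encoding.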
jmcvey3 commented 1 year ago

Hey James, I have been looking into this more and it appears that it is actually the Dolfyn xarray `.to_netcdf` method that is breaking the UNIX tests for MHKiT, which you can check out on #181. Based on our last conversation/review of Dolfyn, is this PR ready to go?

@ssolson Apparently recreating the test file fixes that issue #181

Originally posted by @jmcvey3 in #169 (comment)

So @jmcvey3, here you are saying we need to recreate the netcdf files which are being read in? The UNIX tests here are failing on a `to_netcdf` call that Dolfyn makes using xarray.

see here: https://github.com/MHKiT-Software/MHKiT-Python/runs/7504333747?check_suite_focus=true


@ssolson Yes, rebuilding and reuploading that test file fixed it on my branch. If it's still an issue let me know.

ssolson commented 1 year ago

@jmcvey3 I am almost certain what fixed it is you merged my changes to the master branch which required the previous version of netCDF. E.g. if your requirements file says netCDF4<=1.5.8 then that is what is going on.

I discussed this short-term fix in Issue #177 which this PR is attempting to close.

Let me know if this is not the case.

jmcvey3 commented 1 year ago

@jmcvey3 I am almost certain what fixed it is you merged my changes to the master branch which required the previous version of netCDF. E.g. if your requirements file says netCDF4<=1.5.8 then that is what is going on.

I discussed this short-term fix in Issue #177 which this PR is attempting to close.

Let me know if this is not the case.

Oh I see. I thought I'd already updated my branch, but apparently not.

ssolson commented 1 year ago

Okay cool. Yeah lmk if you have any ideas about the dolfyn to_netcdf issue above and I can take a stab at it.

Right now I am focused on the Windows issue with Py 3.7 (Windows 3.8, 3.9 both work currently)... getting closer to just dropping 3.6 support here 😄 and moving to 3.10.

ssolson commented 1 year ago

Removing compression from the netcdf save in Dolfyn fixed the Linux builds; see commit https://github.com/MHKiT-Software/MHKiT-Python/pull/181/commits/7f05e9fddb94bdc3a6db178b13a745c115596906.

@jmcvey3 do you think you could take a look at what is going on with the new version of netcdf and compression?

ssolson commented 1 year ago

Without compressing NetCDF data in Dolfyn, all tests pass except Python 3.7 on windows.

ssolson commented 1 year ago

Using a conda environment we can get them all to pass. Testing this way would require us to change our guidance on how we ask users to install MHKiT, since we want to test the same way we recommend. Currently we recommend pip for everything, so this would be moving toward a conda-centric approach. Part of the testing suggestions is to test both pip and conda, and to get MHKiT available via conda install.

jmcvey3 commented 1 year ago

Ah, it's in the 1.6.0 changelog. Dolfyn uses `zlib=True` if compression is set to true, so that needs to be updated:

 * add 'compression' kwarg to createVariable to enable new compression
   functionality in netcdf-c 4.9.0.  'None','zlib','szip','zstd','bzip2'
   'blosc_lz','blosc_lz4','blosc_lz4hc','blosc_zlib' and 'blosc_zstd'
   are currently supported. 'blosc_shuffle',
   'szip_mask' and 'szip_pixels_per_block' kwargs also added.
   compression='zlib' is equivalent to (the now deprecated) zlib=True.
   If the environment variable NETCDF_PLUGIN_DIR is set to point to the
   directory with the compression plugin lib__nc* files, then the compression plugins will
   be installed within the package and be automatically available (the binary
   wheels have this).  Otherwise, the environment variable HDF5_PLUGIN_PATH
   needs to be set at runtime  to point to plugins in order to use the new compression
   options.
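
Per that changelog, `zlib=True` is deprecated in favor of `compression='zlib'` from netCDF4 1.6.0 onward. A version-gated shim is one way to support both; this is a hedged sketch (the function name and version handling are illustrative, not Dolfyn's actual fix):

```python
# Hypothetical sketch: choose the compression keyword by installed netCDF4
# version, mapping zlib=True (pre-1.6) to compression='zlib' (1.6+).
def compression_kwargs(netcdf4_version, compress=True):
    """Return keyword arguments for netCDF variable creation."""
    if not compress:
        return {}
    major, minor = (int(p) for p in netcdf4_version.split(".")[:2])
    if (major, minor) >= (1, 6):
        return {"compression": "zlib"}
    return {"zlib": True}
```

In practice the version string could come from `netCDF4.__version__`, so a single code path would work against both the pinned `netCDF4<=1.5.8` and the new release.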
ssolson commented 1 year ago

Thanks James; my question to you was specific to the compression. I was working on, and have fixed (albeit with the tradeoffs discussed above), the mac 2.7 and windows py3.7 issues by using a conda environment. Going to sleep on what exactly I want to do and maybe try a couple more things before making a decision.

I was going to try adding compression back in tomorrow. I will follow up then.

jmcvey3 commented 1 year ago

Using a conda environment we can get them all to pass. Testing this way would require us to change our guidance on how we ask users to install MHKiT, since we want to test the same way we recommend. Currently we recommend pip for everything, so this would be moving toward a conda-centric approach. Part of the testing suggestions is to test both pip and conda, and to get MHKiT available via conda install.

Possibly? I've found it's just much easier to use conda in github actions than venv.

ssolson commented 1 year ago

Ah, it's in the 1.6.0 changelog. Dolfyn uses `zlib=True` if compression is set to true, so that needs to be updated:

 * add 'compression' kwarg to createVariable to enable new compression
   functionality in netcdf-c 4.9.0.  'None','zlib','szip','zstd','bzip2'
   'blosc_lz','blosc_lz4','blosc_lz4hc','blosc_zlib' and 'blosc_zstd'
   are currently supported. 'blosc_shuffle',
   'szip_mask' and 'szip_pixels_per_block' kwargs also added.
   compression='zlib' is equivalent to (the now deprecated) zlib=True.
   If the environment variable NETCDF_PLUGIN_DIR is set to point to the
   directory with the compression plugin lib__nc* files, then the compression plugins will
   be installed within the package and be automatically available (the binary
   wheels have this).  Otherwise, the environment variable HDF5_PLUGIN_PATH
   needs to be set at runtime  to point to plugins in order to use the new compression
   options.

Awesome, super helpful James! I can do a keyword search for zlib in Dolfyn, or if you know where that is set, pointing me to it would be helpful.

ssolson commented 1 year ago

Using a conda environment we can get them all to pass. Testing this way would require us to change our guidance on how we ask users to install MHKiT, since we want to test the same way we recommend. Currently we recommend pip for everything, so this would be moving toward a conda-centric approach. Part of the testing suggestions is to test both pip and conda, and to get MHKiT available via conda install.

Possibly? I've found it's just much easier to use conda in github actions than venv.

So this was a suggestion from the testing team. Essentially the testing mindset should be that you test what you tell your users to do, which is what we do currently; using the conda env would deviate from our documentation as it stands. But yes, working in the GitHub Actions environment has been a bit of a headache, and using a conda env ultimately fixed all the problems, so it is certainly easier.

jmcvey3 commented 1 year ago

Did we not merge (or want to merge) the rest of the dolfyn source code? There's only an older version of the IO code in mhkit right now.

ssolson commented 1 year ago

Did we not merge (or want to merge) the rest of the dolfyn source code? There's only an older version of the IO code in mhkit right now.

I think you are referring to the master branch? The last Dolfyn merge went into the Development branch. I will state again that I disagree with this setup: we should just push things into the master branch and delete the Development branch, because there is already a whole tab for releases (https://github.com/MHKiT-Software/MHKiT-Python/releases). Or we could add a release branch if we wanted to keep the messier two-branch system that we are currently on.

Let me know if I misunderstood your question.

jmcvey3 commented 1 year ago

Oh! Right. Yes, so long as each branch doesn't interfere with another; but if there's unavoidable co-development on multiple branches, I could see the "Develop" branch being a good idea.

jmcvey3 commented 1 year ago

I created a new pull request for Develop off the same dolfyn branch with this fix and several others in it.

ssolson commented 1 year ago

@akeeste would you mind taking a look/ providing review/approval on these testing changes?

I have summarized the changes in the intro box.

akeeste commented 1 year ago

@ssolson

I've been keeping up with this and the Dolfyn thread at a high level. Can you confirm that I'm understanding correctly:

The netcdf and dolfyn compression changes seem straightforward now that they're figured out. I can look into windows + conda + python 3.7, but am probably less familiar with this than you are.

ssolson commented 1 year ago

@ssolson

I've been keeping up with this and the Dolfyn thread at a high level. Can you confirm that I'm understanding correctly:

  • pip install is still being tested, but now as a different job in Actions than the conda install job.

Pip install is still being tested as before, but I have removed the py 3.7 test because it fails on windows. I believe this is a GitHub Actions environment issue, not an issue with our codebase. I plan to revisit testing soon to address other parts of #183 and will continue to work on this issue. At this point I am approaching a month on this issue and need to focus on other parts of MHKiT.

  • You added conda install tests in anticipation of openly supporting/recommending that to users going forward

Using a conda environment we are running all combinations of Python 3.7, 3.8, and 3.9 on mac, windows, and linux.

  • [Windows + conda install + python 3.7] is the only conda installation that doesn't currently work

No. It would be [Windows + pip install + python 3.7], but I believe it's a GitHub Actions environment issue.

The netcdf and dolfyn compression changes seem straightforward now that they're figured out. I can look into windows + conda + python 3.7, but am probably less familiar with this than you are.

I'm really just looking for a review. My suggested plan of action is to get these actions into the Develop branch and add a comment to the testing issue #183 referencing the py3.7/windows issue. Integrating these changes will allow the code base to work with the newest version of netCDF4 (1.6.0), and I can focus on the number of other MHKiT PRs in need of review.

akeeste commented 1 year ago

@ssolson Thanks for clarifying. I can't tag myself as reviewer, but I have completed my review and approve of the changes. Let's merge.