MHKiT-Software / MHKiT-Python

MHKiT-Python provides the marine renewable energy (MRE) community tools for data processing, visualization, quality control, resource assessment, and device performance.
https://mhkit-software.github.io/MHKiT/
BSD 3-Clause "New" or "Revised" License
47 stars 45 forks source link

Fix pip install tests #194

Closed ssolson closed 1 year ago

ssolson commented 1 year ago

This PR improves the testing structure of MHKit by:

  1. Switching testing from nosetest (deprecated) to pytest
  2. Including coverage for pip and conda installation
  3. Breaking out calls to the hindcast module to its own serialized job with dedicated coverage. This will avoid the issues of overlapping calls that crashed the API in the past. This was achieved via configuration files for both pytest and coverage indicating to the testing module which files should be run and included in the coverage report.

Each of these recommendations was suggested to us in #183.

This PR additionally applies a temporary NetCDF version pin to the previous release to fix issues with pip installation (#177) and allow tests to pass.

An additional issue around bandwidth related NREL's HSDS API occurred while working on this issue. In order to clear out the current backlog of PRs we are closing this PR with the following limitations:

  1. No hindcast tests are being run (these call the NREL HSDS API)
  2. mhkit pip and python 3.7 are not being tested (there is an issue with dolyfn). A conda install is working and being tested.
  3. netCDF version is currently pinned to 1.5.8 (not the most recent)
ssolson commented 1 year ago

lol pandas released a new version last night with breaking changes for us. I am going to pin the pandas version to the previous so we can close our current PRs.

image

ssolson commented 1 year ago

Not necessary at this time but I want to document here the issues which seem to mostly arise around the use of py 3.7. In the future, we will look to deprecate 3.7.

image

ssolson commented 1 year ago

Okay so there is an issue I have been chasing down now regarding the hindcast tests. test_multi_loc is continually returning a 0 array for the data being passed in the test. The error returned from the test: FAILED ::TestWPTOhindcast::test_multi_loc - ValueError: cannot reshape array of size 0 into shape (8748,29,24,1)

This is occurring inside of hypyd in a call that converts bytesToArray:

data = b'', dt = dtype('float32'), shape = (8748, 29, 24, 1)

    def bytesToArray(data, dt, shape):
        nelements = getNumElements(shape)

        if not isVlen(dt):
            # regular numpy from string
            arr = np.frombuffer(data, dtype=dt)
        else:
            arr = np.zeros((nelements,), dtype=dt)
            offset = 0
            for index in range(nelements):
                offset = readElement(data, offset, arr, index, dt)

        if shape is not None:
            if shape == () and dt.shape:
                # special case for scalar array with array sub-type
                arr = arr.reshape(dt.shape)
            else:
>               arr = arr.reshape(shape)
E               ValueError: cannot reshape array of size 0 into shape (8748,29,24,1)

..\..\anaconda3\lib\site-packages\h5pyd\_hl\base.py:569: ValueError

Something has changed in our dependencies and I have not at this time I have not been able to track down what is breaking this test.

ssolson commented 1 year ago

@rpauly18 there is some issue with rex causing the tests to fail.

Specifically: The parameter directional_wave_spectrum when calling get_lat_lon_df on the path: '/nrel/US_wave/virtual_buoy/West_Coast/West_Coast_virtual_buoy_1995.h5'.

E.g.:

from rex import WaveX

year='1995'
region = wave.io.hindcast.region_selection((43.489171,-125.152137))
wave_path = f'/nrel/US_wave/virtual_buoy/'+region+'/'+region+'_virtual_buoy_'+year+'.h5'

with WaveX(wave_path, hsds=True) as rex_waves:
    meta = rex_waves.meta
    time_index = rex_waves.time_index
    data_raw = rex_waves.get_lat_lon_df('directional_wave_spectrum',(43.489171,-125.152137))

I have found that switching the parameter to energy_period will return the correct data as expected. The above example is the simplest way to reproduce the error from our testing suite. The exact error is found in test_multi_loc when it specifically calls the hindcast module function request_wpto_directional_spectrum.

request_wpto_directional_spectrum uses additional kwargs in the call to WaveX but the error returned by the code above is the same.

Since you wrote this module I wanted to reach out to you to see if you had any idea to try anything I had not yet. I will go ahead and reach out to the rex team using my example above to see if they can offer any insight.

ssolson commented 1 year ago

https://github.com/NREL/rex/issues/133#issue-1421112841

akeeste commented 1 year ago

@ssolson @rpauly18 Our PR backlog continues to grow, so I have a couple suggestions for an immediate path forward:

Any thoughts on this workflow? #187, #193, #196, #197 are all reviewed and more or less ready to merge, pending some issues with tests. I'd be great to merge these PRs sometime next week using one of the above options

ssolson commented 1 year ago

It would be very easy to comment out the hindcast tests the way I have set it up. The reason I had not suggested this is primarily because it means we are not testing our whole package for breaking changes from new PRs. I think your plan works for #193, #196, and #197 but since #187 uses rex it needs to be added to the serialized portion of the tests and I would not recommend it for inclusion until after getting the current issues worked out.

akeeste commented 1 year ago

That makes sense about #187. I agree that we will not be fully checking PRs as they come in, but any breaking updates would be isolated to the Develop branch. This way we can keep up with development and once the tests are fixed, they should identify any outstanding issues. Can you give an update on the status of this PR for the other versioning/platform issues aside from rex?

ssolson commented 1 year ago

To get your plan working for this PR I need to

  1. Comment out the hindcast job in the main.yml file
  2. Add back in all OS and Py versions to the other tests.

I will go ahead and do that now since it is so easy.

akeeste commented 1 year ago

@ssolson Thanks, when you're ready can you mark the PR ready for review? I'll find time to review this and hopefully merge a couple other PRs before I'm out of office.

ssolson commented 1 year ago

Guys I have completed the follow up with Grant with is final comments and recommendations here https://github.com/NREL/rex/issues/133#issuecomment-1303685718.

...here are some observations based on what we've tried:

  1. Neither of us seem to be able to pull the full data volume from that dataset
  2. The errors are being raised by h5pyd, not rex, and appear to be saying "no data has been returned"
  3. Smaller datasets or a slice of the spectrum dataset are retrieved just fine
  4. The original code requesting the spectrum data in dataframe works fine when pointing to the Eagle datasets Given the above, I think the issue is in bandwidth limitations on the NREL HSDS API. It could be that your bandwidth limit was > recently reduced, which would appear like something changed in the code that is just now causing failures but in reality is an > external change.

My recommendations right now are to either 1) stand up your own HSDS service so you can control your data streaming or > > 2) reach out to the NREL developer network to get your bandwidth increased.

In summary, it seems that the bandwidth of the NREL HSDS was reduced recently. He suggests we set up our own HSDS which I believe is quite firmly outside the scope of this project or reach out to the NREL developer network. I think we should go with the second option. @rpauly18 I am happy to move forward with this option but wanted to check with you on if you had contacts over there.

ssolson commented 1 year ago

@akeeste I expect these tests to pass and will remove from draft status when they do. I have updated the header post to this PR which captures the limitations/ compromise of the testing suite we are pushing to the develop branch.

  1. No hindcast tests are being run (these call the NREL HSDS API)
  2. mhkit pip and python 3.7 are not being tested (there is an issue with dolyfn). A conda install of py3.7 is working and being tested.
  3. netCDF version is currently pinned to 1.5.8 (not the most recent)