MHKiT-Software / MHKiT-Python

MHKiT-Python provides the marine renewable energy (MRE) community tools for data processing, visualization, quality control, resource assessment, and device performance.
https://mhkit-software.github.io/MHKiT/
BSD 3-Clause "New" or "Revised" License
47 stars 45 forks source link

Convert loads module to xarray #279

Closed akeeste closed 6 months ago

akeeste commented 7 months ago

Loads/general

Loads/extreme

akeeste commented 7 months ago

I'm currently working through a few issues with getting the tests to pass. There is a minor bug in the standard deviation calculation in loads.general.bin_statistics.

The original code uses pandas input. In bin_statistics the Pandas input, data, is sliced by both signal_name and binnumber:

https://github.com/MHKiT-Software/MHKiT-Python/blob/4eeb5989d6fcfd7ce9a434cde79092bc83911bbf/mhkit/loads/general.py#L59-L70

However, one issue with this set-up--If a signal within a Pandas DataFrame is sliced and only has one value (i.e. for i=22 with the test data), the result is a Pandas Series, not another Pandas DataFrame. The std of a pandas Series returns a float, whereas the std of a pandas DataFrame returns a DataFrame of length 1. This results in data that should have std=0 to error in line 68 and results in std=nan on Line 70 instead. The original data being tested against was created with this error and led to the test failing. I updated the test data to reflect the correct standard deviation (0) for those bins with only one data point.

df = pd.DataFrame([1., 2.])
s = pd.Series([1., 2.])
print(df.std())
print(s.std())
print(df.std()[0])
print(s.std()[0])

Instead of calculating the standard deviation manually, I suggest we use scipy.stats.binned_statistic with statistic='std', just as we do in line 59 above.

akeeste commented 7 months ago

@ssolson This PR is ready for review.

ssolson commented 7 months ago

@ssolson This PR is ready for review.

Thanks Adam, I will not be able to complete my review of this till Wednesday this week due to travel. I will tackle as soon as I can.

akeeste commented 6 months ago

Thanks @ssolson I'll go through the review and make updates early next week.

akeeste commented 6 months ago

TODO from discussion with @ssolson today:

also

akeeste commented 6 months ago

@ssolson I addressed your review comments and added the option for a user to specify the time dimension for >1D variables. But I don't know a robust way to verify that the user's specified dimension is for example a timeseries, so I think we should leave that for another update.

The user is also limited to inputting a single variable for some functions, like mler_coefficients where the user should really only input 1 wave spectrum at a time.

akeeste commented 6 months ago

Thanks @ssolson. I'll make those changes then merge this PR

ssolson commented 6 months ago

@akeeste the CDIP Threads Server seems to be down right now. The test will not pass until it is back up. https://thredds.cdip.ucsd.edu/

ssolson commented 6 months ago

Threads is back up.