MHKiT-Software / MHKiT-Python

MHKiT-Python provides the marine renewable energy (MRE) community tools for data processing, visualization, quality control, resource assessment, and device performance.
https://mhkit-software.github.io/MHKiT/
BSD 3-Clause "New" or "Revised" License

NDBC to MHKiT #56

Closed ssolson closed 3 years ago

ssolson commented 4 years ago

Add functionality to convert requested NDBC data to DateTime format and configure the data for use in MHKiT. The example in #53 shows a for loop which sets a DateTime index and removes the NDBC columns. This is a required step to use the data in MHKiT. Functionality that performed the conversion for the user would be useful as this will be a common step users will want to take.

An open discussion point is whether historical data should be indexed by time. Currently, the MHKiT standard format expects data to increase by column. For spectral data, the index is the frequency bins and each column holds the spectral density at one time, which allowed MHKiT to specify a single standard format for both time domain and frequency domain data. However, as MHKiT starts to incorporate more data (e.g. NDBC historical standard meteorological data), the same convention would mean that the index is now wind speed, wind direction, etc. and the columns are increasing datetimes.
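To make the two layouts concrete, here is a minimal sketch that builds a small spectral table both ways with pandas. The frequency bins, timestamps, and density values are invented for illustration and are not NDBC data.

```python
import pandas as pd

# Hypothetical values for illustration only (not real NDBC data)
freqs = [0.02, 0.0325, 0.0375]  # frequency bins [Hz]
times = pd.to_datetime(["2019-01-01 00:40", "2019-01-01 01:40"])

# Layout described above: frequency-bin index, one column per timestamp
by_frequency = pd.DataFrame(
    [[0.0, 0.1], [0.5, 0.6], [1.2, 1.1]], index=freqs, columns=times
)

# Time-indexed alternative: one row per timestamp, one column per bin
by_time = by_frequency.T

print(by_frequency.shape)
print(by_time.index)
```

Switching between the two is a transpose, so the question is mainly which orientation should be the documented default.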

Make sure: [ ] For parameter=swden, convert the spectral frequency bins from str to float after dropping the NDBC date columns.
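The order in this checklist item matters because the date column labels cannot be cast to float. A minimal sketch with a made-up two-row frame, assuming for illustration only that the date columns are named `YY` and `MM`:

```python
import pandas as pd

# Made-up frame: string column labels, as when parsed from a text header
df = pd.DataFrame(
    [[2019, 1, 0.0, 0.5], [2019, 1, 0.1, 0.6]],
    columns=["YY", "MM", ".0200", ".0325"],
)

# Drop the (assumed) date columns first; their labels are not numeric
spectra = df.drop(columns=["YY", "MM"])

# Only frequency labels remain, so the cast to float is now safe
spectra.columns = spectra.columns.astype(float)

print(spectra.columns.tolist())
```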

ssolson commented 4 years ago

Below is my proposed solution for the ndbc.to_mhkit format. This is essentially a functionalized version of Step 3 in the example given at the top of #53.

The highlights:

  1. I believe the function should operate on a DataFrame and not the Dictionary of DataFrames
    • I prefer to operate on the DataFrame from a simplicity point of view. Further, the for-loop shown below could easily be a one-liner (ndbc_data = {year: to_mhkit(parameter, df) for year, df in ndbc_data.items()}), so the overhead on the user is still quite small in my opinion.
  2. The returned DataFrame for any historical parameter should be returned with a DateTime index.
    • In the original release, the defined mhkit format was to have spectral density functions indexed by frequency bins because this created a standard format for both the time and frequency domain representations... I believe... (The decision is not well documented).
    • However, adding the parameter stdmet (e.g. 46022 year 2019 stdmet) calls the original decision into question, because for the 'stdmet' parameter it would be most intuitive to users, and easiest to manipulate, if the data were returned with a DateTime index.
    • If we agree that one historical data parameter should be indexed by DateTime, then I would argue that all historical data should be returned with a DateTime index. This would further increase consistency with the river and tidal modules.
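As one illustration of the ease-of-manipulation argument, a DateTime index allows label-based time slicing and resampling directly in pandas. The column names and values below are made up and only stand in for stdmet-style data.

```python
import pandas as pd

# Made-up hourly values standing in for stdmet-style columns
index = pd.date_range("2019-01-01", periods=48, freq="h")
stdmet = pd.DataFrame(
    {"WSPD": range(48), "WVHT": [1.0] * 48}, index=index
)

# Select a single day by its date label
jan_2 = stdmet.loc["2019-01-02"]

# Aggregate the hourly rows to daily means
daily_mean = stdmet.resample("D").mean()

print(len(jan_2))
print(daily_mean["WSPD"].tolist())
```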

Let me know what you guys think and we can move forward with getting this function implemented!

# Set the parameter to be spectral wave density for NDBC buoy 46022
parameter='swden'
buoy_number='46022'

# Find available parameter data for NDBC buoy_number
available_data = ndbc.available_data(parameter, buoy_number)

# Get dictionary of parameter data by year
filenames = available_data['filename']
ndbc_data = ndbc.request_data(parameter, filenames)

# Iterate over each Dictionary key (e.g. 'year'); to_mhkit is defined below
for year in ndbc_data:
    # Get the DataFrame of the year
    year_data = ndbc_data[year]
    # Create a Datetime Index and remove NOAA date columns for each year
    ndbc_data[year] = to_mhkit(parameter, year_data)

def to_mhkit(parameter, df):
    '''
    Converts the parameter DataFrame to the format expected by mhkit by
    removing the NDBC date columns and setting a datetime index

    Parameters
    ----------
    parameter: string
        'swden' :   'Raw Spectral Wave Current Year Historical Data'
        'stdmet':   'Standard Meteorological Current Year Historical Data'
    df: DataFrame
        NDBC data in dataframe to be converted

    Returns
    -------
    df: DataFrame
        Dataframe with NDBC date columns removed, and datetime index
    '''
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()
    df['date'], ndbc_date_cols = dates_to_datetime(parameter, df,
                                                   return_date_cols=True)
    df = df.drop(ndbc_date_cols, axis=1)
    # Set the index first so 'date' is no longer among the column labels
    df = df.set_index('date')
    if parameter == 'swden':
        # Convert columns to float now that the ndbc_date_cols (type=str) are dropped
        df.columns = df.columns.astype(float)

    return df
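
Because `dates_to_datetime` is not shown in this comment, the following self-contained sketch reproduces the same steps with plain pandas on a made-up stdmet-style frame. The helper name `to_datetime_index` and the date column names (`YY`, `MM`, `DD`, `hh`, `mm`) are assumptions for illustration, not MHKiT's API.

```python
import pandas as pd

def to_datetime_index(df, date_cols=("YY", "MM", "DD", "hh", "mm")):
    """Hypothetical stand-in: build a datetime index from date columns
    (assumed names) and drop those columns from the result."""
    date_cols = list(date_cols)
    # pd.to_datetime assembles timestamps from columns named
    # year/month/day/hour/minute, so rename the assumed NDBC names
    parts = df[date_cols].copy()
    parts.columns = ["year", "month", "day", "hour", "minute"]
    out = df.drop(columns=date_cols)
    out.index = pd.DatetimeIndex(pd.to_datetime(parts), name="date")
    return out

# Made-up two-row example (not real NDBC data)
raw = pd.DataFrame({
    "YY": [2019, 2019], "MM": [1, 1], "DD": [1, 1],
    "hh": [0, 1], "mm": [50, 50],
    "WSPD": [5.1, 4.8],
})
converted = to_datetime_index(raw)
print(converted)
```

The sketch returns a new frame rather than editing `raw` in place, so the original request data stays available for comparison.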