USGS-CMG / usgs-cmg-portal

Better way to process Santa Cruz & St Pete data? #287

Open rsignell-usgs opened 6 years ago

rsignell-usgs commented 6 years ago

@kwilcox we are going to try again to bring Santa Cruz and St Pete data into the portal.

On this page: https://github.com/USGS-CMG/usgs-cmg-portal/tree/master/santa_cruz_obs_data you say:

"This is a proof of concept repository for discovering a better way to convert USGS CMG observational data into CF compliant netCDF files."

Should @emontgomery-usgs try to work with this "better way", or should she work with the existing "collect.py" approach?

kwilcox commented 6 years ago

That folder is the "better way". Instead of trying to convert all types of instruments in a single process I split it by type (ADCP Waves, ADCP Currents, CTD, and ADV).
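
As a rough illustration of the split-by-type idea, the sketch below registers one converter per instrument type and dispatches on a type key. It is not code from the santa_cruz_obs_data folder: the registry, the function names, the record layout, and the file name are all hypothetical, and the converter bodies are placeholders.

```python
from typing import Callable, Dict

# Hypothetical registry mapping an instrument type to its converter function.
CONVERTERS: Dict[str, Callable[[dict], str]] = {}


def register(instrument_type: str):
    """Decorator that records a converter under the given instrument type."""
    def decorator(func: Callable[[dict], str]) -> Callable[[dict], str]:
        CONVERTERS[instrument_type] = func
        return func
    return decorator


# Placeholder converters: each returns a label instead of writing a file.
@register("adcp_waves")
def convert_adcp_waves(record: dict) -> str:
    return f"adcp_waves:{record['name']}"


@register("adcp_currents")
def convert_adcp_currents(record: dict) -> str:
    return f"adcp_currents:{record['name']}"


@register("ctd")
def convert_ctd(record: dict) -> str:
    return f"ctd:{record['name']}"


@register("adv")
def convert_adv(record: dict) -> str:
    return f"adv:{record['name']}"


def convert(record: dict) -> str:
    """Route a record to the converter registered for its instrument type."""
    instrument_type = record["instrument_type"]
    if instrument_type not in CONVERTERS:
        raise ValueError(f"No converter registered for {instrument_type!r}")
    return CONVERTERS[instrument_type](record)


# Hypothetical record; the file name is made up for illustration.
result = convert({"instrument_type": "ctd", "name": "example_file"})
```

Keeping each instrument type in its own function means a new type can be added without touching the logic for the existing ones.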

emontgomery-usgs commented 6 years ago

@kwilcox, I interpret this to mean I should work from the santa_cruz_obs_data directory. Is that what you meant?

kwilcox commented 5 years ago

A better way forward would be to construct a pandas.DataFrame for each dataset you would like exported as a netCDF file and use something like pocean to construct the valid CF files. This might take a little work to cover all of the corner cases, but overall it would remove the need for any "custom" netCDF conversion code; the only conversion would be into a pd.DataFrame.
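
A minimal sketch of what such a DataFrame might look like for a single fixed-station time series. The column names (t, x, y, z, station), the station name, and all values are assumptions made up for illustration. The commented-out pocean call at the end is likewise an assumption modeled on the pocean test suite and should be checked against those tests before use.

```python
import pandas as pd

# All values below are invented sample data, not real observations.
times = pd.to_datetime([
    "2019-01-01T00:00:00",
    "2019-01-01T01:00:00",
    "2019-01-01T02:00:00",
])

df = pd.DataFrame({
    "t": times,                          # observation time
    "x": [-122.0] * len(times),          # longitude (hypothetical)
    "y": [36.9] * len(times),            # latitude (hypothetical)
    "z": [10.0] * len(times),            # depth in meters (hypothetical)
    "station": ["hypothetical_site"] * len(times),
    "temperature": [12.1, 12.3, 12.0],   # one data variable per column
})

# Assumed pocean usage; verify the class and arguments against the
# pocean tests before relying on it:
# from pocean.dsg import OrthogonalMultidimensionalTimeseries
# OrthogonalMultidimensionalTimeseries.from_dataframe(df, "output.nc")
```

With this layout, each instrument-specific reader only has to produce a DataFrame of this shape, and the netCDF writing step is shared.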

rsignell-usgs commented 5 years ago

Isn't xarray a better fit for netCDF than pandas?

kwilcox commented 5 years ago

xarray is great for reading in and interpreting the data but it doesn't understand the CF conventions when it comes to DSG types. That is why pocean was created, and it currently only works at the pd.DataFrame level. I could make some changes to allow an xarray object to be written to DSG compliant netCDF files, but that isn't something available today.

dnowacki-usgs commented 5 years ago

Can we get an update on the suggested workflow using pocean? The API documentation is a bit sparse and some examples would be helpful. We have eight datasets to add to the portal from other centers (7 from Santa Cruz, 1 from St Pete).

kwilcox commented 5 years ago

The best examples of pocean are notebooks out in the wild. It could def. use some additional examples in the documentation.

The next best thing is the tests within the pocean codebase:

https://github.com/pyoceans/pocean-core/tree/master/pocean/tests/dsg

emontgomery-usgs commented 5 years ago

@dnowacki-usgs Is the St Pete one Legna's Crocker Reef data?

dnowacki-usgs commented 5 years ago

@emontgomery-usgs See #303