ODM2 / ODM2PythonAPI

A set of Python functions that provides data read/write access to an ODM2 database by leveraging SQLAlchemy.
http://odm2.github.io/ODM2PythonAPI/
BSD 3-Clause "New" or "Revised" License

Create a utility module (util) with useful helper functions #164

Status: Open (opened by emiliom 5 years ago)

emiliom commented 5 years ago
aufdenkampe commented 5 years ago

@emiliom, I really like this idea, as I think it is fundamental to the purpose of having a functional and performant Python API. I think the extra requirements are a small price to pay for also getting efficient and tested I/O capabilities.

Thanks for continuing to move this critical repo forward.

horsburgh commented 5 years ago

@emiliom - I don't have a really strong feeling about this other than I think we should be very careful about adding additional requirements and complexity. My feeling is that we never finished the core functionality and so adding additional functionality and dependencies should perhaps be secondary to firming up the foundation.

Utility functions would be nice. Is there ongoing work that's driving this?

aufdenkampe commented 5 years ago

@horsburgh, good question. I'm also interested in hearing what is motivating this work!

I agree with the points about managing complexity and the need to better develop core functionality. I also believe that, given that Pandas has become a core part of the standard Python computational science and data science stack, we should consider strong integration with Pandas and GeoPandas to be core functionality. This is especially true because one of the highest priorities we've heard from users and potential users is improved I/O performance (including data alignment and slicing), and that is one of the main purposes and advantages of using Pandas.

emiliom commented 5 years ago

For my own future reference, to be moved into new issues when I'm ready to work on this stuff.

Enhanced timeseries result values

From the WaterQualityMeasurements_RetrieveVisualize.ipynb example in the odm2api documentation.

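# Assumes `read` is an odm2api ReadODM2 instance connected to the database, e.g.
# read = odm2api.services.readService.ReadODM2(session_factory)
# Get the values for ResultID 1 as a DataFrame, then index and sort them
# by ValueDateTime for convenience.
tsValues = read.getResultValues(resultids=[1], lowercols=False)
tsValues = tsValues.set_index('ValueDateTime').sort_index()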

To conveniently unpack the relevant metadata on variable names and units, use something like tsResult.VariableObj.VariableNameCV and tsResult.UnitsObj.UnitsAbbreviation, where tsResult is the result object whose values were retrieved.
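As a sketch of how these steps might be packaged in the proposed util module, the helper below fetches the values for one result, indexes and sorts them by ValueDateTime, and stores the variable name and units in the DataFrame's attrs. get_ts_values is a hypothetical name, not an existing odm2api function. It assumes read is a ReadODM2 instance and that the result object exposes ResultID along with the VariableObj and UnitsObj relationships mentioned above.

```python
import pandas as pd


def get_ts_values(read, result, lowercols=False):
    """Hypothetical util helper: time series values for one ODM2 result.

    Assumes `read` is an odm2api ReadODM2 instance (or any object with the same
    getResultValues(resultids=..., lowercols=...) method) and that `result`
    has ResultID, VariableObj and UnitsObj attributes.
    """
    values = read.getResultValues(resultids=[result.ResultID], lowercols=lowercols)
    # Assumption: lowercols=True returns lowercased column names
    time_col = 'valuedatetime' if lowercols else 'ValueDateTime'
    # Index by timestamp and sort chronologically
    values = values.set_index(time_col).sort_index()
    # Keep the variable and units metadata with the values, not as extra columns
    values.attrs['variable'] = result.VariableObj.VariableNameCV
    values.attrs['units'] = result.UnitsObj.UnitsAbbreviation
    return values
```

DataFrame.attrs keeps the metadata attached to the frame without widening it, but pandas still documents attrs as experimental, so some operations may not carry it through.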

GeoPandas GeoDataFrame

Starting point for ingesting Sites into a GeoDataFrame. From the WaterQualityMeasurements_RetrieveVisualize.ipynb example in the odm2api documentation.

import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Get all of the SamplingFeatures from the ODM2 database that are Sites
# (`read` is an odm2api ReadODM2 instance)
siteFeatures = read.getSamplingFeatures(sftype='Site')

# Read Sites records into a Pandas DataFrame
# ("if sf.Latitude" is used only to instantiate/read the Site attributes)
df = pd.DataFrame.from_records([vars(sf) for sf in siteFeatures if sf.Latitude])
# Drop SQLAlchemy's internal state object, which vars() also returns
df = df.drop(columns='_sa_instance_state', errors='ignore')

# Create a GeoPandas GeoDataFrame of lon/lat points (WGS 84) from the Sites DataFrame
ptgeom = [Point(xy) for xy in zip(df['Longitude'], df['Latitude'])]
gdf = gpd.GeoDataFrame(df, geometry=ptgeom, crs='EPSG:4326')
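The same steps could become a pair of helpers in the proposed util module. This sketch is hypothetical: sites_to_dataframe and sites_to_geodataframe are illustrative names, not existing odm2api functions. It assumes the input is the list returned by read.getSamplingFeatures(sftype='Site'), skips Sites with a missing coordinate, drops private attributes such as SQLAlchemy's _sa_instance_state, and imports GeoPandas only inside the GeoDataFrame helper, which keeps it an optional dependency given the concern about adding requirements.

```python
import pandas as pd


def sites_to_dataframe(site_features):
    """Hypothetical util helper: Site sampling features -> pandas DataFrame."""
    records = []
    for sf in site_features:
        # Accessing Latitude also loads the Site attributes on SQLAlchemy objects
        if sf.Latitude is None or sf.Longitude is None:
            continue
        # Skip private attributes such as SQLAlchemy's _sa_instance_state
        records.append({k: v for k, v in vars(sf).items() if not k.startswith('_')})
    return pd.DataFrame.from_records(records)


def sites_to_geodataframe(site_features, crs='EPSG:4326'):
    """Hypothetical util helper: Sites -> GeoDataFrame of lon/lat points."""
    import geopandas as gpd  # optional dependency, imported only when needed

    df = sites_to_dataframe(site_features)
    geometry = gpd.points_from_xy(df['Longitude'], df['Latitude'])
    return gpd.GeoDataFrame(df, geometry=geometry, crs=crs)
```

Missing coordinates are tested with `is None` rather than truthiness, so a Site at Latitude 0 is kept. gpd.points_from_xy requires a reasonably recent GeoPandas release.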

High-level database core summary

aufdenkampe commented 5 years ago

@emiliom, thanks for all your work on this!