ArtesiaWater / hydropandas

Module for loading observation data into custom DataFrames
https://hydropandas.readthedocs.io
MIT License

Generic import/export function for Obs and ObsCollection classes #124

Closed tdmeij closed 1 year ago

tdmeij commented 1 year ago

Would it be possible to add generic import/export functionality to the Obs and ObsCollection classes? This would make temporarily saving intermediate results easier for users who would like to avoid the technical challenges of setting up a pystore workflow.

Currently, hydropandas supports importing from a lot of different external data sources. However, given the creative skills of software developers, who seem to invent a new data format every week, unsupported data formats will probably keep popping up forever. For instance, the current Hydropandas version, 0.7.3, doesn't seem to support the WaterWeb, Dawaco or HydroMonitor CSV export formats. Fortunately, it is relatively easy for Hydropandas users to write their own import classes, create Obs and ObsCollection instances from raw data files, and process their data. However, depending on the size and format of the data files, reading them can take quite some time. In version 0.7.3, I can save data to JSON (using the inherited pandas method), but I cannot read this data directly back into an Obs or ObsCollection instance because an import method is missing from the Obs and ObsCollection classes.

Therefore, it would be convenient to be able to import raw data into an Obs or ObsCollection instance, resample the data to a more manageable frequency, and save these intermediate results to a temporary file that can be read directly using Obs or ObsCollection methods.

dbrakenhoff commented 1 year ago

You can pickle the ObsCollection/Observations using the to_pickle() method. Loading these using pandas (pd.read_pickle()) will give you back the original ObsCollection or Observation. Maybe that solves your issue somewhat?
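To illustrate why the pickle round trip returns the original class, here is a minimal sketch using plain pandas only. `MiniObs` is a hypothetical stand-in for an Obs class, not the hydropandas implementation; it assumes the usual pandas subclassing pattern in which extra attributes are listed in `_metadata`.

```python
import os
import tempfile

import pandas as pd


class MiniObs(pd.DataFrame):
    """Hypothetical stand-in for an Obs class: a DataFrame plus metadata."""

    # Attributes listed in _metadata are included when pandas pickles the object.
    _metadata = ["name", "x", "y"]

    def __init__(self, *args, name="", x=None, y=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.name = name
        self.x = x
        self.y = y

    @property
    def _constructor(self):
        # Make pandas operations return MiniObs instead of a plain DataFrame.
        return MiniObs


index = pd.date_range("2020-01-01", periods=3, freq="D")
obs = MiniObs({"level": [1.0, 1.1, 1.2]}, index=index, name="well_1", x=1000.0, y=2000.0)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "obs.pkl")
    obs.to_pickle(path)              # inherited pandas method
    restored = pd.read_pickle(path)  # returns the subclass, not a plain DataFrame

print(type(restored).__name__, restored.name)
```

The pickle stores a reference to the class together with the object state, so the class has to be importable when the file is read back.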

I'm all for a generic human-readable export format for Observations. I'd suggest some kind of CSV format that includes some information about its Obs type(?). Then I guess we need to define some kind of header format and write the time series data below that. If we want to attempt to maintain data types on import, that will be a bit of a challenge. An ObsCollection could then just use that Observation export format to write CSV files for each Observation in the collection.
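One possible shape for such a format, purely as a sketch: a block of `# key=value` header lines followed by the time series as ordinary CSV. The header layout, the function names `write_obs_csv` and `read_obs_csv`, and the metadata keys are all assumptions made for illustration; nothing like this is defined in hydropandas in this thread. Header values are JSON-encoded so that numbers and strings can be told apart on import.

```python
import io
import json

import pandas as pd

HEADER_PREFIX = "# "


def write_obs_csv(df, meta, buffer):
    """Write one JSON-encoded metadata line per key, then the data as CSV."""
    for key, value in meta.items():
        buffer.write(f"{HEADER_PREFIX}{key}={json.dumps(value)}\n")
    df.to_csv(buffer, index_label="datetime")


def read_obs_csv(buffer):
    """Parse the metadata header, then read the remaining lines as CSV."""
    lines = buffer.read().splitlines()
    meta = {}
    n_header = 0
    for line in lines:
        if not line.startswith(HEADER_PREFIX):
            break
        key, _, value = line[len(HEADER_PREFIX):].partition("=")
        meta[key] = json.loads(value)
        n_header += 1
    data = io.StringIO("\n".join(lines[n_header:]))
    df = pd.read_csv(data, index_col="datetime", parse_dates=True)
    return df, meta


index = pd.date_range("2020-01-01", periods=3, freq="D")
df = pd.DataFrame({"level": [1.0, 1.1, 1.2]}, index=index)
meta = {"obs_type": "GroundwaterObs", "name": "well_1", "x": 1000.0, "tube_nr": 1}

buffer = io.StringIO()
write_obs_csv(df, meta, buffer)
buffer.seek(0)
df_back, meta_back = read_obs_csv(buffer)
```

The `obs_type` entry is what an importer could use to pick the right Obs class. Column data types in the time series are still inferred by `pd.read_csv`, so the data type challenge mentioned above is only addressed for the header.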

If anyone else has any suggestions regarding this topic, feel free to post them here.

tdmeij commented 1 year ago

Thank you David, this answers my question. After reading back the pickled object, I even get an ObsCollection object instead of the DataFrame I had expected. Magic still happens, apparently.

OnnoEbbens commented 1 year ago

Hahaha, I had the same first reaction when pd.read_pickle() returned an ObsCollection object. It is magic!

There is also a to_excel() method for an ObsCollection. This will create an Excel file with one tab containing all the metadata and another tab for each observation object with the measurement time series. This is imo the best way to export to a human-readable format. Unfortunately we don't have a read_excel() method yet for an ObsCollection, but I think it is not too hard to create one.

martinvonk commented 1 year ago

Maybe we can create a simple hpd.ObsCollection.from_pickle() method that calls pandas.read_pickle()? To increase findability.
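A minimal sketch of what such a wrapper could look like, assuming a DataFrame subclass. `MiniCollection` is a hypothetical stand-in for ObsCollection, and the type check is an extra that is not part of the suggestion above.

```python
import os
import tempfile

import pandas as pd


class MiniCollection(pd.DataFrame):
    """Hypothetical stand-in for ObsCollection."""

    @property
    def _constructor(self):
        return MiniCollection

    @classmethod
    def from_pickle(cls, path):
        """Thin, easy-to-find wrapper around pandas.read_pickle()."""
        obj = pd.read_pickle(path)
        if not isinstance(obj, cls):
            raise TypeError(f"expected {cls.__name__}, got {type(obj).__name__}")
        return obj


collection = MiniCollection({"x": [1000.0, 1500.0]}, index=["well_1", "well_2"])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "collection.pkl")
    collection.to_pickle(path)
    loaded = MiniCollection.from_pickle(path)

print(type(loaded).__name__)
```

The method adds no new behaviour beyond the type check. Its value is that users find it next to the other constructors on the class.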

OnnoEbbens commented 1 year ago

I've added a read_excel and a read_pickle function to hydropandas. I updated the example notebook 01_groundwater_observations with calls to the Excel and pickle functions for an ObsCollection.