ArtesiaWater / hydropandas

Module for loading observation data into custom DataFrames
https://hydropandas.readthedocs.io
MIT License
51 stars 10 forks source link

Interpolate ObsCollection #109

Closed martinvonk closed 1 year ago

martinvonk commented 1 year ago

This PR adds the interpolation method to ObsCollections.

Few things to do/discuss still:

OnnoEbbens commented 1 year ago

Well done! Some thoughts I have:

I removed the interpolate method from the EvaporationObs but I don't know how I feel about this. I don't mind interpolating separately from an ObsCollection but it was/is a nice feature. I think this is the way to go. You can still use the interpolation but now you will have more control on which observations will be used in the interpolation. Maybe we can add an option to read_knmi to get all stations within an extent. This will make it easier to get an ObsCollection which can be used for interpolation.

The interpolation is now a method of the ObsCollection class because that felt most natural to do. I agree

I am not sure how much metadata from the original ObsCollection to parse to the interpolated ObsCollection. The interpolated time series is now always of type Obs but could maybe be the same obs type as all other obs in the ObsCollection. This is a difficult question, especially because the output of the interpolation is not really an observation. I see 3 options:

  1. use the Obs type because the x, y, source and maybe a meta dictionary are the only relevant metadata for the interpolated time series.
  2. use the same obs type as all other obs in the ObsCollection. If you have interpolated multiple EvaporationObs objects it would make sense if the output is also an EvaporationObs object so you know the time series contains evaporation data. You could even check if all observations in the ObsCollection have the same attribute values and add these attributes to the new EvaporationObs (although that may be dangerous sometimes).
  3. create a new observation type, something like InterpolatedObs. This will make it more clear the data is an interpolation of other observations rather than a time series with measured values

I think I would go for the 2nd option but I am also curious what you think is the best idea? If you go for the 2nd option you can use the existing _infer_otype method of an ObsCollection to get the obs type.

martinvonk commented 1 year ago

I removed the interpolate method from the EvaporationObs but I don't know how I feel about this. I don't mind interpolating separately from an ObsCollection but it was/is a nice feature. I think this is the way to go. You can still use the interpolation but now you will have more control on which observations will be used in the interpolation. Maybe we can add an option to read_knmi to get all stations within an extent. This will make it easier to get an ObsCollection which can be used for interpolation.

Yes agree for an extent selector. However, you kind of want all KNMI stations for Evaporation interpolation so the function io.knmi.get_stations("EV24") is fine for now.

The interpolation is now a method of the ObsCollection class because that felt most natural to do. I agree

I am not sure how much metadata from the original ObsCollection to parse to the interpolated ObsCollection. The interpolated time series is now always of type Obs but could maybe be the same obs type as all other obs in the ObsCollection. This is a difficult question, especially because the output of the interpolation is not really an observation. I see 3 options:

  1. use the Obs type because the x, y, source and maybe a meta dictionary are the only relevant metadata for the interpolated time series.
  2. use the same obs type as all other obs in the ObsCollection. If you have interpolated multiple EvaporationObs objects it would make sense if the output is also an EvaporationObs object so you know the time series contains evaporation data. You could even check if all observations in the ObsCollection have the same attribute values and add these attributes to the new EvaporationObs (although that may be dangerous sometimes).
  3. create a new observation type, something like InterpolatedObs. This will make it more clear the data is an interpolation of other observations rather than a time series with measured values

I think I would go for the 2nd option but I am also curious what you think is the best idea? If you go for the 2nd option you can use the existing _infer_otype method of an ObsCollection to get the obs type.

Currently I use option 1 as you describe. Technically, you can have different types of Obs and interpolate them. For instance, if you have an GroundwaterObs and EvaporationObs in your ObsCollection; you can just interpolate them. We could go for option 2 and disallow interpolation of different kinds of Obs and choose the original Obs type. However, I also kind of like the idea of an InterpolatedObs where there is an attribute with the (different) Obs types it was interpolated from. And maybe a method that transforms this InterpolatedObs to that Obs type?

OnnoEbbens commented 1 year ago

I think we should raise an error when you want to interpolate between different Obs types. Maybe an error that you can suppress.

I don't know if InterpolatedObs is such a good idea because all the other Obs types specify the type of observation (Evaporation, Groundwater, Waterlvl, etc.). The InterpolatedObs would only specify how the 'observation' was created.