Deltares / hydromt

HydroMT: Automated and reproducible model building and analysis
https://deltares.github.io/hydromt/
MIT License
65 stars 28 forks source link

ENH: improve adapter for GeoDataset #372

Closed hboisgon closed 7 months ago

hboisgon commented 1 year ago

Feature Type

Enhancement Description

Currently the GeoDataset adapter is quite limited if the files are not in direct netcdf format. It would be nice to support a little more options:

https://github.com/Deltares/hydromt/blob/0def737fbdf4fa2fb7891c790ad0fbcaebb84733/hydromt/io.py#L309-L313

Feature Description

Parts can be implemented in io.open_geodataset and other parts in geodataset.get_data.

Additional Context

No response

Tjalling-dejong commented 12 months ago

I discussed this issue with @DirkEilander. We came to the conclusion that we have 2 options to solve the first two points. The options are described below in a mock data catalog:

# option1 locations + linked_data_source
# requires new driver for dataframeadapter to support timeseries_from_table
# requires new DatasetAdapter to support nc data    
waterlevels_geo_loc:
  path: /path/to/stations.csv # locations
  linked_data_source:  timeseries_data / timeseries_data_nc
  data_type: GeoDataset
  driver: vector # hydromt.io.open_vector (consitent with GeoDataFrameAdapter)
  crs: 4326

timeseries_data:
  path: /path/to/stations_data.csv
  data_type: DataFrame
  driver: timeseries # TODO add io.open_timeseries_from_table to DataFrameAdapter

timeseries_data_nc:
  path: /path/to/timeseries.nc
  data_type: Dataset # TODO does not yet exist
  driver: netcdf 
# option2: open_geodataset driver (current approach, but rename driver)
# TODO: limit in scope for specific loc+timeseries_from_table data
# or should we remove this method?
# TODO rename open_geodataset to open_geodataset_from_mf
waterlevels_geo:
  path: /path/to/stations.csv # locations
  data_type: GeoDataset
  driver: geodataset_from_combined_files # hydromt.io.open_geodataset (instead of driver=vector)
  crs: 4326
  driver_kwargs:
    fn_data: /path/to/stations_data.csv

I am in favor of option 1 since the timeseries data would be specified in a separate data source and could have it's own driver_kwargs and the timeseries data could be accessed separately as well. This would require adding a timeseries driver to the DataFrame adapter and adding a linked_data_source argument to the location data source . Also a new dataset adapter needs to be created to accommodate netcdf timeseries files that do not have geodata.

Option 2 is a expansion of the current approach. We would need to change the driver to geodataset_from_combined_files for clarity.

Tjalling-dejong commented 12 months ago

@DirkEilander I am bit lost on how option 2 was a solution again? This option does not allow for using a datacatalog entry for instance.

Tjalling-dejong commented 12 months ago

I moved the third point on multiple fn_data files to a separate issue: #453

DirkEilander commented 12 months ago

@DirkEilander I am bit lost on how option 2 was a solution again? This option does not allow for using a datacatalog entry for instance.

Correct, it is not a solution to the issues, but a way to keep the io.open_geodataset method as a driver next to option 1. However, we need to discuss if we want to keep the open_geodataset method. We also discussed that adding additional drivers to read nc inside this method is not ideal as it may become very unclear if a user wants to pass arguments to the different drivers. We also agreed passing DataAdapters to an io function is not a good idea. IO methods (a.k.a. drivers) are called from the Adapters and not the other way around.

DirkEilander commented 11 months ago

From discussion with @hboisgon

Tjalling-dejong commented 9 months ago

I am currently creating the new DatasetAdapter. I think that there are quite some methods in the GeoDatasetAdapter that will be (almost) identical to the methods needed in DatasetAdapter and it would thus violate the DRY principle. Is this a case for inheritance? And if so in what way? I can imagine that this will also increase complexity. What are your thoughts on this @savente93 @DirkEilander ?

savente93 commented 9 months ago

Sorry I'm a bit confused, what are we going to use this new dataset adapter for?

Tjalling-dejong commented 9 months ago

I am referring to the second point of Dirks last comment

savente93 commented 9 months ago

Right, sorry. Personally I think that inheritance makes things more complex rather than less. I think that as with the other adapters, the functionality is just different enough to make inheritance not a great idea. I'm personally more in favour of these rules of thum:

  1. if it really is generic, put it in the base adapter class
  2. if you can create somewhat generic functions do that and then implement the adapters by calling the generic functions (wherever they make the most sense)
  3. if it isn't any of the above ones, it's probably different enough to warent it's own implementation.

Does that make sense? That's my take at least, not sure how the others feel.

Tjalling-dejong commented 9 months ago

Thanks for the advice! I will keep the adapters separate and check where I can refactor some of the methods to more generic functions if that is necessary.