Open xldeltares opened 1 year ago
@xldeltares thanks for posting the issue! I edited to add more context to it.
This is a good case to extend the NoDataStrategy
to this usecase. No data is available for the full slice, so the user should be able to 'ignore', which should raise a warning (which you can suppress), or to raise an error. Perhaps there is also a way to 'force' a certain slicing? That you can a lot of NaNs or something for the time slices that are not present?
Any thoughts on this @xldeltares @hboisgon ?
Hi @Jaapel I guess that could work but I don't know enough about it so here are some questions I have:
NoDataStrategy
global for the whole model building or can you set it for each data that you are trying to read?* Is the `NoDataStrategy` global for the whole model building or can you set it for each data that you are trying to read?
I would set it per dataset!
- Not sure if in the case where you want data for 2010 but no slicing is done and you get data instead for 2000-2020 (case of NoLeap calendar) then would this still work?
This one I do not understand. No slicing is done, is that what the users wants, no slicing?
* Another concern: in that case, I am worried about missing times but no necessarily missing values on lat/lon (example some meteo data only have values for land and not coast which for hydrological model is alright). Can you specify the axis/dims where you want to maybe raise an error (for time only and not lat/lon)?
I think we indeed have to be specific about when there is "no data". I would implement that per kind of data adapter. We can also be more specific and try more strategies, like raise_if_any
, where you raise an error if any NaNs appear, or raise_if_all
, where you raise an error if all values are NaN.
* Is the `NoDataStrategy` global for the whole model building or can you set it for each data that you are trying to read?
I would set it per dataset!
Great!
* Not sure if in the case where you want data for 2010 but no slicing is done and you get data instead for 2000-2020 (case of NoLeap calendar) then would this still work?
This one I do not understand. No slicing is done, is that what the users wants, no slicing?
No the user do want the slicing but for some reason hydromt does not do it (eg datetime format of the data is not parsed correctly and hydromt skips time slicing without any warning or error). So in that case, you get more data than what you expect/ask for.
* Another concern: in that case, I am worried about missing times but no necessarily missing values on lat/lon (example some meteo data only have values for land and not coast which for hydrological model is alright). Can you specify the axis/dims where you want to maybe raise an error (for time only and not lat/lon)?
I think we indeed have to be specific about when there is "no data". I would implement that per kind of data adapter. We can also be more specific and try more strategies, like
raise_if_any
, where you raise an error if any NaNs appear, orraise_if_all
, where you raise an error if all values are NaN.
Sounds useful!
The more I read about it, the more I wonder if this is the right way to go for this issue though. The main goal here is to do a check if the time slicing is correct. By default, if you have less timestep than asked, default behavior is that the starttime and endtime of the data are changed so you wouldn't get extra timesteps with NoData values. Unless you maybe change the slicing method (e.g. changing the method of xarray.sel to fill missing timesteps with nodata).
And in the case of hydromt not doing the slicing and getting more times than necessary, then it's really not a problem of NoData. So I guess in any case we may need to implement an extra check to check if starttime and endtime of the data that you read is indeed equal to the requested start and end time_tuple that the user is giving.
But maybe like the NoDataStratetegy
something like SlicingStrategy
??
I also encountered this issue where I have to work with data that has cftime objects in the time dimension as the calendar format is not fully compliant with the proleptic_gregorian
calendar. This indeed results in not being able to slice data.
Kind of request
Adding new functionality
Enhancement Description
If a time_tuple is passed to any of the get_data, there should be an extra check in core to make sure the slicing happened. Otherwise a warning or error should be reported.
Use case
When reading data and providing a time_tuple it can get quite confusing and lead to errors if the output times were not sliced properly. So far for example, the current behavior in get_rasterdataset is for np.datetime64 type, if all requested times are not present, you get a shorter timeseries than requested. If time type is cftime.datetime no slicing happens at all so you can both shorter and longer timeseries than requested.
This has lead to quite some extra lines in for eg hydromt-wflow to get around this and still get running wflow models out of hydromt.
I think it would be nice to send either an error or if the plugins wish to do extra transformation still then a warning instead (maybe via some kind of allow_time_error True/False flag)
Additional Context
Also linked to issue #492