Open eva0marques opened 3 months ago
My suggestions to improve this situation:
- Add a `time_column` parameter in `calc_` functions for spatio-temporal covariates (narr, geos, hms, gridmet, terraclimate). It would be a character designating the time column in `locs`.
- Check that `time_column` exists in `locs` (I would also rename `locs` to `sample` or `points` or something more general rather than explicitly spatial) and that the data format is correct (POSIXct with date and time, for example).
- Create a function to extract at a time stamp with the corresponding method: `find_time(time_pts, time_cov, method)`.
- Allow chaining `calc_*` functions, where the output from `calc_1` is used as `locs` in `calc_2`:

calc_1() |>
  calc_2() |>
  calc_3()
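A minimal sketch of what `find_time(time_pts, time_cov, method)` could look like, assuming both inputs are POSIXct vectors and that the function returns, for each point timestamp, the index of the covariate timestamp to extract from. The method names `"nearest"` and `"previous"` are assumptions for illustration, not part of amadeus.

```r
# Hypothetical sketch of find_time(); method names are assumptions.
find_time <- function(time_pts, time_cov, method = c("nearest", "previous")) {
  method <- match.arg(method)
  # Both inputs must carry date and time information
  stopifnot(inherits(time_pts, "POSIXct"), inherits(time_cov, "POSIXct"))
  cov_num <- as.numeric(time_cov)
  pts_num <- as.numeric(time_pts)
  # findInterval() requires sorted covariate times
  stopifnot(!is.unsorted(cov_num))
  if (method == "previous") {
    # Index of the last covariate time that is <= each point time
    idx <- findInterval(pts_num, cov_num)
    idx[idx == 0L] <- NA_integer_ # point is earlier than every covariate time
  } else {
    # Index of the covariate time with the smallest absolute difference
    idx <- vapply(
      pts_num,
      function(t) which.min(abs(cov_num - t)),
      integer(1)
    )
  }
  idx
}

# Covariate available every 3 days, points on other days
time_cov <- as.POSIXct(c("2024-01-01", "2024-01-04", "2024-01-07"), tz = "UTC")
time_pts <- as.POSIXct(c("2024-01-02", "2024-01-03", "2024-01-06"), tz = "UTC")
nearest_idx <- find_time(time_pts, time_cov, "nearest")
previous_idx <- find_time(time_pts, time_cov, "previous")
```

Returning indices rather than values keeps the matching step separate from the extraction step, so the same helper could serve any covariate.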
In `calc_` pipes it would be easier to distinguish spatiotemporal points from spatial points 🤔 (possibly with an inflate function to go from the spatial pipe to the spatiotemporal one):
If the goal is to create a datatable to feed AI models:
my_spatial_sample |>
calc_nlcd() |>
calc_gmted() |>
... |>
inflate_to_spatiotemporal(timestamps) |>
calc_era5() |>
calc_modis()
...
If the goal is to store the calculated points efficiently:
my_spatial_sample |>
calc_nlcd() |>
calc_gmted() |>
... |>
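saveRDS(file = "spatial_covariates.rds")  # base R provides saveRDS(), not writeRDS(); file name is a placeholder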
my_spatiotemporal_sample |>
calc_era5() |>
calc_modis() |>
... |>
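saveRDS(file = "spatiotemporal_covariates.rds")  # base R provides saveRDS(), not writeRDS(); file name is a placeholder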
I think an option is updating the static calc functions to have an `inflate` parameter. If `inflate = TRUE`, it automatically returns a spatio-temporal data frame (the "feed AI models" example), whereas if `inflate = FALSE`, it returns a list with a vector of dates and a single spatial data frame (the efficiency example).
Either way, refactoring the `calc_` functions to retain columns from `locs` for use in a pipe should not be too difficult.
Something like this:
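# `dates` is the vector of dates, `spatial_df` the spatial data frame
if (inflate) {
  message("Returning a data.frame with ... because inflate = TRUE")
  # With no shared column names, merge() returns every date x location pair
  inflated <- merge(dates, spatial_df, all = TRUE)
  return(inflated)
} else {
  message("Returning a list with ... because inflate = FALSE")
  return(list(dates, spatial_df))
}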
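A minimal sketch of a standalone inflate step, assuming a plain data frame of locations and a vector of timestamps. The function name follows the `inflate_to_spatiotemporal(timestamps)` call used in the pipe example earlier in the thread; its signature and the `time_column` argument are assumptions, not an existing amadeus API.

```r
# Hypothetical helper: signature and argument names are assumptions.
inflate_to_spatiotemporal <- function(locs_df, timestamps, time_column = "time") {
  # Refuse to overwrite an existing time column
  stopifnot(is.data.frame(locs_df), !time_column %in% names(locs_df))
  # Repeat every location row once per timestamp
  row_idx <- rep(seq_len(nrow(locs_df)), times = length(timestamps))
  out <- locs_df[row_idx, , drop = FALSE]
  # Each timestamp is attached to one full copy of the locations
  out[[time_column]] <- rep(timestamps, each = nrow(locs_df))
  rownames(out) <- NULL
  out
}

# Two locations with an already calculated spatial covariate
locs_df <- data.frame(
  site_id = c("a", "b"),
  lon = c(-80, -79),
  lat = c(35, 36),
  nlcd = c(41, 21)
)
timestamps <- as.Date(c("2024-01-01", "2024-01-02", "2024-01-03"))
inflated <- inflate_to_spatiotemporal(locs_df, timestamps)
```

Because the locations come first, the helper can sit in a pipe directly after the spatial `calc_` steps, and it can also be applied later to a stored, non-inflated sample.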
Yes, it is also an interesting solution. I would still make the `inflate()` function available to amadeus users because they might want to use it separately. For example, you could store the non-inflated sample, reopen it, and use the inflate function without recalculating everything.
@eva0marques
Sorry I am late to the discussion. As @mitchellmanware suggested, I think a hands-on solution is to add several lines to `calc_return_locs` along with an `inflate` argument. One thing to consider is how "full" space-time combinations are inferred or furnished, which can be implemented by using a fixed set of field names (i.e., `lon`, `lat`, and `time`) or by adding an additional argument for a full space-time combination template (by using `expand.grid`, for example). I think the former is more of a hands-on solution since we can easily use set operations to detect the common field names for determining what to join and what to expand. I have already added some functions to do this in beethoven, so I'd be happy to make changes in the functions we agree to update to implement this functionality.
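A rough sketch of both ideas, assuming plain data frames and the fixed field names `lon`, `lat`, and `time` mentioned above. The variable names and example values are made up for illustration and do not reflect the functions in beethoven.

```r
# Fixed set of key field names (assumed convention)
key_fields <- c("lon", "lat", "time")

spatial <- data.frame(lon = c(-80, -79), lat = c(35, 36), elev = c(120, 340))
spatiotemporal <- data.frame(
  lon = c(-80, -80, -79),
  lat = c(35, 35, 36),
  time = as.Date(c("2024-01-01", "2024-01-02", "2024-01-01")),
  temp = c(5.1, 6.3, 4.8)
)

# Set operations: key fields present in both tables decide what to join on
shared_keys <- intersect(
  intersect(names(spatial), names(spatiotemporal)),
  key_fields
)
# Key fields missing from the spatial table decide what to expand over
expand_keys <- setdiff(
  intersect(names(spatiotemporal), key_fields),
  names(spatial)
)

# Full space-time template: every location crossed with every date
template <- expand.grid(
  site = seq_len(nrow(spatial)),
  time = unique(spatiotemporal$time)
)
full <- cbind(spatial[template$site, ], time = template$time)

# Left join keeps all combinations; missing observations become NA
result <- merge(full, spatiotemporal, by = key_fields, all.x = TRUE)
```

Here `shared_keys` is `lon` and `lat`, `expand_keys` is `time`, and `result` holds one row per location and date, with `NA` in `temp` where no observation exists.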
As a side note, if we are aiming to make `calc_*` functions pipeable, the default value of `inflate` or the equivalent argument should be `TRUE`.
I've implemented my idea (my comment above) in my own project because it was the most optimized and flexible setup. It works pretty well, and I'll be able to share my feedback if you are interested.
I am writing process and calc functions for other covariates that I need in my own project. I would like to open a discussion on the spatio x temporal case.
Let's say I want to create an AI model to predict temperature at several locs x timestamps. I need to extract spatial covariates (easy) but also spatio x temporal ones.
In my ideal world, to do so:
- I pass my points as the `locs` param of `calc_` functions to add columns for each covariate (they can be spatial or spatiotemporal).
- The `calc_` functions for spatio-temporal covariates handle the "time" dimension properly, depending on the user's criteria (for example, if geophysical model outputs are available every 3 days and my predictions are every day, `calc_` downscales the temporal resolution; it can also do the opposite if I have hourly data).

It would look like this:
For now, `calc_` functions are not optimally designed for the temporal dimension. It is implied that `locs` is a spatial dataframe without a time column. When calculating spatio-temporal covariates, it extracts the whole time series of `from`. But if `locs` already has a time column (for example, one created after calculating another spatio-temporal covariate), it becomes a mess.

As a summary, I see the following limitations with our current version of `calc_`:
- `calc_` functions cannot easily be used in a row (I mean giving the output of one calc function as the input of another) after dealing with spatio-temporal covariates.

It is not urgent of course, but I think it would be interesting to address this discussion in the future for a better use of amadeus.