Jack-H-Buckner / UniversalDiffEq.jl

Universal differential equations for ecologists
MIT License

data frame names #17

Closed Jack-H-Buckner closed 3 weeks ago

Jack-H-Buckner commented 2 months ago

Many functions use data frames to store and manipulate the data sets that users provide to build models. Ideally, the user can input data frames with a range of plausible column names. For example, the time column could be "t", "T", "time", or "Time". Under the hood, however, I want to be able to write functions that index the time column using a single known identifier, t. I would like code that takes the user-provided data frame, identifies the relevant columns, and creates a new data frame in a standard format. Specifically, time is in a column labeled "t"; if multiple time series are provided, they are identified by a column labeled "series", and each time series is labeled by an integer starting at one.
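A minimal sketch of that standardization step, assuming DataFrames.jl. The helper name `standardize_columns`, the `series_col` keyword, and the `TIME_ALIASES` list are hypothetical names used only for illustration; they are not part of UniversalDiffEq.jl.

```julia
using DataFrames

# Hypothetical alias list; these are the four names mentioned in this issue.
const TIME_ALIASES = ["t", "T", "time", "Time"]

# Hypothetical helper: returns a new data frame whose time column is named "t".
# If `series_col` is given, that column is renamed to "series" and its values
# are replaced by integer labels 1, 2, ... in order of first appearance.
function standardize_columns(df::DataFrame; series_col::Union{Nothing,String} = nothing)
    matches = [nm for nm in names(df) if nm in TIME_ALIASES]
    length(matches) == 1 ||
        error("Expected exactly one time column among $(TIME_ALIASES), found $(length(matches)).")

    # Copy when the name is already standard; otherwise rename (returns a new data frame).
    out = matches[1] == "t" ? copy(df) : rename(df, matches[1] => "t")

    if series_col !== nothing
        labels = unique(out[!, series_col])
        lookup = Dict(label => i for (i, label) in enumerate(labels))
        if series_col != "series"
            out = rename(out, series_col => "series")
        end
        out.series = [lookup[label] for label in out.series]
    end
    return out
end

raw = DataFrame(Time = [1, 2, 1, 2], site = ["a", "a", "b", "b"], y = [0.1, 0.2, 0.3, 0.4])
std = standardize_columns(raw; series_col = "site")
```

In this sketch the series column still has to be named by the caller, since the issue does not list aliases for it. The helper throws an error when no alias, or more than one, is present, so an ambiguous input is not resolved silently.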

jarroyoe commented 2 months ago

We could do what rEDM and heatwaveR do and have the user specify which column is the time column.

Jack-H-Buckner commented 2 months ago

That is a good idea - do they ask for the column name or the index?

jarroyoe commented 2 months ago

@zechmeunier mentioned this happens in heatwaveR so I don't know how it is done there. In rEDM it asks you for a string to specify which columns you want to use in the model. Asking the user for a string to specify the time column should work in this case.

zechmeunier commented 2 months ago

There are several functions in heatwaveR that require a time column. If there's a column t, then it's assumed to be time. But otherwise the user needs to set x = whatever time is called.

For example, detect_event(reef_clim[[reef]], x = SampleDate, y = Temperature) detects marine heatwaves from a reef climatology list whose time and temperature columns are named SampleDate and Temperature.

The particular R code for this check:

    ts_x <- eval(substitute(x), data)
    if (is.null(ts_x) | is.function(ts_x))
      stop("Please ensure that a column named 't' is present in your data.frame or that you have assigned a column to the 'x' argument.")
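For comparison, a rough Julia analogue of that behavior, assuming DataFrames.jl: the time column defaults to "t", and the user can name a different column instead. The function `with_time_column` and its `time_column` keyword are hypothetical and are not part of heatwaveR or UniversalDiffEq.jl.

```julia
using DataFrames

# Hypothetical helper: checks that the requested time column exists and
# returns a new data frame in which that column is named "t".
function with_time_column(df::DataFrame; time_column::String = "t")
    time_column in names(df) ||
        error("Please ensure that a column named \"t\" is present in your data frame or pass its name via `time_column`.")
    # Copy when the name is already "t"; otherwise rename (returns a new data frame).
    return time_column == "t" ? copy(df) : rename(df, time_column => "t")
end

temps = DataFrame(SampleDate = [1, 2, 3], Temperature = [20.1, 20.4, 21.0])
standardized = with_time_column(temps; time_column = "SampleDate")
```

Calling `with_time_column(temps)` without the keyword raises the error, because `temps` has no column named "t". If the data frame already has a "t" column and the user names a different one, `rename` fails on the duplicate name, which would need separate handling.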

jarroyoe commented 3 weeks ago

In the Julia package this is addressed by the function find_time_alias. In the R package, the data frame can be preprocessed either by looking for one of these aliases or by letting the user specify the time column and renaming that column to one of the aliases.

Closing this here as this is a potential issue in the R package but not the Julia package.