Open palderman opened 2 years ago
Hi @palderman, this feature has been on my to do list for a while now (#14) but I haven't had time to look at it further. I would be more than happy to review a PR to add improved support for validating data entries.
As discussed in #14 I think the col_options
argument would be a good place to add support for this. For example we could do something similar to the code below to restrict entries in vs
column of mtcars
to range [0,1]:
data_edit(
mtcars,
col_options = list(vs = c(0,1))
)
At the moment, the above code would trigger the vs
column to use a dropdown
menu with options 0
and 1
. This is because any vector passed to col_options
that doesn't contain either logical values or "date"
or "password"
character strings is treated as the options for data entry - so we resort to using dropdown columns to prevent typing errors. I do think there will be cases where users will want the above code to work as is, i.e. have numeric options in dropdown menus - so perhaps we should try to keep this functionality.
An alternative would be to add more flexibility to col_options
by allowing a list of arguments per column. For example, we could do something like this (vs
is the name of the column):
# numeric dropdown
data_edit(
mtcars,
col_options = list(
vs = list(
type = "dropdown",
options = c(0,1)
)
)
# numeric range
data_edit(
mtcars,
col_options = list(
vs = list(
range = c(0, 1),
precision = 0
)
)
In the above example each list could have options type
, options
, range
and precision
(and any other useful options).
type
controls the format for data entry (i.e. either password
, date
, checkbox
or dropdown
)options
indicates the available options for dropdown menusrange
allows specification of numeric ranges e.g. c(0,1)
. We may need to add another option like bounds
to indicate whether limits include upper and/or lower boundaries defined in range
.precision
indicates the number of decimal places for rounding (integers could have precision zero for example)Once we have a format for the new col_options
we need to make changes to the code in dataEdit.R
to make sure these options are appropriately handled. I think most of the changes will need to occur in this chunk of code which is triggered whenever the data changes (note that the column and row headers are handled separately in the chunk below).
https://github.com/DillonHammill/DataEditR/blob/d2840dc8a58aa82740a3b3208d4db0fed40aa030/R/dataEdit.R#L260-L286
Basically all you would need to do to compare the columns defined in col_options
in the new data (x_new
) and make sure all the listed criteria are met - otherwise we replace that column with the previous version defined in x_old
. This method is not very efficient as you will be checking all entries not just the ones that have changed (remember this code is triggered with every single change - i.e. each time only a single value will be changed).
I think there is a way to check exactly which cell has been edited by accessing slots in input$x
but you will need to look at rhandsontable
to see how this works.
Hope this helps, happy to answer any questions you may have along the way.
Dillon
Sorry for missing #14. I should have checked the existing issues more thoroughly.
I was thinking of something along the lines you outlined above for an integer with hard min/max values:
# numeric dropdown
data_edit(
mtcars,
col_options = list(
vs = list(
type = "integer",
min = 0,
max = 10
)
)
Errant values in that case would be rejected, but I was also thinking of being able to warn about out-of-range values instead of rejecting them out right. In that case, supposing you have an integer that is usually between 3 and 5, but on rare occasions could possibly have values as low as 0 or high as 10, we could specify it as:
# numeric dropdown
data_edit(
mtcars,
col_options = list(
vs = list(
type = "integer",
range = c(0, 10),
range_warn = c(3, 5)
)
)
Values outside range
would be rejected and values outside range_warn
but inside range
could be highlighted in red or yellow. Thoughts?
Looks like a good starting point. I have played around with colouring cells and it is complicated.
For DataEditR to work, we need to set useTypes = FALSE
in rhandsontable()
which means we cannot make calls to hot_col()
or hot_cell()
to format the cells - this is because useTypes = FALSE
is required for the column/row addition/removal functionality.
Perhaps you will find a way to get this to work.
Dillon, I have a question.
I'm running DataEditR:: inside shiny as I need to validate multiple databases, however col_options() has a stop(!all(colnames %in% df)) if the columns are not contained in the dataframe. Do you have any suggestions, other than rewriting the function locally, on how to deal with this issue?
@BarqueRodrigues can you poibt me to exactly where that line exists in the code? I can take a look. Given that it is an error I probably put it there for a good reason.
Dillon in line 47 on DataEditR::dataEditServer
if (!is.null(col_readonly)) { if (!all(col_readonly %in% colnames(data_to_edit))) { stop("'col_readonly' must contain valid column names.") } values$col_names <- unique(c(col_names, col_readonly)) }
So you are using col_readonly
? In that case it must match the columns in the data. What is your use case? You have some code I can look at?
Hi @DillonHammill ,
I recently stumbled across your package in my search for an R solution for data entry/data capture. It seems like your package would largely fit the bill.
However, I don't see any capability for restricting the range of numerical input values as part of a "Data Validation" workflow when entering data (see e.g. https://datacarpentry.org/spreadsheet-ecology-lesson/04-quality-control/index.html for what I mean by "Data Validation"). The
col_options
argument seems mostly to address data validation for known lists of discrete values, but I see no way to restrict column type to integer or min/max values for numeric input.Did I miss this somewhere in the documentation?
Where would you suggest I begin looking in your code to add this capability? I'm happy to collaborate/contribute code for this capability as it would greatly facilitate my own workflow.