Open peterdesmet opened 2 years ago
After our short chat, I completely agree on the benefit of having such a function in this package to cover basic and quite typical steps of handling data packages. Some thoughts:
`imap` is used within `edit_fields`. In this way users can be inspired and write their own custom functions for cases that are too specific to include in the package. Sooner or later something like that will happen.
`recode` in my daily life 😄 I think we should go for the loop option. Below I show a simple way to solve the drawback by using a named vector:
# get field names (map_chr() comes from purrr)
field_names <- purrr::map_chr(iris_schema$fields, ~ .$name)
field_names
#> [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
# define values as a named vector
values <- c("Sepal length in cm.", "Sepal width in cm.", "Petal length in cm.", "Petal width in cm.", NA_character_)
names(values) <- field_names
values
iris_schema <- edit_fields(
  iris_schema,
  "description",
  values
)
So, if the user provides an unnamed vector, then the order of the fields is used: maybe a message can be returned providing the order the function will use. Otherwise, the values are set based on the field names defined in the names.
@peterdesmet: in this way I think the loop function will match all our expectations. What do you think?
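The named-vector behaviour described above could be sketched as follows. This is a minimal base R sketch under stated assumptions, not the package's implementation: `edit_fields()` is the hypothetical function discussed in this thread, and the schema is built by hand as a plain list so the example runs without any package. Skipping `NA` values is also an assumption, chosen to match the `NA_character_` entry in the example above.

```r
# Hypothetical helper discussed in this thread; not a released API.
edit_fields <- function(schema, property, values) {
  field_names <- vapply(schema$fields, function(f) f$name, character(1))

  if (is.null(names(values))) {
    # Unnamed vector: fall back to field order and tell the user which
    # fields will receive a value.
    values <- values[seq_len(min(length(values), length(field_names)))]
    names(values) <- field_names[seq_along(values)]
    message(
      "Values assigned by field order: ",
      paste(names(values), collapse = ", ")
    )
  }

  unknown <- setdiff(names(values), field_names)
  if (length(unknown) > 0) {
    stop("Unknown field name(s): ", paste(unknown, collapse = ", "))
  }

  schema$fields <- lapply(schema$fields, function(field) {
    value <- values[field$name]
    # Fields without a value (or with NA) are left untouched.
    if (!is.na(value)) field[[property]] <- unname(value)
    field
  })
  schema
}

# Hand-built schema standing in for the output of create_schema().
schema <- list(fields = list(
  list(name = "weight", type = "number"),
  list(name = "group", type = "string")
))

# Named vector: the order of the values does not matter.
named <- edit_fields(
  schema,
  "description",
  c(group = "Group the plant is in", weight = "Weight of the plant")
)

# Unnamed vector of length 1: only the first field gets the property.
unnamed <- edit_fields(schema, "unit", c("g"))
```

With this design the user never has to count fields when names are supplied, and the message makes the implicit ordering visible when they are not.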
Also suggested by @beatrizmilz in https://github.com/ropensci/software-review/issues/495#issuecomment-1025860861:
Adding the descriptions to the schema does not seem trivial. There is an example with the purrr package, but it might not be simple to understand for someone who is not used to purrr.
I'm talking about this piece of code:
iris_schema <- create_schema(iris)
# Remove description for first field
iris_schema$fields[[1]]$description <- NULL
# Set descriptions for all fields
descriptions <- c(
  "Sepal length in cm.",
  "Sepal width in cm.",
  "Petal length in cm.",
  "Petal width in cm.",
  "Iris species."
)
iris_schema$fields <- purrr::imap(
  iris_schema$fields,
  ~ c(.x, description = descriptions[.y])
)
Do the authors think it is possible to create a function that adds descriptions to the schema and is used similarly to the other functions of the package? Example of the idea:
iris_schema <- create_schema(iris) |>
  add_description(
    c(
      "Sepal length in cm.",
      "Sepal width in cm.",
      "Petal length in cm.",
      "Petal width in cm.",
      "Iris species."
    )
  )
Finally got some time to think about this.
schema <-
PlantGrowth %>%
create_schema()
str(schema)
#> List of 1
#> $ fields:List of 2
#> ..$ :List of 2
#> .. ..$ name: chr "weight"
#> .. ..$ type: chr "number"
#> ..$ :List of 3
#> .. ..$ name : chr "group"
#> .. ..$ type : chr "string"
#> .. ..$ constraints:List of 1
#> .. .. ..$ enum: chr [1:3] "ctrl" "trt1" "trt2"
`update_schema()` (cf. what @damianooldoni suggested above). Properties are added based on field order. Here we only provide a vector of length 1, so only the first field gets a property.
schema <-
schema %>%
update_schema(
property = "unit",
values = c("g")
)
str(schema)
#> List of 1
#> $ fields:List of 2
#> ..$ :List of 2
#> .. ..$ name: chr "weight"
#> .. ..$ type: chr "number"
#> .. ..$ unit: chr "g" <--------
#> ..$ :List of 3
#> .. ..$ name : chr "group"
#> .. ..$ type : chr "string"
#> .. ..$ constraints:List of 1
#> .. .. ..$ enum: chr [1:3] "ctrl" "trt1" "trt2"
`update_schema()`. The convenience function `field_names()` is used to name the vector (#196):
descriptions <- c("Weight of the plant", "Group the plant is in")
names(descriptions) <- field_names(schema)
descriptions
#> weight group
#> "Weight of the plant" "Group the plant is in"
schema <-
schema %>%
update_schema(
property = "description",
values = descriptions
)
str(schema)
#> List of 1
#> $ fields:List of 2
#> ..$ :List of 2
#> .. ..$ name: chr "weight"
#> .. ..$ type: chr "number"
#> .. ..$ unit: chr "g"
#> .. ..$ description: chr "Weight of the plant" <--------
#> ..$ :List of 3
#> .. ..$ name : chr "group"
#> .. ..$ type : chr "string"
#> .. ..$ description: chr "Group the plant is in" <--------
#> .. ..$ constraints:List of 1
#> .. .. ..$ enum: chr [1:3] "ctrl" "trt1" "trt2"
schema <-
  schema %>%
  update_schema(
    property = "name",
    values = c("foo")
  )
#> Error: "name" is a reserved field property.
package <-
create_package() %>%
add_resource("plant-growth", PlantGrowth, schema = schema)
package$resources[[1]]$schema <- schema
I'm tempted to go for `update_schema()` rather than `edit_fields()`. `update_fields()` would be a valuable alternative, but `update_schema()` makes it clearer that the function returns a schema (not fields).
get_schema(package, resource_name) => schema
create_schema(df) => schema
update_schema(schema) => schema <-----
field_names(schema) => vector
@damianooldoni @PietrH @nepito what do you think?
A created schema will only have the field properties `name`, `type` and (sometimes) `constraints`. I see it as fairly common to add more properties, such as `description`, `required`, etc. It is possible to do that with `purrr`, but it isn't very straightforward. Maybe a specific function would be useful.
Create schema:
Atomic function
Not sure this is super useful, but it is very clear what field you are setting.
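To make the atomic option concrete, here is a minimal base R sketch. The function name `set_field_property()` and its arguments are hypothetical, chosen only for illustration, and the schema is a hand-built list rather than the output of `create_schema()`.

```r
# Hypothetical atomic setter: one field, one property, one value per call.
set_field_property <- function(schema, field_name, property, value) {
  field_names <- vapply(schema$fields, function(f) f$name, character(1))
  i <- match(field_name, field_names)
  if (is.na(i)) {
    stop("No field named \"", field_name, "\" in schema.")
  }
  schema$fields[[i]][[property]] <- value
  schema
}

# Hand-built schema standing in for the output of create_schema().
schema <- list(fields = list(
  list(name = "weight", type = "number"),
  list(name = "group", type = "string")
))

schema <- set_field_property(schema, "weight", "description", "Weight of the plant")
schema <- set_field_property(schema, "group", "required", TRUE)
```

Each call names the field explicitly, which is the clarity mentioned above, at the cost of one call per field and property.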
Loop function
Faster, but there is a disconnect between the field name and the value you want to set.
Recode-like function
Note, it should also work for nested properties:
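One way a recode-like function with nested properties might look is sketched below in base R. Everything here is an assumption for illustration: the names `recode_fields()` and `set_nested()`, the `field = value` argument style, and the dot-separated path (`"constraints.required"`) used to address a nested property.

```r
# Hypothetical helper: set a value at a nested path inside a list,
# creating intermediate lists when they do not exist yet.
set_nested <- function(x, path, value) {
  if (length(path) == 1) {
    x[[path]] <- value
    return(x)
  }
  child <- x[[path[1]]]
  if (is.null(child)) child <- list()
  x[[path[1]]] <- set_nested(child, path[-1], value)
  x
}

# Hypothetical recode-like function: values are passed as field = value.
recode_fields <- function(schema, property, ...) {
  values <- list(...)
  # "constraints.required" becomes c("constraints", "required").
  path <- strsplit(property, ".", fixed = TRUE)[[1]]
  schema$fields <- lapply(schema$fields, function(field) {
    if (field$name %in% names(values)) {
      field <- set_nested(field, path, values[[field$name]])
    }
    field
  })
  schema
}

# Hand-built schema standing in for the output of create_schema().
schema <- list(fields = list(
  list(name = "weight", type = "number"),
  list(
    name = "group",
    type = "string",
    constraints = list(enum = c("ctrl", "trt1", "trt2"))
  )
))

schema <- recode_fields(schema, "constraints.required", weight = TRUE, group = TRUE)
schema <- recode_fields(schema, "description", weight = "Weight of the plant")
```

Existing nested values, such as the `enum` constraint of `group`, are kept because only the addressed leaf is replaced. A drawback of the `...` style is that a field named `schema` or `property` would clash with the function's own arguments.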