cynthiahqy / conformr-xmap-project

R Package for harmonising data of different classifications or aggregations into a single dataset
MIT License

Implement "Immutable" xmap list class to avoid re-validation at transformation step? #98

Closed cynthiahqy closed 1 year ago

cynthiahqy commented 1 year ago

Currently the validation for the xmap_df format is somewhat fragile, because a user could create an xmap_df and then modify it using standard data.frame or dplyr operations without losing the class:

> xmap |> dplyr::mutate(x = 5)
  x y          z
1 5 1 0.93499353
2 5 2 0.30061182
3 5 3 0.29352285
4 5 4 0.17409116
5 5 5 0.06566421
> xmap |> dplyr::mutate(x = 5) |> class()
[1] "xmap_df"    "xmap"       "data.frame"

It would be quite hard to detect these modifications in downstream functions like apply_xmap() (#95), meaning you'd probably have to redo the weight validation later on.
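
To make the problem concrete, here is a minimal base R sketch. The `weights_ok()` helper and its rule (weights sum to 1 within each `x` value) are assumptions for illustration, not the package's actual validation:

```r
# Toy crossmap; the class vector mirrors the printed example above.
xmap <- data.frame(x = c("a", "a", "b"), y = c(1, 2, 3), z = c(0.4, 0.6, 1))
class(xmap) <- c("xmap_df", "xmap", "data.frame")

# Hypothetical validity rule (an assumption): weights sum to 1 within each `x`.
weights_ok <- function(df, from = "x", weights = "z") {
  totals <- tapply(df[[weights]], df[[from]], sum)
  all(abs(totals - 1) < 1e-8)
}

weights_ok(xmap)               # valid as constructed

# An ordinary data.frame assignment changes the weights but keeps the class.
modified <- xmap
modified$z <- modified$z * 2

inherits(modified, "xmap_df")  # a class check alone still passes
weights_ok(modified)           # only re-running the weight check catches it
```

The class vector says nothing about whether the contents are still valid, which is why a class check on its own is not enough for an xmap_df.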

If, however, you had a list class like xmap_list, it would be more difficult to mess up the weights etc., and you could in theory just do a class check. Still, there's probably no getting around re-validating the df at the transformation step.

> class(xmap) <- c("xmap_list", "xmap")
> xmap
$x
[1] "a" "b" "c" "d" "e"

$y
[1] 1 2 3 4 5

$z
[1] 0.93499353 0.30061182 0.29352285 0.17409116 0.06566421

attr(,"class")
[1] "xmap_list" "xmap"
attr(,"row.names")
[1] 1 2 3 4 5
attr(,"col_from")
[1] "x"
attr(,"col_to")
[1] "y"
attr(,"col_weights")
[1] "z"
attr(,"from_set")
[1] "a" "b" "c" "d" "e"
> xmap |> dplyr::mutate(x = 5)
Error in UseMethod("mutate") : 
  no applicable method for 'mutate' applied to an object of class "c('xmap_list', 'xmap')"
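
A rough sketch of what an "immutable" list class could look like, assuming a hypothetical constructor `new_xmap_list()` and replacement methods that simply refuse to modify the object. None of these names come from the package:

```r
# Hypothetical constructor: check the inputs once, then store them as a classed list.
new_xmap_list <- function(from, to, weights) {
  stopifnot(length(from) == length(to), length(to) == length(weights))
  structure(
    list(from = from, to = to, weights = weights),
    class = c("xmap_list", "xmap")
  )
}

# Refuse in-place modification so the one-time validation stays trustworthy.
block_modify <- function(x, ..., value) {
  stop("xmap_list objects are immutable; build a new one instead", call. = FALSE)
}
`$<-.xmap_list` <- block_modify
`[[<-.xmap_list` <- block_modify
`[<-.xmap_list` <- block_modify

xl <- new_xmap_list(c("a", "b"), c(1, 2), c(1, 1))

# Downstream code could then rely on a cheap class check.
inherits(xl, "xmap_list")

# Attempts to overwrite a component now fail instead of silently succeeding.
res <- tryCatch(
  {
    xl$weights <- c(5, 5)
    "modified"
  },
  error = function(e) "blocked"
)
res
```

Note that `unclass(xl)` still returns a plain, modifiable list, so this guards against accidental edits rather than guaranteeing immutability.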

cynthiahqy commented 1 year ago

Re-validation seems somewhat unavoidable, so lighter-weight checks are probably the way to go (e.g. vhas_xmap_props).
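
As a sketch only, a lighter-weight check might confirm that the attributes from the printout earlier in this thread (`col_from`, `col_to`, `col_weights`, `from_set`) are present and still match the columns. `has_xmap_props_sketch()` is a hypothetical name and is not the implementation of vhas_xmap_props:

```r
# Hypothetical lightweight check (an assumption, not the package's code).
has_xmap_props_sketch <- function(x) {
  roles <- c("col_from", "col_to", "col_weights")
  cols <- vapply(roles, function(a) {
    val <- attr(x, a, exact = TRUE)
    if (is.character(val) && length(val) == 1) val else NA_character_
  }, character(1))
  # Every role attribute must exist and name a real column.
  if (anyNA(cols) || !all(cols %in% names(x))) {
    return(FALSE)
  }
  # `from_set` should still describe the values in the `from` column.
  setequal(attr(x, "from_set", exact = TRUE), unique(x[[cols[["col_from"]]]]))
}

xmap <- data.frame(x = c("a", "b"), y = c(1, 2), z = c(1, 1))
attr(xmap, "col_from") <- "x"
attr(xmap, "col_to") <- "y"
attr(xmap, "col_weights") <- "z"
attr(xmap, "from_set") <- c("a", "b")
class(xmap) <- c("xmap_df", "xmap", "data.frame")

has_xmap_props_sketch(xmap)

# Overwriting the `from` column keeps the class but no longer matches `from_set`.
changed <- xmap
changed$x <- 5
has_xmap_props_sketch(changed)
```

A check like this is cheap but does not look at the weights themselves, so it would catch renamed or overwritten columns and not a rescaled weights column.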