gdemin / maditr

Fast Data Aggregation, Modification, and Filtering
to_wide should not use dcast(fun.aggregate=identity) #18

Open tdhock opened 7 months ago

tdhock commented 7 months ago

Hi @gdemin, data.table recently merged new dcast code that checks fun.aggregate more strictly. fun.aggregate is supposed to be a function that returns a single value, as documented in ?dcast: "The aggregating function should take a vector as input and return a single value (or a list of length one) as output."

Using the current data.table from GitHub master, we ran example("to_wide"), which gave the following:

> iris %>%
+     to_long(list(Sepal = cols("^Sepal"), Petal = cols("^Petal"))) %>%
+     let(
+         variable = factor(variable, levels = 1:2, labels = c("Length", "Width"))
+     ) %>%
+     to_wide(values_in = c(Sepal, Petal))
Error: Aggregating function(s) should take vector inputs and return a single value (length=1). However, function(s) returns length!=1. This value will have to be used to fill any missing combinations, and therefore must be length=1. Either override by setting the 'fill' argument explicitly or modify your function to handle this case appropriately.

Details: https://github.com/Rdatatable/data.table/issues/6032

It seems that to_wide calls dcast with fun.aggregate=identity, which is problematic because identity returns its input unchanged, so the result has length 1 only when the input does. Can you please modify your code so that it uses a fun.aggregate that returns a single value (length=1)?
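
To illustrate the length issue, here is a minimal base R sketch. `single_value` is a hypothetical helper written for this example; it is not part of maditr or data.table, and stopping on duplicates is just one possible design choice.

```r
# identity() returns its input unchanged, so the result length follows the input.
len_empty <- length(identity(numeric(0)))    # zero-length input
len_multi <- length(identity(c(5.1, 4.9)))   # two values in one cell

# Hypothetical aggregator (an assumption, not library code):
# always returns exactly one value, or stops if a cell holds several.
single_value <- function(x) {
  if (length(x) > 1L) {
    stop("expected at most one value per cell, got ", length(x))
  }
  # x[1L] is the value itself for length-1 input
  # and an NA of the same type for zero-length input.
  x[1L]
}

res_empty <- single_value(numeric(0))
res_one   <- single_value(5.1)
```

A function shaped like this satisfies the "single value" contract quoted from ?dcast for empty and length-1 inputs. Whether it passes the new check when supplied as fun.aggregate has not been verified here.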

Thanks in advance!

gdemin commented 7 months ago

Hi @tdhock,

Thank you, I will fix it in the near future. I think I will change the test and leave identity as the default, because I mostly use to_wide to convert from long form. In that case there should be only one value for each combination. If there are more values, something has gone wrong and an error message is absolutely appropriate.

Regards, Gregory

tdhock commented 7 months ago

great, thanks, that sounds reasonable.