Closed tdhock closed 4 months ago
@MichaelChirico any idea why this may be happening?
It looks like here's an MRE:
DT <- data.table(a = c(1,2,1), b = 1:3, c = 0)
dcast(DT, a~b, value.var="c", fill=list(0))
## ON master
# Error in dcast.data.table(DT, a ~ b, value.var = "c", fill = list(0)) :
# 'x' is not atomic
## ON 1.15.2
dcast(DT, a~b, value.var="c", fill=list(0))
# Key: <a>
# a 1 2 3
# <num> <num> <num> <num>
# 1: 1 0 0 0
# 2: 2 0 0 0
FYI @hadley it looks like this is coming from a {dtplyr} translation of pivot_wider
providing fill=list()
to the dcast()
call, which is a bit unusual, I wonder if the translation is WAI? Sorry I haven't dived into this aspect any further.
@jangorecki should we (1) extend coerceAs()
to behave more like coerceVector()
for VECSXP
input (2) special-case the handling of !isVectorAtomic(thisfill)
or (3) make the breaking change not to allow fill = list()
to get coerced as before?
I would have expected some kind of error message like "value.var is numeric but fill is list, so please change fill type to match value.var type"
related: with data.table release it is OK for fill=list when value.var is VECSXP, see below
> (dt=data.table(i=1:2, j=c("a","b"), L=list(4, 5:6), num=7:8))
i j L num
<int> <char> <list> <int>
1: 1 a 4 7
2: 2 b 5,6 8
> dcast(dt, i ~ j, value.var="L")
Key: <i>
i a b
<int> <list> <list>
1: 1 4
2: 2 5,6
> dcast(dt, i ~ j, value.var="L", fill=NA)
Key: <i>
i a b
<int> <list> <list>
1: 1 4 NA
2: 2 NA 5,6
> dcast(dt, i ~ j, value.var="L", fill=list(c("miss","ing"),"value"))
Key: <i>
i a b
<int> <list> <list>
1: 1 4 miss,ing
2: 2 miss,ing 5,6
and it is ok for fill to be list with elements of length=1, see below
> dcast(dt, i ~ j, value.var="num")
Key: <i>
i a b
<int> <int> <int>
1: 1 7 NA
2: 2 NA 8
> dcast(dt, i ~ j, value.var="num", fill=0)
Key: <i>
i a b
<int> <int> <int>
1: 1 7 0
2: 2 0 8
> dcast(dt, i ~ j, value.var="num", fill=list(0,1))
Key: <i>
i a b
<int> <int> <int>
1: 1 7 0
2: 2 0 8
but not OK for list element to be a vector of length>1, see below.
> dcast(dt, i ~ j, value.var="num", fill=list(0:1))
Error in dcast.data.table(dt, i ~ j, value.var = "num", fill = list(0:1)) :
'list' object cannot be coerced to type 'integer'
also in the above examples it seems that if fill=list with more than one element, only the first element is used, and the others are ignored, so I would have expected a warning in this case, like "fill is a list with 2 elements, but only the first element is used, so please remove the other elements of fill"
IINM all of the existing behavior is just inherited from base R coercion, c.f.
as.integer(list(0)) # 0L
as.integer(list(0,1)) # 0:1
as.integer(list(0:1)) # 'cannot be coerced' error
as.integer(list(0, 1:2)) # 'cannot be coerced' error
@jangorecki should we (1) extend
coerceAs()
to behave more likecoerceVector()
forVECSXP
input (2) special-case the handling of!isVectorAtomic(thisfill)
or (3) make the breaking change not to allowfill = list()
to get coerced as before?
It wasn't in its original scope but could be added.
If it is regression (as per first post) then it qualifies for patch release.
If it is regression (as per first post) then it qualifies for patch release.
This is still in dev, not any CRAN release
Sorry, I don't remember enough about this function or dtplyr to offer any useful feedback 😞
@TysonStanley from {tidyfast} I think you may be most familiar with {tidyr} <-> {data.table} translation, if you wouldn't mind checking how pivot_wider()
is being translated to dcast()
to see if there's any improvement we can recommend.
Yep, can take a closer look.
A relatively quick review suggests the list option isn't working in dtplyr::pivot_wider()
as intended. The way pivot_wider() normally works is you can give the values_fill arg a named list that applies the values differentially to each named variable. However, in the current dtplyr
it just uses the first value encountered.
library(data.table)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:data.table':
#>
#> between, first, last
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
library(dtplyr)
library(tibble)
df <- lazy_dt(tibble(g = c(1, 2), var = c("x", "y"), val = c(1, 2), val2 = c(1, 2)))
dt = as.data.table(df)
# shows where NAs are and what the call looks like
df %>%
pivot_wider(names_from = var, values_from = val)
#> Source: local data table [2 x 4]
#> Call: dcast(`_DT1`, formula = g + val2 ~ var, value.var = "val")
#>
#> g val2 x y
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 1 NA
#> 2 2 2 NA 2
#>
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
# should fill NA values for val with 0 and val2 with 10 but fills both with 0
df %>%
pivot_wider(names_from = var, values_from = c(val, val2), values_fill = list(val = 0, val2 = 10))
#> Source: local data table [2 x 5]
#> Call: dcast(`_DT1`, formula = g ~ var, value.var = c("val", "val2"),
#> fill = list(val = 0, val2 = 10))
#>
#> g val_x val_y val2_x val2_y
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 0 1 0
#> 2 2 0 2 0 2
#>
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
# does the same thing as above
df %>%
pivot_wider(names_from = var, values_from = c(val, val2), values_fill = 0)
#> Source: local data table [2 x 5]
#> Call: dcast(`_DT1`, formula = g ~ var, value.var = c("val", "val2"),
#> fill = 0)
#>
#> g val_x val_y val2_x val2_y
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 0 1 0
#> 2 2 0 2 0 2
#>
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results
# does the same thing as above
dcast(dt, formula = g ~ var, value.var = c("val", "val2"), fill = 0)
#> Key: <g>
#> g val_x val_y val2_x val2_y
#> <num> <num> <num> <num> <num>
#> 1: 1 1 0 1 0
#> 2: 2 0 2 0 2
Created on 2024-03-10 with reprex v2.1.0
I assume we could add functionality for a list to work like dplyr::pivot_wider()
likely somewhere around here
Can also PR a fix in the documentation for dtplyr
if we aren't going to add this functionality.
Thanks for the investigation! I wondered if that might be what's going on. So, probably {dtplyr} should for now warn if length(values_fill) > 1L
, and actually send values_fill[[1L]]
(hand-waving away the case where the target type is list
where we need to be more careful), since {data.table} only supports one fill=
value for the full output.
That said, offering support for multiple fill=
values seems reasonable enough, though there's not been any request for that behavior thus far. Filed #5990 as an FR to that end.
this seems to be fixed now on the revdep server at least. there was a release of dtplyr recently, but its NEWS does not seem to be relevant.
dtplyr 1.3.1
Fix for failing R CMD check.
dtplyr no longer directly depends on crayon.
did somebody fix this somehow? should we close?
there was a release of dtplyr recently
I guess that was a mis-read, last {dtplyr} release was March 22, 2023
I see a more recent release (late Feb) of {neonPlantEcology}, but also no relevant change I can see at a glance.
I double checked and I still see the error on my laptop, using data.table master, so I guess this is a revdep check server issue, sorry for the noise.
looks like #4586 caused a revdep to start failing checks: