Open dataPulverizer opened 9 years ago
I just downloaded data.table from GitHub. I can confirm that this is still an issue. Here is my session info.
thocking@silene:~/R$ R --vanilla
R version 3.3.2 (2016-10-31) -- "Sincere Pumpkin Patch"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(survival)
> library(data.table)
data.table 1.10.5 IN DEVELOPMENT built 2017-02-16 18:03:14 UTC
The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
Release notes, videos and slides: http://r-datatable.com
> data.table(surv=Surv(1, 5, type="interval2"))
Error in `[.Surv`(x, , 3) :
invalid to set the class to matrix unless the dimension attribute is of length 2 (was 0)
> devtools::session_info()
Session info -------------------------------------------------------------------
setting value
version R version 3.3.2 (2016-10-31)
system x86_64, linux-gnu
ui X11
language en_CA:en
collate en_CA.UTF-8
tz <NA>
date 2017-02-16
Packages -----------------------------------------------------------------------
package * version date source
data.table * 1.10.5 2017-02-16 Github (Rdatatable/data.table@9fadbcd)
devtools 1.12.0.9000 2016-08-12 Github (hadley/devtools@565ac15)
digest 0.6.10 2016-08-02 CRAN (R 3.2.3)
lattice 0.20-34 2016-09-06 CRAN (R 3.3.2)
Matrix 1.2-7.1 2016-09-01 CRAN (R 3.3.2)
memoise 1.0.0 2016-01-29 CRAN (R 3.2.3)
survival * 2.40-1 2016-10-30 CRAN (R 3.3.2)
withr 1.0.2 2016-06-20 CRAN (R 3.2.3)
>
Thanks for all the work on data.table -- the package has been indispensable.
However, this issue forces me to transition out of data.table whenever survival data is involved. Are there any updates on a possible fix? Thanks!
# with data.table 1.15.0
> dt = data.table(survival::myeloma)
> dt[, surv := survival::Surv(futime, death)]
Error in `[.data.table`(dt, , `:=`(surv, survival::Surv(futime, death))) :
Supplied 7764 items to be assigned to 3882 items of column 'surv'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
In addition: Warning message:
In `[.data.table`(dt, , `:=`(surv, survival::Surv(futime, death))) :
2 column matrix RHS of := will be treated as one vector
confirming this is still an issue on master. I also get a new error (a node stack overflow) when running my old example with type="interval2"
on windows:
> library(data.table)
data.table 1.15.99 IN DEVELOPMENT built 2024-08-07 15:42:29 UTC using 3 threads (see ?getDTthreads). Latest news: r-datatable.com
> data.table(surv=Surv(1, 5, type="interval2"))
Error in as.data.frame.model.matrix(x, ...) : node stack overflow
> dt = data.table(survival::myeloma)
> dt[, surv := survival::Surv(futime, death)]
Error in `[.data.table`(dt, , `:=`(surv, survival::Surv(futime, death))) :
Supplied 7764 items to be assigned to 3882 items of column 'surv'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
In addition: Warning message:
In `[.data.table`(dt, , `:=`(surv, survival::Surv(futime, death))) :
2 column matrix RHS of := will be treated as one vector
the underlying issue is that a Surv object is stored as a numeric matrix, either an Nx3 numeric array
> str(Surv(1, 5, type="interval2"))
'Surv' num [1, 1:3] [1, 5]
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:3] "time1" "time2" "status"
- attr(*, "type")= chr "interval"
or an Nx2 numeric array
> str(Surv(1:2, 0:1))
'Surv' num [1:2, 1:2] 1+ 2
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "time" "status"
- attr(*, "type")= chr "right"
this issue is tagged "non-atomic column" but actually Surv is atomic:
> is.atomic(Surv(1, 5, type="interval2"))
[1] TRUE
it just has a custom length method:
> survival:::length.Surv
function (x)
nrow(x)
<bytecode: 0xb3211b8>
<environment: namespace:survival>
> length(Surv(1, 5, type="interval2"))
[1] 1
> length(as.matrix(Surv(1, 5, type="interval2")))
[1] 3
data.table sees the matrix length (3) rather than the Surv length (1), which I believe is a bug, but it is not clear to me from the ?data.table docs whether this should be supported. The only mention of length I see is:
...: Just as ‘...’ in data.frame. Usual recycling rules are
applied to vectors of different lengths to create a list of
equal length vectors.
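To make the mismatch concrete, here is a minimal sketch using only the survival package. It compares the length reported by the Surv method with the length of the underlying matrix. Reading the earlier error the same way, 7764 = 2 x 3882 would be the two-column matrix flattened into one vector; that is my interpretation of the warning, not something I have traced in the data.table source.

```r
library(survival)

# Right-censored Surv with two observations: stored as a 2 x 2 numeric matrix.
s <- Surv(1:2, 0:1)

n_obs   <- length(s)           # dispatches to length.Surv, i.e. nrow(s)
n_cells <- length(unclass(s))  # length of the underlying matrix: nrow * ncol

cat("observations:", n_obs, " cells:", n_cells, "\n")

# Interval-censored Surv: three columns, so the gap is a factor of 3.
s3 <- Surv(1, 5, type = "interval2")
cat("observations:", length(s3), " cells:", length(unclass(s3)), "\n")
```

Any code that takes the length after dropping the class (or at the C level) would see the larger number.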
Thanks for looking back into it!
my stack overflow turned into a "too close to the limit" error on linux, which looks like this
> data.table(x=1:2,y=Surv(5:6,7:8,type="interval2"))
Error: C stack usage 9521904 is too close to the limit
> traceback()
1621: mode(expr)
1620: mode(expr) %in% c("call", "expression", "(", "function")
1619: deparse(substitute(x))
1618: as.data.frame.model.matrix(x, ...)
1617: as.data.frame.Surv(x, ...)
1616: as.data.frame(x, ...)
1615: as.data.table(as.data.frame(x, ...), ...)
1614: as.data.table.default(xi, keep.rownames = keep.rownames)
1613: as.data.table(xi, keep.rownames = keep.rownames)
...
8: as.data.table(xi, keep.rownames = keep.rownames)
7: as.data.table.list(as.list(x), keep.rownames = keep.rownames,
...)
6: as.data.table.data.frame(as.data.frame(x, ...), ...)
5: as.data.table(as.data.frame(x, ...), ...)
4: as.data.table.default(xi, keep.rownames = keep.rownames)
3: as.data.table(xi, keep.rownames = keep.rownames)
2: as.data.table.list(x, keep.rownames = keep.rownames, check.names = check.names,
.named = nd$.named)
1: data.table(x = 1:2, y = Surv(5:6, 7:8, type = "interval2"))
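One plausible reading of this traceback, sketched below with the survival package only: as.data.frame() on a Surv returns a one-column data.frame whose single column is still the Surv matrix, so converting that column again leads back to the same call. This is an assumption drawn from the repeating frames above, not a confirmed diagnosis of the data.table code.

```r
library(survival)

s  <- Surv(5:6, 7:8, type = "interval2")
df <- as.data.frame(s)   # dispatches to as.data.frame.Surv, as in the traceback

n_cols     <- ncol(df)                   # number of columns in the result
still_surv <- inherits(df[[1]], "Surv")  # is the column still a Surv?

cat("columns:", n_cols, " column is still a Surv:", still_surv, "\n")
```

If the column keeps its class, each round of as.data.table() -> as.data.frame() hands the same object back, which would explain the 1600+ frames.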
this is an inefficient work-around/hack, but you can use a list column with the current code (each list element is a Surv with one observation).
> (myeloma_dt <- data.table(myeloma)[, list_of_Surv := split(Surv(futime,death),.I)][])
id year entry futime death list_of_Surv
<int> <int> <int> <int> <int> <list>
1: 1 57 0 1431 1 1431
2: 2 61 0 686 1 686
3: 3 53 0 6270 1 6270
4: 4 66 0 365 1 365
5: 5 67 0 1340 1 1340
---
3878: 3910 95 40 42 0 42+
3879: 3911 95 347 348 0 348+
3880: 3912 96 28 31 0 31+
3881: 3913 95 221 223 0 223+
3882: 3914 94 497 498 0 498+
Then you would have to use do.call with c to get a regular Surv back, as below.
> str(do.call(c, myeloma_dt$list_of_Surv))
'Surv' num [1:3882, 1:2] 1431 686 6270 365 1340 1567 3797 5 121 822 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "time" "status"
- attr(*, "type")= chr "right"
> myeloma_dt[c(1,2,3880), do.call(c, list_of_Surv)]
[1] 1431 686 31+
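The round trip can be wrapped in two small helpers. This is only a sketch: surv_to_list and list_to_surv are hypothetical names (they exist in neither survival nor data.table), and list_to_surv assumes a survival version that provides a c() method for Surv, as the session above does.

```r
library(survival)

# Hypothetical helpers, not part of survival or data.table.
# Split a Surv into a list holding one single-observation Surv per element.
surv_to_list <- function(s) split(s, seq_len(length(s)))

# Combine such a list back into one Surv; relies on survival's c() method for Surv.
list_to_surv <- function(l) do.call(c, unname(l))

s    <- Surv(c(1431, 686, 31), c(1, 1, 0))
l    <- surv_to_list(s)
back <- list_to_surv(l)

print(back)
```

With these, the list column would be created as dt[, list_of_Surv := surv_to_list(Surv(futime, death))] and read back with list_to_surv(dt$list_of_Surv).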
data.table supports other kinds of custom columns (bit64, nanotime, xts), so it seems like in principle Surv could be supported too.
Thanks -- I think the workaround will work for me for now! I'll have to digest what it's doing a bit... Thanks again for the help/advice.
Thank you