Open SteveBronder opened 4 months ago
At root, the issue is that we don't do `length()` S3 dispatch. Apparently (#1433) we wrote our own `dim` in the first place for performance reasons with `[[.data.frame`.
Maybe we should implement `[[.data.table` instead to just wrap `VECTOR_ELT(x, i-1)`; then `length()` can dispatch correctly. Otherwise we need to do `length()` dispatch ourselves from C, right?
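
To make the dispatch gap concrete, here is a minimal R sketch. The class `toy_col` and its constructor `new_toy()` are hypothetical stand-ins, not `rvar` itself; the only assumption is that a column class may keep its data in an attribute and report its size through an S3 `length` method. `length(unclass(x))` is used as a rough proxy for what a length computed without dispatch would see.

```r
# Hypothetical toy class (an assumption for illustration, not posterior's rvar):
# an empty list that carries its data in a "values" attribute.
new_toy <- function(values) {
  structure(list(), values = values, class = "toy_col")
}

# S3 method: the class reports the number of stored values as its length.
length.toy_col <- function(x) length(attr(x, "values"))

x <- new_toy(c(1.5, 2.5, 3.5, 4.5))

length(x)           # 4: length() dispatches to length.toy_col
length(unclass(x))  # 0: without the class, only the empty list is measured
```

If a row count is derived from the second kind of length, a column like this looks empty even though the class itself reports four elements.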
I have a PR at the link below to do the length dispatch in C. It's been a long time since I've written anything with R's C API, so it may not be that nice. I also ran into another bug where `rownames(toprint)` was only giving back a length 1 vector, so line 80 of `print.data.table` was failing:
if (isTRUE(row.names)) {
  # rownames was NULL and the rhs was length 4 so this threw an error
  rownames(toprint) = paste0(format(rn,right=TRUE,scientific=FALSE),":")
} else {
  rownames(toprint) = rep.int("", nrow(toprint))
}
Since vectors of `rvar` type have a dimension, `format_col.default` was choosing the first branch and just returning "<multi-column>". I think that `if` condition should be moved to second-to-last in the if/else chain so that package writers have an opportunity to write a proper override for `format`.
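
A small R sketch of why the branch order matters. Everything here is hypothetical: `toy_dim`, `dim_first()`, and `class_first()` are illustrative names, and the functions are not data.table's actual `format_col.default`. The sketch only assumes a classed vector that has a `dim` attribute and its own `format` method.

```r
# Hypothetical toy class with a dim attribute and its own format method.
x <- structure(1:4, dim = 4L, class = "toy_dim")
format.toy_dim <- function(x, ...) paste0("v=", as.vector(unclass(x)))

# Ordering A: the dim check comes first, so the class method is never reached.
dim_first <- function(x) {
  if (!is.null(dim(x))) "<multi-column>" else format(x)
}

# Ordering B: classed objects get a chance to format themselves first.
class_first <- function(x) {
  if (is.object(x)) format(x)
  else if (!is.null(dim(x))) "<multi-column>"
  else format(x)
}

dim_first(x)    # "<multi-column>"
class_first(x)  # "v=1" "v=2" "v=3" "v=4"
```

Checking `is.object()` first is just one possible ordering; the point is that any check that fires before the class has a say will hide a package's own `format` method.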
Thanks for the reminder. Indeed I'd checked if rvar is S4, but it's just S3 which I thought should be easier to accommodate in general. And indeed I wouldn't expect generic computations on rvar columns in data.table to work.
But storing non-atomic columns in a data.table is still very useful for some high-level use cases, and something as basic as `print()` not working is surprising to me. If we can get a fix working that's fairly general and not terribly involved/inefficient, I think we should go for it.
> Thanks for the reminder. Indeed I'd checked if rvar is S4, but it's just S3 which I thought should be easier to accommodate in general. And indeed I wouldn't expect generic computations on rvar columns in data.table to work.
Yes, I was just messing with `rbindlist.c` and I think this is a bigger project than I want to take on at the moment.
> But storing non-atomic columns in a data.table is still very useful for some high-level use cases, and something as basic as print() not working is surprising to me. If we can get a fix working that's fairly general and not terribly involved/inefficient, I think we should go for it.
Would you like me to open up a PR for the fix branch I posted above?
> Would you like me to open up a PR for the fix branch I posted above?
that'd be great!
Summary
When printing a `data.table` that contains an `rvar` type from the `posterior` package, an empty `data.table` is printed instead. See the minimal reproducible example below for a full case. Looking into it more, it appears `dim.data.table` returns zero for the number of rows where there should be a nonzero value. `dim` returning zero causes line 55 of `print.data.table` to go to the empty `data.table` message.
I tried looking into the code for `dim`: https://github.com/Rdatatable/data.table/blob/46ee52571214b135b645e367ebacd02de02aff52/src/wrappers.c#L101-L121
The `data.table` has a `length(x)` equal to 1, so the last branch is chosen. Calling `dim.data.frame` with the data.table gives the correct output below of (4, 1). I think this is because `.row_names_info`, used in `dim.data.frame`, is just asking for the length of the row names via `.Internal(shortRowNames(x, type))`.
Minimal reproducible example
Looking over the open/closed issues for data.table, I could not find anything similar. I don't know enough about R's internals to know why `length(VECTOR_ELT(x, 0))` is giving a value of zero, though I found the source for `VECTOR_ELT` here.
# Output of sessionInfo()