Open mattdowle opened 5 years ago
Agree the `as.data.table` approach is a bit of a hack and this is the right way to do things.
How does one point to matrix column names in C API?
Don't think I'll have time before next year to work on something that for me & my lack of C experience will end up being quite time consuming (only mentioning because I see the 1.12.0 milestone and know you mentioned maybe pushing out soon given the memory fixes)
Sure thing.
args.columns[j] = (char *)DATAPTR(matrix) + j*size*nrow;
where `size` is 4 for int and 8 for double, for example.
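To make the offset arithmetic concrete, here is a minimal sketch in plain C with no R headers. The function name `fill_column_pointers` and its parameters are made up for illustration and are not part of data.table or R; it only assumes the matrix is one contiguous column-major buffer, as described in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not data.table code): fill columns[0..ncol-1] with
   pointers to the start of each column of a column-major matrix stored in
   one contiguous buffer. elem_size is the size in bytes of one element,
   e.g. sizeof(int) or sizeof(double). */
static void fill_column_pointers(const void *base, size_t elem_size,
                                 size_t nrow, size_t ncol,
                                 const void **columns)
{
    assert(base != NULL && columns != NULL);
    for (size_t j = 0; j < ncol; j++) {
        /* Column j starts j*nrow elements in, i.e. j*elem_size*nrow bytes. */
        columns[j] = (const char *)base + j * elem_size * nrow;
    }
}
```

For a 3x2 double matrix held in `double m[6]`, calling `fill_column_pointers(m, sizeof(double), 3, 2, cols)` leaves `cols[0]` pointing at `m[0]` and `cols[1]` pointing at `m[3]`, so no copy or transpose is involved.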
Instead of `DATAPTR`, isn't it more appropriate to use `switch(TYPEOF(matrix)) ... INTEGER(matrix) ... REAL(matrix)`?
It's more convenient and appropriate to use DATAPTR. But the point is more that we're not supposed to, because it's not part of the R API. I wish it were and I don't see why it isn't. It should be the switch() way to be compliant, yes. We could create our own dataptr() function that does the switch. Seems silly to have to do so though when DATAPTR could just be added to the R API. There must be a good reason why it isn't. Maybe now that INTEGER, REAL and DATAPTR are no longer ever macros, even under USE_RINTERNALS, the original reason for not exposing DATAPTR in the R API might have gone away.
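For what the switch-based dataptr() could look like, here is a rough stand-alone sketch. It deliberately does not include the R headers: the `vec` struct and `vec_type` enum are hypothetical stand-ins for an R vector and its TYPEOF tag, and `vec_dataptr` / `vec_elem_size` are illustrative names only. In real package code the tag would come from TYPEOF(x) and the typed pointers from INTEGER(x) and REAL(x).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an R vector: a type tag plus typed storage. */
typedef enum { VEC_INT, VEC_DOUBLE } vec_type;

typedef struct {
    vec_type type;
    union { int *i; double *d; } data;
} vec;

/* Return an untyped pointer to the first element by switching on the type,
   or NULL for a type this sketch does not handle. */
static const void *vec_dataptr(const vec *v)
{
    assert(v != NULL);
    switch (v->type) {
    case VEC_INT:    return v->data.i;
    case VEC_DOUBLE: return v->data.d;
    default:         return NULL;
    }
}

/* Element size in bytes for the same set of types, or 0 if unsupported. */
static size_t vec_elem_size(const vec *v)
{
    assert(v != NULL);
    switch (v->type) {
    case VEC_INT:    return sizeof(int);
    case VEC_DOUBLE: return sizeof(double);
    default:         return 0;
    }
}
```

The two helpers together give the base pointer and the per-element size needed for the column offset calculation, without relying on an untyped accessor.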
I'd like to draw attention to the fact that matrices can have rownames, which in the current version are stripped away in the `as.data.table` conversion. May I please ask that rownames be written as the first column by default if it is sensible? If not, maybe an additional argument to include rownames, like `as.data.table`?
Re rownames: sure thing. If rownames are the default 1:nrow then can we assume please they aren't really rownames and not write them by default? Internally R uses c(NA,INT_MIN) to represent default row names efficiently, so it's safe and efficient to test that attribute to see if row.names is a length 2 integer vector holding those values. With an argument to explicitly specify row names too of course; we're just talking about the default here.
Thanks Matt!
PR #3125 implemented #2613 to enable `fwrite` to accept `matrix`. This was merged in v1.12.0. However, it converts the matrix to a data.table, which can be costly. The conversion can be avoided as follows.
`fwriteR.c` contains a loop that populates the pointers in `args.columns[]`, which are then passed to `fwrite.c`. The DATAPTR call in that loop returns a pointer to where the data (e.g. `int *`, `double *`) for the R vector starts. In the case of a matrix, `args.columns[]` just needs to be populated with offsets into the matrix vector. In R a matrix is a single very long vector with just a dimension attribute attached. A matrix in R is columnar, just like a data.table, so there doesn't need to be a transpose. It should be fairly simple to do and work well.
Careful to add tests for a factor matrix as well as a character matrix.
I described how to do it in case @fparages or @MichaelChirico wanted to give it a go.