Closed jangorecki closed 3 years ago
If this is just about integer64, then I get that. The title could be about integer64, and memrecycle could be improved to handle no-alloc zero-copy integer64 coercion (from a comment by Michael, it sounds like it doesn't right now) as it already does for int-num and num-int. But general class coercion, to me, refers to as.<class>
; i.e. the R level method which would involve calling eval()
. Is that what you mean by class-aware?
for double type, different classes uses different NA representation so we need to handle that
you've used plural here, but it's just one class that does that: integer64. Enhancing memrecycle to do no-alloc integer64 coercion would be much easier for me to understand and merge.
it is important because coercion is a very common operation, especially num-int or int-num for user convenience so they don't have to type L suffix, but there are tons of other applications.
This line could maybe be removed now, since "No-alloc coerce written for C assign function #3909 is good foundation, that potentially could be extracted out to an standalone function." is at the end. Or could you provide one example from the tons of applications (that isn't integer64) to help me understand.
Extracting the no-alloc coercion parts of memrecycle into a separate internal function makes sense to me. That would be much easier to merge a PR that did that. However, have you seen the internals of memrecycle with the big macro I did? The idea was that the no-alloc coerce was only possible for the very reason it is part of memrecycle. To take the no-alloc part of it out would mean returning the coerced type which means allocating the coerced type. IIUC.
If this is just about integer64, then I get that.
Yes it is about integer64 (or nanotime). Eventually other classes if we will decide to handle them here.
general class coercion, to me, refers to as.
; i.e. the R level method which would involve calling eval(). Is that what you mean by class-aware?
Yes, we don't want to go via R's eval for coercion. That was already solved by coerceFill
before but it was not very efficient because it was always coercing input to int, double and int64. Then nafill
(which called coerceFill
) was using only the type it needed. There is a need to use this kind of coercion in multiple place so I filled this issue.
memrecycle could be improved to handle no-alloc zero-copy integer64 coercion (from a comment by Michael, it sounds like it doesn't right now) as it already does for int-num and num-int.
Unless I am missing something, looking at the following lines it seems that it does handle zero-copy assignment: https://github.com/Rdatatable/data.table/pull/4491/files#diff-22b103646a1efab9bbfc374791ccfc3fd1422eefc48918a3e126fc2f30d1f572R857
Extracting the no-alloc coercion parts of memrecycle into a separate internal function makes sense to me.
To me no-alloc coercion is a secondary feature. The more important is a helper to deal with int64 <-> num / int
. Separate internal function makes perfect sense, and that was proposed in #3765, but it ended up duplicating a lot of functionality that was in memrecycle
, where it was written much better. So this new PR re-used memrecycle
instead. coerceAs
allocates anyway so it is not designed to be called from parallel region, so no-alloc is not really crucial for coerceAs
.
We should have a function for coercing an object to another type and class, in a efficient way, and class aware (for double type, different classes uses different NA representation so we need to handle that). It is important because coercion is a very common operation, especially
num-int
orint-num
for user convenience so they don't have to typeL
suffix, but there are tons of other applications.Initial work on that can be found in https://github.com/Rdatatable/data.table/pull/3765 (generic coerceClass)
No-alloc coerce written for C assign function https://github.com/Rdatatable/data.table/pull/3909 is good foundation, that potentially could be extracted out to an standalone function.