coolbutuseless / zstdlite

Fast, configurable in-memory compression of R objects with zstd

Expose (un)serialize(..., refhook)? #2

Open HenrikBengtsson opened 3 years ago

HenrikBengtsson commented 3 years ago

Since you're using R's serialization framework internally, would it be possible to expose the reference hook feature that you get with serialize()/unserialize()?

coolbutuseless commented 3 years ago

I'm using R's serialization framework from the C side.

I'll have a look into how a refhook function may be specified there.

coolbutuseless commented 3 years ago

Just keeping a note on how a refhook example works in a call to serialize() from R:

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Function for custom handling of reference objects during serialization
#  e.g. external ptrs, weak refs, environments
#
# @param x object
# @return character string
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
refhook_serialize <- function(x) {
  "dummy value"
}

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Function for custom handling of reference objects during un-serialization
#  e.g. external ptrs, weak refs, environments
#
# @param chr character string produced by 'refhook_serialize()'
#
# @return R object
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
refhook_unserialize <- function(chr) {
  list(dummy = 0L)
}

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Test data
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
env <- as.environment(list(x = 1, y = 2))
obj <- list(
  a = 100,
  e = env
)

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# With no refhook, the environment is handled automatically by R
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ser <- serialize(obj, NULL)
unserialize(ser)
#> $a
#> [1] 100
#> 
#> $e
#> <environment: 0x7f8ee4230358>

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# With refhook specified, the environment is encoded differently, and then
# decoded by whatever is specified in 'refhook_unserialize()'
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ser <- serialize(obj, NULL, refhook = refhook_serialize)
unserialize(ser, refhook = refhook_unserialize)
#> $a
#> [1] 100
#> 
#> $e
#> $e$dummy
#> [1] 0

Created on 2020-12-18 by the reprex package (v0.3.0)
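The dummy hooks above discard the environment entirely. As a minimal sketch of a more practical pattern, the hooks below store a key in place of the environment and look the live object up again on unserialization. The names `registry`, `refhook_out` and `refhook_in` are hypothetical, and the approach assumes both calls happen in the same R session.

```r
# Hypothetical registry mapping string keys to live environments.
# Assumption: serialize() and unserialize() run in the same R session.
registry <- new.env()

# Serialize hook: store the environment under a fresh key and return the key.
# Returning NULL leaves other reference objects to R's standard handling.
refhook_out <- function(x) {
  if (!is.environment(x)) return(NULL)
  key <- paste0("env-", length(ls(registry)) + 1L)
  assign(key, x, envir = registry)
  key
}

# Unserialize hook: receives the character vector written by refhook_out()
# and returns the original environment from the registry.
refhook_in <- function(chr) {
  get(chr, envir = registry, inherits = FALSE)
}

env <- as.environment(list(x = 1, y = 2))
obj <- list(a = 100, e = env)

ser <- serialize(obj, NULL, refhook = refhook_out)
res <- unserialize(ser, refhook = refhook_in)

# The restored element is the very same environment, not a copy
identical(res$e, env)
```

With this sketch only the key travels in the byte stream, so the environment's contents are never written out. That is also why it cannot work across sessions without some other way of rebuilding the registry.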

coolbutuseless commented 3 years ago

C code showing how these refhook functions are applied, taken from serialize.c in the R source:

stream->OutPersistHookFunc(s, stream->OutPersistHookData);
stream->InPersistHookFunc(s, stream->InPersistHookData);

A mechanism is provided to allow special handling of non-system
reference objects (all weak references and external pointers, and
all environments other than package environments, namespace
environments, and the global environment).  The hook function
consists of a function pointer and a data value.  The serialization
function pointer is called with the reference object and the data
value as arguments.  It should return R_NilValue for standard
handling and an STRSXP for special handling.  If an STRSXP is
returned, then a special handling mark is written followed by the
strings in the STRSXP (attributes are ignored).  On unserializing,
any specially marked entry causes a call to the hook function with
the reconstructed STRSXP and data value as arguments.  This should
return the value to use for the reference object.  A reasonable
convention on how to use this mechanism is needed, but again the
format should be compatible with any reasonable convention.
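
Seen from the R side, the contract in that comment can be sketched as follows: the serialize hook returns NULL for standard handling or a character vector for special handling, and the unserialize hook receives that character vector back. The `persist_key` attribute and the names `selective_out` and `selective_in` are hypothetical, chosen only for this illustration.

```r
# Hypothetical convention: only environments tagged with a "persist_key"
# attribute get special handling; everything else returns NULL.
selective_out <- function(x) {
  key <- attr(x, "persist_key", exact = TRUE)
  if (is.environment(x) && is.character(key)) key else NULL
}

# Called only for specially marked entries, with the stored character vector
selective_in <- function(chr) {
  list(restored_from = chr)
}

tagged <- new.env(parent = emptyenv())
attr(tagged, "persist_key") <- "shared-cache"

plain <- new.env(parent = emptyenv())
assign("v", 42, envir = plain)

obj <- list(tagged = tagged, plain = plain)
ser <- serialize(obj, NULL, refhook = selective_out)
res <- unserialize(ser, refhook = selective_in)

res$tagged$restored_from    # "shared-cache": went through the hooks
is.environment(res$plain)   # TRUE: standard handling, contents preserved
res$plain$v                 # 42
```

Both environments use `emptyenv()` as their parent so that the hook is consulted only for the two objects of interest and not for an enclosing environment.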

coolbutuseless commented 3 years ago

This looks a bit tricky. I think the following is needed:

@HenrikBengtsson do you have a motivating example for when you need this refhook functionality?