RConsortium / marshalling-wg

ISC Working Group 'Marshaling and Serialization in R' (anno May 2024)

Investigating serialisation refhooks #5

Open shikokuchuo opened 1 month ago

shikokuchuo commented 1 month ago

I have an instrumented version of nanonext available at shikokuchuo/nanonext@instrumented which just prints to the console each time it executes a hook.

This should hopefully make things easier to reason about.

@sebffischer

shikokuchuo commented 1 month ago

From quick initial tests with torch tensors, a list of identical tensors runs the inhook only once (as designed), so serialisation is efficient. The outhook, however, runs once per occurrence, which I think is why identical() fails after unserialization.

I've not yet tested with Arrow or Polars objects as I'm at the airport about to board a flight. Those will be better examples to use because of the special nature of torch serialisation.

shikokuchuo commented 1 month ago

I can confirm the same for Arrow and Polars objects, using minimally modified examples from the mirai vignette: https://shikokuchuo.net/mirai/articles/mirai.html#serialization-arrow-polars-and-beyond

The output should look like this:

> m <- mirai(list(a = x, b = x), x = x)
Inhook 1
> m[]
Outhook 1
Outhook 2
# < ...object output omitted... >

Compare this with the case where y is a distinct object:

> m <- mirai(list(a = x, b = y), x = x, y = y)
Inhook 1
Inhook 2
> m[]
Outhook 1
Outhook 2
# < ...object output omitted... >
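
For anyone without the instrumented build, the hook mechanism can be sketched with the `refhook` argument of base R's `serialize()` and `unserialize()`. This is only an illustration under stated assumptions, not nanonext's implementation: a plain environment stands in for an Arrow or Polars object, and `inhook`, `outhook`, `n_in` and `n_out` are hypothetical names. The counts it prints come from base R's own serializer and need not match the instrumented output above.

```r
# Stand-in reference object (assumption: a plain environment, not a real
# Arrow / Polars / torch object).
x <- new.env(parent = emptyenv())
assign("value", 42, envir = x)

# Hypothetical counters recording how often each hook is called.
n_in <- 0L
n_out <- 0L

# Serialization hook: base R calls it for reference objects (non-system
# environments, external pointers, weak references). Returning a character
# vector stores that vector in place of the object; NULL means "serialize
# this object normally".
inhook <- function(obj) {
  if (is.environment(obj) && exists("value", envir = obj, inherits = FALSE)) {
    n_in <<- n_in + 1L
    as.character(get("value", envir = obj))
  } else {
    NULL
  }
}

# Unserialization hook: receives the stored character vector and rebuilds
# an object from it.
outhook <- function(chr) {
  n_out <<- n_out + 1L
  e <- new.env(parent = emptyenv())
  assign("value", as.numeric(chr), envir = e)
  e
}

bytes <- serialize(list(a = x, b = x), NULL, refhook = inhook)
res <- unserialize(bytes, refhook = outhook)

# Inspect how often each hook ran, and whether the two elements still
# refer to the same object after the round trip.
print(c(inhook_calls = n_in, outhook_calls = n_out))
print(identical(res$a, res$b))
```

If the outhook runs more than once, each call builds a fresh environment, so the two list elements can no longer be the same object; that is the effect described in the previous comment.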

I was chatting to @lionel- earlier about this. Apparently copy-on-write semantics do not survive a round trip through serialization / unserialization in any case. The fact that only one copy of a reference object is serialized is already an improvement over non-reference objects, where every occurrence is written out in full:

> x <- 1
> .Internal(inspect(x))
@57e435a8b400 14 REALSXP g1c1 [MARK,REF(12)] (len=1, tl=0) 1
> .Internal(inspect(list(x)))
@57e43809ee78 19 VECSXP g0c1 [] (len=1, tl=0)
  @57e435a8b400 14 REALSXP g1c1 [MARK,REF(13)] (len=1, tl=0) 1
> .Internal(inspect(list(x, x)))
@57e437d55858 19 VECSXP g0c2 [] (len=2, tl=0)
  @57e435a8b400 14 REALSXP g1c1 [MARK,REF(15)] (len=1, tl=0) 1
  @57e435a8b400 14 REALSXP g1c1 [MARK,REF(15)] (len=1, tl=0) 1
> .Internal(inspect(list(x, x, x)))
@57e438975f98 19 VECSXP g0c3 [] (len=3, tl=0)
  @57e435a8b400 14 REALSXP g1c1 [MARK,REF(18)] (len=1, tl=0) 1
  @57e435a8b400 14 REALSXP g1c1 [MARK,REF(18)] (len=1, tl=0) 1
  @57e435a8b400 14 REALSXP g1c1 [MARK,REF(18)] (len=1, tl=0) 1

> serialize(list(x), NULL)
 [1] 58 0a 00 00 00 03 00 04 04 01 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00 00 13 00 00 00 01 00 00 00 0e 00 00 00 01
[40] 3f f0 00 00 00 00 00 00
> serialize(list(x, x), NULL)
 [1] 58 0a 00 00 00 03 00 04 04 01 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00 00 13 00 00 00 02 00 00 00 0e 00 00 00 01
[40] 3f f0 00 00 00 00 00 00 00 00 00 0e 00 00 00 01 3f f0 00 00 00 00 00 00
> serialize(list(x, x, x), NULL)
 [1] 58 0a 00 00 00 03 00 04 04 01 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00 00 13 00 00 00 03 00 00 00 0e 00 00 00 01
[40] 3f f0 00 00 00 00 00 00 00 00 00 0e 00 00 00 01 3f f0 00 00 00 00 00 00 00 00 00 0e 00 00 00 01 3f f0 00 00 00 00 00
[79] 00
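
The byte dumps above can be summarised by comparing serialized lengths. The sketch below uses only base R and no hooks; the environment `e` is a hypothetical stand-in for a reference object, included to contrast how the default serializer treats repeated references.

```r
# Non-reference object: every occurrence is written out in full.
x <- 1
sizes <- vapply(
  list(list(x), list(x, x), list(x, x, x)),
  function(obj) length(serialize(obj, NULL)),
  integer(1)
)
# Each extra copy of x adds 16 bytes: 4 (type flags) + 4 (length) + 8 (double).
print(diff(sizes))

# Reference object (assumption: an empty environment as a stand-in).
e <- new.env(parent = emptyenv())
env_sizes <- vapply(
  list(list(e), list(e, e), list(e, e, e)),
  function(obj) length(serialize(obj, NULL)),
  integer(1)
)
# Repeats are written as short back-references rather than full copies.
print(diff(env_sizes))

# Without a refhook, the shared reference survives the round trip.
res <- unserialize(serialize(list(e, e), NULL))
print(identical(res[[1]], res[[2]]))  # TRUE: both elements are one environment
```

The absolute lengths depend on the header (for example the native encoding string), so only the differences between successive lists are compared here.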