crowding / msgpack-r

Fast reading and writing of Msgpack data in R msgpack.org[R]

having trouble encode/decode with special chars #2

Open surfingkaka opened 5 years ago

surfingkaka commented 5 years ago

Please see the reproducible example below. Maybe I am doing something wrong.

> library(msgpack)
> data = data.frame(a=seq(1,5), b=c("a","b","#","$","^"))
> pdf = packMsg(prepack(data))
> unpackMsg(pdf)
  a b
1 1 4
2 2 5
3 3 1
4 4 3
5 5 2

Using R serialize/unserialize

> sdf = serialize(data, NULL)
> unserialize(sdf)
  a b
1 1 a
2 2 b
3 3 #
4 4 $
5 5 ^

crowding commented 5 years ago

  1. prepack is called automatically by packMsg, so there's no need to call it here.

I think the problem is with factors -- if I add stringsAsFactors = FALSE it works:

> library(msgpack)
> data <- data.frame(a=seq(1,5), b=c("a","b","#","$","^"), stringsAsFactors=FALSE)
> pdf <- packMsg(data)
> unpackMsg(pdf)
  a b
1 1 a
2 2 b
3 3 #
4 4 $
5 5 ^

MessagePack doesn't have a clear analogue of R attributes like factor levels, so they get dropped in translation and only the underlying integer codes are packed. I'll think about whether factors should be translated to characters instead.
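Until factors are handled by the library, one possible workaround is to convert every factor column to character before packing. The sketch below assumes nothing beyond base R for the conversion; `factors_to_character` is a hypothetical helper name, not part of msgpack. Note that since R 4.0.0 `data.frame()` defaults to `stringsAsFactors = FALSE`, so the example forces factors explicitly to mirror the original report.

```r
# Hypothetical helper (not part of msgpack): replace each factor
# column with its character labels, leaving other columns untouched.
factors_to_character <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.factor(col)) as.character(col) else col
  })
  df
}

# Force factors so the example behaves the same on R >= 4.0.0.
data <- data.frame(a = seq(1, 5), b = c("a", "b", "#", "$", "^"),
                   stringsAsFactors = TRUE)
clean <- factors_to_character(data)

# Round-trip through msgpack only if the package is installed.
if (requireNamespace("msgpack", quietly = TRUE)) {
  packed <- msgpack::packMsg(clean)
  print(msgpack::unpackMsg(packed))
}
```

The trade-off is that the column comes back as a character vector, so the caller has to call `factor()` again after unpacking if factor semantics are needed.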

surfingkaka commented 5 years ago

Sorry, I missed this reply earlier. Thank you. Factors themselves are integer vectors with a character label (a level) associated with each integer. Maybe this could be handled by adding a separate integer-to-character map alongside the data.frame.

This is a pretty fast library. I assume from this reply that you are actively supporting it.
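The integer-to-character map suggested in this comment could look roughly like the sketch below. It uses only base R and splits a factor into its integer codes and its level labels, two plain vectors that carry no R attributes, then rebuilds the factor from them. `split_factor` and `rebuild_factor` are hypothetical names for illustration and do not exist in msgpack; ordered factors are not handled here.

```r
# Hypothetical helpers (not part of msgpack).

# Split a factor into integer codes plus the code -> label lookup.
split_factor <- function(f) {
  list(codes = as.integer(f), levels = levels(f))
}

# Rebuild the factor by looking each code up in the level table.
rebuild_factor <- function(x) {
  factor(x$levels[x$codes], levels = x$levels)
}

f <- factor(c("a", "b", "#", "$", "^"))
parts <- split_factor(f)
restored <- rebuild_factor(parts)
```

Whether the level order matches the integers in the original report depends on the locale used for sorting, but the round trip itself is independent of that order.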