onetonfoot closed this issue 6 years ago
Hm, well, this is embarrassing. It looks like we are indeed incorrectly handling characters beyond the first 8 bits of UTF-8. This will require a fix to Arrow.jl; I'll try to do it today.
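For anyone wondering why only non-ASCII text is affected, here is a minimal sketch of one common way this kind of mangling arises: reading each UTF-8 byte as if it were a whole character. It is written in Python purely for illustration, the function names `mangle` and `unmangle` are made up for this example, and it is not a reproduction of the actual Arrow.jl bug.

```python
def mangle(text: str) -> str:
    """Encode as UTF-8, then wrongly read every byte as one character."""
    return text.encode("utf-8").decode("latin-1")


def unmangle(text: str) -> str:
    """Undo mangle(); only works for this specific byte-as-character mistake."""
    return text.encode("latin-1").decode("utf-8")


original = "你好"
broken = mangle(original)

# Two characters become six, because each one takes three bytes in UTF-8.
print(len(original), len(original.encode("utf-8")), len(broken))

# ASCII characters are a single byte each, so they pass through unchanged.
print(mangle("hello") == "hello")
```

Whether files that are already corrupted can be recovered this way depends on what the writer actually did to the bytes, so treat `unmangle` as a demonstration of the idea rather than a repair tool.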
This should be fixed in Arrow v0.2.2, so please try again once the Arrow tag has been merged into METADATA.
Sadly, the Feather files you wrote before the Arrow v0.2.2 patch are corrupted (as I'm sure you already knew). Sorry about that.
No worries, thanks for the quick fix.
I've got a data frame with some strings that contain Chinese characters and possibly other non-ASCII text. When I write it to Feather and then read it back, the strings come back mangled. An example of 4 strings:
After reading back:
The problem seems to go away if I filter the text down to ASCII characters only.
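In case it helps anyone work out which rows are at risk, here is a rough sketch of the check behind that workaround. It is Python for illustration only, and `non_ascii_strings` is a hypothetical helper, not part of Feather or Arrow.

```python
def non_ascii_strings(values):
    """Return the strings that contain at least one non-ASCII character."""
    # ASCII covers code points 0 through 127; anything above that
    # needs more than one byte in UTF-8.
    return [s for s in values if any(ord(ch) > 127 for ch in s)]


sample = ["plain text", "你好", "café"]
print(non_ascii_strings(sample))
```

Strings that this returns are the ones worth comparing before and after a round trip; pure-ASCII strings should be unaffected.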