Open asfimport opened 2 years ago
Jorge Leitão / @jorgecarleitao: cc @pitrou
Antoine Pitrou / @pitrou: There is actually a discussion to relax the utf8 requirement in IPC metadata values (see the message recently posted by @jorisvandenbossche "Re: [DISCUSS] Binary Values in Key value pairs WAS: Re: [INFO_REQUEST][FLIGHT] - Dynamic schema changes in ArrowFlight streams").
In short: yes, Arrow C++ and PyArrow can put arbitrary binary data in metadata values.
Also cc @lidavidm @emkornfield
Joris Van den Bossche / @jorisvandenbossche:
(Side note: this might be just for quick testing, but if you actually want to use the extension type on the Rust side as well, you should probably define the extension type in Python as a subclass of pyarrow.ExtensionType, and not pyarrow.PyExtensionType, since the latter uses a pickle dump of the class as the serialized metadata, which you won't be able to use in Rust, I suppose)
While trying to roundtrip an extension from schema.metadata (see ARROW-13855 for details), I got invalid utf8, which imo goes against the spec [1].
Specifically, a field
field = pyarrow.field("aa", UuidType())
contains the following:
with the value's data for this key being:
This is not valid utf8 (see e.g. https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=02b67658b3cddf8dc095bc9750fa7032).
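If the UuidType here is a pyarrow.PyExtensionType subclass, as the comment above suggests, the value is a pickle dump, and binary pickles are not valid utf8 in general. A rough check in Python, using a stand-in payload rather than the actual bytes pyarrow produced:

```python
import pickle


def is_valid_utf8(data):
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Stand-in payload (an assumption for illustration). Pickle protocols 2 and
# later start with the byte 0x80, which cannot begin a UTF-8 sequence.
serialized = pickle.dumps({"type": "uuid"}, protocol=2)

print(serialized[:1] == b"\x80")
print(is_valid_utf8(serialized))
```

The same decode check can be run on the real metadata value to confirm what the Rust playground link shows.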
Maybe I am reading the values incorrectly? (null point?)
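One way to rule out a reading error is to decode the buffer independently. The sketch below follows the layout described in [1]: a native-endian int32 pair count, then for each pair an int32 key length, the key bytes, an int32 value length, and the value bytes, none of them null-terminated. The function name and the sample buffer are illustrative assumptions:

```python
import struct


def parse_arrow_metadata(buf):
    """Decode an ArrowSchema.metadata buffer into (key, value) byte pairs."""
    (n_pairs,) = struct.unpack_from("=i", buf, 0)
    offset = 4
    pairs = []
    for _ in range(n_pairs):
        parts = []
        for _ in range(2):  # first the key, then the value
            (length,) = struct.unpack_from("=i", buf, offset)
            offset += 4
            parts.append(buf[offset:offset + length])
            offset += length
        pairs.append((parts[0], parts[1]))
    return pairs


# Hand-built sample with one pair; the value is deliberately not UTF-8.
key = b"ARROW:extension:metadata"
value = b"\x80\x04"
sample = (
    struct.pack("=i", 1)
    + struct.pack("=i", len(key)) + key
    + struct.pack("=i", len(value)) + value
)

print(parse_arrow_metadata(sample))
```

If this decoder and the Rust reader agree on the key and value boundaries, the lengths are being read correctly and the non-utf8 bytes really are in the value.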
[1] https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowSchema.metadata
Reporter: Jorge Leitão / @jorgecarleitao
Note: This issue was originally created as ARROW-15613. Please see the migration documentation for further details.