apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0
2.62k stars 802 forks source link

Support Duration in JSON Reader #6683

Closed simonvandel closed 5 days ago

simonvandel commented 2 weeks ago

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Add support for deserializing durations.

Are there any user-facing changes?

Support for Duration in JSON deserialization.

tustvold commented 2 weeks ago

AFAICT this expects durations to be encoded as integers, this isn't necessarily wrong, but I'm also not sure this is what people would necessarily expect? When formatting durations for display we currently use a string based representation.

Perhaps it might be worth filing an issue to discuss what representation is expected for duration types, e.g. with reference to other arrow implementations, and then tracking adding support to both the read and write sides?

simonvandel commented 2 weeks ago

Yes, the current PR only supports durations encoded as integers or strings parsed as integers (it just uses the existing decoders for i64)

I think it's a good idea to explore if we should support other duration representations.

Do you think the current PR can be merged as-is? We can then create a follow-up issue to discuss the other representations.

tustvold commented 2 weeks ago

Do you think the current PR can be merged as-is

I don't think we should merge this if it isn't consistent with other implementations

simonvandel commented 2 weeks ago

I tried pyarrow here: https://gist.github.com/simonvandel/ff420a412cc9a34d95a23086ecec3b15 using uv run python hello.py

It seems like duration is not supported:

pyarrow.lib.ArrowNotImplementedError: JSON conversion to duration[s] is not supported
alamb commented 1 week ago

Do you think the current PR can be merged as-is? We can then create a follow-up issue to discuss the other representations.

I recommend we at least have the property that data written by the JSON encoder can be read by the JSON encoder

So in other words, if we encode a Duration as JSON can it then be read back as the same Duration? As long as the answer is yes this seems like a reasonable change to me.

Thank you @simonvandel and @tustvold