apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0
2.62k stars 802 forks source link

Length-related Functions Should Return Unsigned Integer #6708

Closed austin362667 closed 1 week ago

austin362667 commented 1 week ago

Describe the bug

The string array functions length() and bit_length() should not return length as signed integer.

To Reproduce

See https://github.com/apache/arrow-rs/blob/5ad621fd97032796f3da2d02948e65513de8c891/arrow-string/src/length.rs#L47 and https://github.com/apache/arrow-rs/blob/5ad621fd97032796f3da2d02948e65513de8c891/arrow-string/src/length.rs#L111

Expected behavior

Should return unsigned integer, since there are no negative length in our context.

Additional context

Thanks for @alamb bringing this up in the discussion on https://github.com/apache/arrow-rs/pull/6671#discussion_r1826570314

Note: The actual Arrow spec uses integer types.

tustvold commented 1 week ago

I think we probably want to get some broader thoughts on this, whilst I agree unsigned is likely more appropriate in the Rust ecosystem, lots of systems, arrow included, prefer signed quantities for compatibility reasons (see StringArray, ListArray, etc...)

Either way, this isn't a bug and would be a breaking change

alamb commented 1 week ago

Specifically the Arrow spec uses i32/i64 typically for lengths for compatibility with Java (which does not support unsigned integers 🤷 )

austin362667 commented 1 week ago

So I think maybe we can close this issue?