apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

Can object_store on 32-bit systems read ranges in 4GB+ files? (Should we use `u64` vs `usize`) #5351

Open CarlKCarlK opened 9 months ago

CarlKCarlK commented 9 months ago

[First, sorry for the flurry of issues and thank you for your responsiveness. Second, I apologize that this issue will be vague and without a repro case.]

Rust's file seek for local files uses u64, not usize. This allows even a 32-bit OS to access regions of files beyond 4GB.
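For illustration, here is a minimal sketch of that behavior using only the standard library. `read_at` is a hypothetical helper name, not something from object_store. The point is that `SeekFrom::Start` takes a `u64`, so the offset type does not shrink on a 32-bit target; only the buffer length is a `usize`.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Hypothetical helper: read `len` bytes starting at absolute byte `offset`.
/// The offset is a `u64` on every target, including 32-bit ones.
fn read_at(path: &Path, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    // `SeekFrom::Start` carries a u64, independent of pointer width.
    file.seek(SeekFrom::Start(offset))?;
    let mut buf = vec![0u8; len];
    // Fails with `UnexpectedEof` if fewer than `len` bytes remain.
    file.read_exact(&mut buf)?;
    Ok(buf)
}
```

A call such as `read_at(path, 5 * 1024 * 1024 * 1024, 16)` would compile on a 32-bit target as well, because the 5GiB offset is a `u64` literal rather than a `usize`.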

object_store's get_range and many related methods use usize. This works fine on a 64-bit OS, but on a 32-bit OS (including WASM32) using HTTP, I think it limits one to the first 4GB of any file.
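To make the limit concrete, here is a small sketch that simulates a 32-bit `usize` with `u32`, so it behaves the same on any host. The names `MAX_32BIT_OFFSET` and `fits_in_32bit_usize` are made up for this example and are not part of object_store.

```rust
use std::convert::TryFrom;

/// Largest byte offset a 32-bit `usize` can hold (simulated here with `u32`).
const MAX_32BIT_OFFSET: u64 = u32::MAX as u64;

/// Hypothetical check: could `offset` be passed as a `usize` on a 32-bit target?
fn fits_in_32bit_usize(offset: u64) -> bool {
    // `u32` has the same width as `usize` on 32-bit targets such as wasm32.
    u32::try_from(offset).is_ok()
}
```

Under this assumption, the last addressable offset is 4GiB minus one byte, and an offset of exactly 4GiB (4_294_967_296) no longer fits.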

Possible fixes:

Thanks, Carl

tustvold commented 9 months ago

I think changing to use u64 would make sense as part of a broader initiative to support wasm32. However, given the crate currently doesn't support anything other than in-memory for wasm32, I think it would be a pretty tough sell given the downstream impact of such a change. Not to mention it would be quite hard to test.

flokli commented 2 months ago

I ran into this today as well. The axum-range crate returns a Range that uses u64 bounds, and because get_ranges in object_store only accepts usize, the conversion is a bit uglier than necessary.
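For reference, the kind of conversion described here might look like the sketch below. `to_usize_range` is a hypothetical helper, and it assumes the callee wants a `Range<usize>` as described in this thread. It uses the checked `usize::try_from` rather than an `as` cast, so an out-of-range bound on a 32-bit target becomes an error instead of silently wrapping.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;
use std::ops::Range;

/// Hypothetical helper: convert a `Range<u64>` into a `Range<usize>`,
/// failing if either bound does not fit in `usize` on the current target.
fn to_usize_range(range: Range<u64>) -> Result<Range<usize>, TryFromIntError> {
    let start = usize::try_from(range.start)?;
    let end = usize::try_from(range.end)?;
    Ok(start..end)
}
```

On a 64-bit target this never fails, since every `u64` fits in `usize`. On a 32-bit target, any bound at or above 4GiB returns an error.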