H-Plus-Time / object-store-wasm

Apache License 2.0
2 stars 1 forks source link

Bug/Feature: Suffix requests #41

Open H-Plus-Time opened 3 months ago

H-Plus-Time commented 3 months ago

Overall this is fairly straightforward - replicate the set of fixes, additional structs (ContentRange) and some of the error-reporting states.

There is, however, one important deviation when running inside a browser: 'Content-Range' is not a CORS-safelisted property :grimacing: : https://developer.mozilla.org/en-US/docs/Glossary/CORS-safelisted_response_header .

The upstream behaviour treats this as an error (rightly, given the HTTP spec mandates support for the Range header -> support for the Content-Range header, and CORS is inapplicable outside the browser) - in the browser case, this is probably a no-go, given the paucity of sites that include it in Access-Control-Expose-Headers.

Since the meta.size field is not an Option type, it must be set to something - the only problem with that is the natural fallback value (what object store used to do, namely use the Content-Length field) is likely to be wildly inaccurate.

H-Plus-Time commented 3 months ago

An additional ClientOptions, exposed on the object store itself and the Wasm wrapper, is needed here.

Actually determining at runtime whether a HEAD is required (only in situations where the size of the file actually matters (cases where it doesn't matter: formats like parquet essentially describe themselves once you have the last 8 bytes))... that can be somewhat tricky if all you have is a range and a size that might be the size of the file or just the range.

This is one area where this repo will have to diverge from upstream (probably permanently) - each of the stores authored here will need an extra method (get_range_conditional_head, maybe get_ranges_conditional_head) that does:

kylebarron commented 3 months ago

Since the meta.size field is not an Option type, it must be set to something

With respect to Parquet specifically, I think the ParquetObjectReader shouldn't require an ObjectMeta in its constructor (or it should be optional). This is something I want to fix in https://github.com/apache/arrow-rs/issues/5979#issuecomment-2234204999 but I haven't gotten around to it yet. Happy to collaborate on that if you want.

H-Plus-Time commented 3 months ago

Yeah, I was thinking along the same lines, that the main consumer (parquet-wasm, geoarrow-wasm) will likely be (transitively) refactored at the earliest by ~September (I agree, it really does seem to only be relevant for the FileMetadata extraction), thus removing most of the motivation for this.

Potentially worthwhile still implementing (I'm sure there's a bunch of other formats (probably zip with its' EOCD) that behave similarly), though actually benefiting from it does require controlling the CORS configuration of whatever you're requesting.

kylebarron commented 3 months ago

Well arrow 52.2.0 was just released, so we have about ~1 month to get this into parquet, or else it will sit till December