paraseba opened 1 month ago
@kylebarron We had many random very weird errors with `object_store` when used with high concurrency. But this was the last week before launch, so we didn't have time to explore. The errors looked like malformed HTTP responses, very weird. It happened trying to use the `dask_write.py` example we have. You can give it a try with several threads doing big writes. The easiest course of action at the time was to switch to AWS's implementation. Errors immediately disappeared and performance was awesome again.
I'm personally not a fan of `object_store`. I think the "one instance per bucket" decision is too committal, and I'm always disappointed by how little compatibility there is between the different implementations (for example, metadata support, sorting of keys in list, etc.). But I acknowledge the value of having an implementation compatible across platforms.

To support Google and Microsoft, I think using `ObjectStore` would be a great first approach, maybe even a definitive one if performance and stability are good.
> We had many random very weird errors with `object_store` when used with high concurrency. But this was the last week before launch, so we didn't have time to explore. The errors looked like malformed HTTP responses, very weird.
@paraseba I believe this could be due to a difference in how credentialization was implemented in the `object_store` version of the code compared to the AWS SDK. For deployments, object store will fall back to the node profile (credentials provider) when it finds no other form of authentication on the machine. This is a remote network call (PUT request) to an AWS token endpoint that is sent before each HTTP request to the blob store. At high concurrencies you may run into rate limits, account-level quotas, and other failures that cause malformed tokens. I've seen this myself running both boto3 and gdal at big scales - both have similar fallback logic.
The original `object_store` implementation would fall back to the node profile if it hit this `else` block and no credentials were configured in the environment - https://github.com/earth-mover/icechunk/blob/e3a119cddfcbe7b38173faeb0b4eb403bd69de7b/icechunk/src/storage/object_store.rs#L92-L94

The AWS SDK does not implement this fallback behavior unless you explicitly configure it in the code, and this was not configured when switching the code over to the AWS SDK (#135), which would explain why this issue went away with the switch.
This is of course a lot of guesswork based on past experiences; it really depends on how the code / dask cluster was deployed and configured. Local cluster on an EC2 instance? Distributed cluster on k8s? Running from your local machine? All have different failure cases when it comes to credentialization.

Either way, I don't think this is an issue with `object_store`; it's most likely related to how credentials are configured.
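One way to rule out the credential-provider fallback entirely is to hand the builder explicit credentials, so no per-request call to the instance metadata service is ever needed. A minimal sketch using `object_store`'s `AmazonS3Builder` (bucket, region, and the env var lookups are placeholders, not icechunk's actual configuration):

```rust
// Hedged sketch: configure object_store's S3 backend with explicit credentials
// so it never falls back to the instance-metadata credential provider.
use std::sync::Arc;

use object_store::aws::AmazonS3Builder;
use object_store::ObjectStore;

fn build_s3_store() -> object_store::Result<Arc<dyn ObjectStore>> {
    let store = AmazonS3Builder::new()
        .with_bucket_name("some-bucket")
        .with_region("us-east-1")
        // Explicit static credentials; in practice these would come from your
        // own secret management rather than being hard-coded or read ad hoc.
        .with_access_key_id(std::env::var("AWS_ACCESS_KEY_ID").unwrap_or_default())
        .with_secret_access_key(std::env::var("AWS_SECRET_ACCESS_KEY").unwrap_or_default())
        .build()?;
    Ok(Arc::new(store))
}
```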
I'm interested in implementing this (and potentially #265 at the same time) via the Python bindings.
Unless you object, I'd like to use `pyo3-object_store` for this. I haven't documented it yet, but it's designed as a way to export constructors for the `object_store` crate once with a single API, but reuse them across multiple Rust-Python bindings.

- Publish `pyo3-object_store` to crates.io (now that pyo3-async-runtimes 0.22 has been published)
- Update to `pyo3` 0.22 and `pyo3-async-runtimes` 0.22

Any feedback?
> Unless you object, I'd like to use `pyo3-object_store` for this. I haven't documented it yet, but it's designed as a way to export constructors for the `object_store` crate once with a single API, but reuse them across multiple Rust-Python bindings.
Can you explain why it would make sense to use `pyo3-object_store` instead of using the `object_store` crate itself in Rust? I see your point about the single API interface, but all of the storage is currently implemented in the Rust core, and we will be using this core to support other languages (and platforms) that are not Python based, and we want to make sure we keep that as a core value.
pyo3-object_store doesn't replace the object_store
crate. It provides Python builders so that Python users can create Arc<dyn ObjectStore>
instances.
For example, all of `storage.rs` just exports a very small subset of the configuration options to create S3 instances.
I'll publish and document `pyo3-object_store` tomorrow, but it essentially allows you to:

- Export the store builder classes under your own module, e.g. `icechunk.store`, i.e. `icechunk.store.S3Store` or `icechunk.store.AzureStore`, etc.
- Take `PyObjectStore` as a parameter in your function exported to Python. `PyObjectStore::into_inner` gives you an `Arc<dyn ObjectStore>`.

This is just for the Python binding. It wouldn't alter any piece of how you already use `object_store` in the main `icechunk` crate.
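To make the second bullet concrete, here's a minimal sketch based on the description above (the function name and body are hypothetical, not icechunk's actual API) of a pyo3-exported function consuming a `PyObjectStore`:

```rust
use std::sync::Arc;

use object_store::ObjectStore;
use pyo3::prelude::*;
use pyo3_object_store::PyObjectStore;

// Hypothetical example: accepts any store class re-exported from
// pyo3-object_store (S3Store, AzureStore, ...) and hands the plain
// Arc<dyn ObjectStore> to the Rust core, which never sees Python types.
#[pyfunction]
fn open_repository(store: PyObjectStore) -> PyResult<()> {
    let store: Arc<dyn ObjectStore> = store.into_inner();
    // ... pass `store` into the existing icechunk Storage layer ...
    let _ = store;
    Ok(())
}
```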
> For example, all of `storage.rs` just exports a very small subset of the configuration options to create S3 instances.

That is, if you're able to switch back to `object_store` for S3 as well, you'd be able to delete that entire file by re-exporting the structs from `pyo3-object_store`.
Got it. So most of that makes sense. Trying to understand how the `pyo3-object_store` configs look in Python land, because our biggest reason for using custom structs is that we all dislike the fsspec method of using keyword args for configuration. Using the structs allows us to strictly type the configuration as best we can.

I see that the config is a hashmap with specific keys on the Rust side, but on the Python side how do those config bindings appear?
There's strict type hinting for all Python exports. E.g. look at https://github.com/developmentseed/obstore/blob/e1ed7659fdea8a0f4bc9eb2e813f6713c125e368/obstore/python/obstore/store/_aws.pyi. You'd be re-exporting that same class essentially. Or https://github.com/developmentseed/obstore/blob/e1ed7659fdea8a0f4bc9eb2e813f6713c125e368/obstore/python/obstore/store/_azure.pyi for Azure.
Note that the `config` parameter is a dict with these specified options; there are also separate options for the HTTP client and for the retry config.
Note that since the underlying `Arc<dyn ObjectStore>` is not ABI-stable, you'd need to export these builders from your own Python library. Even though the class names are the same, it wouldn't work to pass in an `obstore.store.AzureStore` to `icechunk-python`; you'd need to instruct users to construct an `icechunk.store.AzureStore`. This is a limitation of dynamic linking with pyo3 for non-ABI-stable objects, see https://github.com/PyO3/pyo3/issues/1444
@geospatial-jeff
> The AWS SDK does not implement this fallback behavior unless you explicitly configure it in the code, and this was not configured when switching the code over to the AWS SDK (https://github.com/earth-mover/icechunk/pull/135), which would explain why this issue went away with the switch.
It seems the AWS SDK does indeed load the config from env by default (https://github.com/earth-mover/icechunk/blob/2c9a311c4d69c6cff099eb6474b525193d1ec01c/icechunk/src/storage/s3.rs#L85), so I am confused by this statement. Am I missing something here?
I am responding to this inline here because it is related to the effort to support more stores through `object_store` while keeping S3 compatible.
`FromEnv` is probably referring to environment variables, not node instance credentials. Those are two different things.
`FromEnv` in the icechunk version does nothing special and simply creates an S3 client using the default behavior, which loads from env vars, instance creds, etc.
I tried to find it, but `ObjectStore` seems contrived; if we don't explicitly say we want to load from env vars, will it work the same way?
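For reference, `object_store`'s S3 builder has a `from_env` constructor. A hedged sketch of what "load from env" looks like on that side (the bucket name is a placeholder, and the fallback behavior described in the comments reflects my understanding rather than something verified here):

```rust
// Hedged sketch: to my understanding, `from_env` reads the usual AWS_* env
// vars (key, secret, session token, region, endpoint, ...), and when no
// explicit credentials are present the client falls back to instance-metadata
// or web-identity credentials, similar to the AWS SDK's default chain.
use object_store::aws::AmazonS3Builder;
use object_store::{ObjectStore, Result};

fn build_from_env() -> Result<impl ObjectStore> {
    AmazonS3Builder::from_env()
        .with_bucket_name("some-bucket") // placeholder bucket
        .build()
}
```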
I published an initial beta of `pyo3-object_store` v0.1.0-beta.1: https://crates.io/crates/pyo3-object_store

I'll write more documentation on Monday and then I can lay out my proposal of how the integration would work.
Sounds great!
I'd really like it if we could have a single `object_store`-based `Storage` implementation. These are the things I think need to happen for that:

- Implement our virtual chunk management with multiple `ObjectStore` instances. Last time I checked each instance of `ObjectStore` can handle a single bucket (which I hate, but it is what it is)

To convince ourselves of point 3 above, we can use our dask_write example. This is how we discovered the performance issues before:
```
python examples/dask_write.py --url s3://some-bucket/some-prefix create
python examples/dask_write.py --url s3://some-bucket/some-prefix update --t-from 0 --t-to 256 --workers 16
python examples/dask_write.py --url s3://some-bucket/some-prefix verify --t-from 0 --t-to 256 --workers 16
```
As long as object_store can reproduce (close to) the performance of the current code, we are good.
I personally have quite high confidence in the `object_store` crate because it's used in production by some very big projects, including the crates.io registry and any downstream project from datafusion. Some really big companies depend on datafusion, like Apple, which recently donated its datafusion comet project to Apache.

Understandably we need to get more confidence with `object_store` in icechunk.
I see the `artificially_sort_refs_in_mem` flag. Is it crucial for you to get the list result back in sorted order? I don't recall which stores actually do that. I see that `ObjectStore` doesn't guarantee the order of results.
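If sorted output were needed regardless of backend, one option (just a sketch, not necessarily the right fit for icechunk's ref handling) is to collect the listing and sort it client-side:

```rust
// Hedged sketch: list a prefix and impose a sorted order client-side, since
// ObjectStore::list itself doesn't guarantee ordering. Assumes a recent
// object_store where `list` returns a stream of ObjectMeta results.
use futures::TryStreamExt;
use object_store::{path::Path, ObjectMeta, ObjectStore};

async fn list_sorted(
    store: &dyn ObjectStore,
    prefix: &Path,
) -> object_store::Result<Vec<ObjectMeta>> {
    let mut entries: Vec<ObjectMeta> = store.list(Some(prefix)).try_collect().await?;
    // Sort by the rendered object path.
    entries.sort_by_key(|meta| meta.location.to_string());
    Ok(entries)
}
```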
> Implement our virtual chunk management with multiple `ObjectStore` instances. Last time I checked each instance of `ObjectStore` can handle a single bucket (which I hate, but it is what it is)
This is true, it's connected to a single bucket.
> I see the `artificially_sort_refs_in_mem` flag. Is it crucial for you to get the list result back in sorted order?

Yes, that is crucial. Our in-mem store is used for tests and exploration, so we don't expect to have many keys, and the artificial sort works there. And with the "normal" object stores, `object_store` preserves the ordering. So I'm not too concerned about this point.

Another similar one is that some `object_store` implementations don't handle metadata, which increases the complexity of the code a lot. You'll find many instances of `if self.supports_metadata`.
> Understandably we need to get more confidence with object_store in icechunk.

Yeah... Icechunk's use of the object store could be somewhat particular: we want to read and write many relatively small objects, with very high concurrency, and quickly.
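To illustrate that access pattern (just a sketch, not icechunk's actual I/O path; the keys, payload size, and concurrency limit are made up), the workload is roughly many small concurrent puts like this:

```rust
// Hedged sketch of the access pattern described above: many small objects
// written with bounded, high concurrency. Keys and limits are placeholders.
use futures::{stream, StreamExt, TryStreamExt};
use object_store::{path::Path, ObjectStore, PutPayload};

async fn write_many_small_objects(store: &dyn ObjectStore) -> object_store::Result<()> {
    stream::iter(0..10_000u32)
        .map(|i| async move {
            let path = Path::from(format!("chunks/{i}"));
            // Small payload, written concurrently with the others.
            store.put(&path, PutPayload::from(vec![0u8; 4 * 1024])).await
        })
        .buffer_unordered(256) // high, bounded concurrency
        .try_collect::<Vec<_>>()
        .await?;
    Ok(())
}
```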
> Another similar one is that some `object_store` implementations don't handle metadata

Can you elaborate on this? You're referring to these attributes in the `GetResult`?
> You'll find many instances of `if self.supports_metadata`.

I can only find two instances of the string `supports_metadata` on current `main`:
https://github.com/earth-mover/icechunk/blob/fc62650c662b1d1dc9a924571cc45b6652f33758/icechunk/src/storage/object_store.rs#L211-L230
And from the code it looks like AWS, GCS, and Azure all support writing attributes. Or maybe it was with local/in-memory stores that you had issues?
Right, the local store is the one not supporting user metadata; you'll see we create it with:

```rust
supports_metadata: false,
```
Metadata could become a more important need in the near future, and we don't have an answer for local filesystem support. Thankfully we can still use the in memory store in tests, but it will make things more complicated. For example, to ensure we maintain format compatibility across versions, it would be great to have a local store that supports metadata.
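For context on what "supporting metadata" means on the `object_store` side, here's a hedged sketch (assuming a recent `object_store` release with the `Attributes` API; the metadata key and value are made up) of writing user metadata and reading it back against the in-memory store, which is the backend that does support it, unlike `LocalFileSystem`:

```rust
// Hedged sketch: attach user metadata via put_opts and read it back from the
// GetResult. Assumes an object_store version with the Attributes API; the
// metadata key/value below are placeholders.
use object_store::memory::InMemory;
use object_store::path::Path;
use object_store::{Attribute, Attributes, ObjectStore, PutOptions, PutPayload};

#[tokio::main]
async fn main() -> object_store::Result<()> {
    let store = InMemory::new();

    let mut attributes = Attributes::new();
    attributes.insert(Attribute::Metadata("spec-version".into()), "1".into());

    let opts = PutOptions {
        attributes,
        ..Default::default()
    };
    let path = Path::from("refs/main.json");
    store.put_opts(&path, PutPayload::from_static(b"{}"), opts).await?;

    // The attributes come back on the GetResult.
    let result = store.get(&path).await?;
    println!("{:?}", result.attributes);
    Ok(())
}
```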
By the way, thank you so much for working on this @kylebarron! If we manage to make it work it's going to be a big improvement to Icechunk.
For both this and https://github.com/earth-mover/icechunk/issues/265, it seems like an `ObjectStore`-based API is ideal, correct?

In https://github.com/developmentseed/object-store-rs I've been developing pyo3 integration for the `object_store` crate, that we could reuse in the icechunk Python APIs. (That repo has both a user-facing Python API and a Rust-facing pyo3 integration (not yet well documented).)

I also heard vaguely from @jhamman that you originally used the `object_store` crate for all backends, but then hit some performance issues that made you switch to using `s3` directly. Can you elaborate on that? The `object_store` maintainers are very performance sensitive, so I'd 1) be surprised that you could get better performance from using the s3 APIs natively, and 2) if so, the `object_store` maintainers would probably be interested in your use case. (If preferred, we can move this discussion to a separate thread.)