earth-mover / icechunk

Open-source, cloud-native transactional tensor storage engine
https://icechunk.io
Apache License 2.0
305 stars 18 forks source link

Rust API: `Store::list_dir` does not differentiate between keys/prefixes #276

Closed LDeakin closed 1 month ago

LDeakin commented 1 month ago

For example, I would expect the following to output "c/" and "zarr.json".

https://github.com/earth-mover/icechunk/blob/d2cced7ef33e69291af506fcb6ca2165655b7d0c/icechunk/src/zarr.rs#L2073-L2075

Also, consider outputting a struct holding keys/prefixes in separate Vecs instead.

paraseba commented 1 month ago

@LDeakin we are a bit limited by what other implementations of the Zarr store do. The current behavior reproduces what RemoteStore does today in zarr-python. In general it's not clear from the zarr spec what these stores should return for listing. Also the list methods don't provide a great API, forcing the stores to return always both metadata and chunk keys.

What are you trying to do? Can you use a higher level API like Repository?

LDeakin commented 1 month ago

In general it's not clear from the zarr spec what these stores should return for listing:

Relevant part of the spec for list_dir:

Output: set of keys and set of prefixes
For example, if a store contains the keys “a/b”, “a/c”, “a/d/e”, “a/f/g”, then list_dir("a/") would return keys “a/b” and “a/c” and prefixes “a/d/” and “a/f/”

The current behavior reproduces what RemoteStore does today in zarr-python

Fair enough. I am seeing the same thing with LocalStore too. I'd argue that zarr-python and icechunk are non-conformant, but the abstract store API section is not actually intended to be normative. So who cares? But in my view, list_dir is far less usable if it does not differentiate keys and prefixes (as the spec intended).

What are you trying to do?

Attempting to wrap the icechunk::Store for zarrs.

Can you use a higher level API like Repository?

Perhaps, but the API for icechunk::Store looked most similar to the zarrs storage traits.

paraseba commented 1 month ago

Makes sense @LDeakin . Would it help if we add a new method that has the behavior you want? Maybe we can wrap each individual result in an enum that differentiates between keys and prefixes. I wouldn't want to modify list_dir to maintain compatibility with what other Stores are doing.

paraseba commented 1 month ago

@LDeakin: How about https://github.com/earth-mover/icechunk/pull/286 ?