earth-mover / icechunk

Open-source, cloud-native transactional tensor storage engine
https://icechunk.io
Apache License 2.0
261 stars 14 forks source link

Don't allow to write chunks to invalid chunk coordinates #175

Open paraseba opened 4 weeks ago

paraseba commented 4 weeks ago

@jhamman do you think implementing this could cause any kind of problem with zarr-python? For example, this would demand that an array (metadata) is resized before it's new chunks are set.

emacollins commented 1 day ago

@paraseba Would like to try and give this one a shot! Let me know any context and background.

paraseba commented 1 day ago

@emacollins Great!

So, the key to this one is in the set_chunk_ref method of the Repository impl. In there we get the Array and apply the requested change. After getting the array, but before applying the change we would need to check if the passed coord: ChunkIndices are valid for the array. If they are not, we should fail.

To know if the coordinates are valid, we need to look into the array metadata, which you can find in the response of Repository.get_array.

You'll probably need to add a new error case to RepositoryError.

Probably some of our tests fail after this change, and also, we'll want to add tests that verify the new behavior.

Let me know if that's enough to get you started or if you want more context. And ping me with any questions.

Thank you!

emacollins commented 14 hours ago

@paraseba Thanks for the detailed explanation! I think this is enough to get started. I'll familiarize myself about with each of these parts and see what I can do

paraseba commented 14 hours ago

Sounds great @emacollins ! Take you time and let me know if you have questions.