earth-mover / icechunk

Open-source, cloud-native transactional tensor storage engine
https://icechunk.io
Apache License 2.0

git bisect? #377

Open: TomNicholas opened this issue 2 weeks ago

TomNicholas commented 2 weeks ago

It would be cool to support some kind of git bisect-like workflow with Icechunk, allowing the user to narrow down to the commit that changed the data in a way that's relevant to them. For example, imagine you stored regression test data in Icechunk and wanted to know which change to that data caused a certain data-dependent test to change behaviour.
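At its core, a bisect workflow is a binary search over an ordered commit history for the first commit where a user-supplied check flips from "good" to "bad". Below is a minimal sketch in plain Python. It does not use any Icechunk API. `first_bad_commit` and `is_bad` are hypothetical names. The sketch assumes the commits (e.g. snapshot IDs) are already ordered oldest to newest, and that, as with git bisect, once a commit is bad every later commit is bad too. In practice `is_bad` would load the data at a given commit and run the data-dependent test. How that commit is opened is left to the caller.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad_commit(commits: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the earliest commit for which is_bad(commit) is True.

    Hypothetical helper. Assumes `commits` is ordered oldest to newest,
    the oldest commit is good, the newest is bad, and badness is monotonic
    (once bad, every later commit stays bad), as git bisect assumes.
    """
    if not commits:
        raise ValueError("no commits to search")
    if is_bad(commits[0]):
        raise ValueError("oldest commit is already bad; no good baseline")
    if not is_bad(commits[-1]):
        raise ValueError("newest commit is good; nothing to bisect")

    # Invariant: commits[lo] is good and commits[hi] is bad.
    lo, hi = 0, len(commits) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid  # the first bad commit is after mid
    return commits[hi]


if __name__ == "__main__":
    # Toy history: placeholder IDs where the "regression" starts at s3.
    history = ["s0", "s1", "s2", "s3", "s4", "s5"]
    regressed = {"s3", "s4", "s5"}
    print(first_bad_commit(history, lambda c: c in regressed))
```

Each call to `is_bad` may be expensive (reading data and running a test), which is why binary search helps: it needs only about log2(n) checks for n commits instead of checking every commit in turn.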

paraseba commented 2 weeks ago

This is really cool. Transaction logs are coming soon to Icechunk, and they would help with this as well as with git diff-like functionality.

paraseba commented 2 weeks ago

Also, for the linked issue, commit squashing and expiration are on the roadmap and will solve the problem. The first dependency for that, garbage collection, has already been merged.

TomNicholas commented 2 weeks ago

Awesome! We'll probably have further feedback and ideas once we try using Icechunk for this use case.