paraseba opened 3 weeks ago
I'll add another option to "How to surface":
That could help us answer Q3 in the future.
Example:
```rust
// One entry per node path, each with the list of changes applied to it.
struct Status(Vec<(Path, Vec<Change>)>);

enum Change {
    ArrayCreated(ZarrMetadata),
    ArrayMetadataUpdated(ZarrMetadata),
    ChunksDeleted(Vec<ChunkIndices>),
    ChunksWritten(Vec<ChunkIndices>),
    GroupDeleted,
    // ....
}
```
This would give us a high-level language for describing changes. Then we can build different types of formatters on top of it.
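Here is a minimal sketch of several formatters sitting on top of the structured status. It uses stand-in definitions for `Path`, `ZarrMetadata` and `ChunkIndices` (the real types would come from the codebase) and keeps only the variants listed above. The `StatusFormatter` trait and the two example formatters, `PlainText` and `OneLine`, are hypothetical names invented for illustration, not an existing API.

```rust
use std::fmt::Write;

// Hypothetical stand-ins for the types named in the proposal.
type Path = String;
struct ZarrMetadata {
    shape: Vec<u64>,
}
type ChunkIndices = Vec<u32>;

enum Change {
    ArrayCreated(ZarrMetadata),
    ArrayMetadataUpdated(ZarrMetadata),
    ChunksDeleted(Vec<ChunkIndices>),
    ChunksWritten(Vec<ChunkIndices>),
    GroupDeleted,
}

struct Status(Vec<(Path, Vec<Change>)>);

// Hypothetical trait: each output format is one implementation.
trait StatusFormatter {
    fn format(&self, status: &Status) -> String;
}

// Verbose form: one header line per path, one indented line per change.
struct PlainText;

impl StatusFormatter for PlainText {
    fn format(&self, status: &Status) -> String {
        let mut out = String::new();
        for (path, changes) in &status.0 {
            writeln!(out, "{path}").unwrap();
            for change in changes {
                let line = match change {
                    Change::ArrayCreated(m) => format!("array created, shape {:?}", m.shape),
                    Change::ArrayMetadataUpdated(m) => {
                        format!("array metadata updated, shape {:?}", m.shape)
                    }
                    Change::ChunksDeleted(c) => format!("{} chunk(s) deleted", c.len()),
                    Change::ChunksWritten(c) => format!("{} chunk(s) written", c.len()),
                    Change::GroupDeleted => "group deleted".to_string(),
                };
                writeln!(out, "  {line}").unwrap();
            }
        }
        out
    }
}

// Compact form: a single summary line.
struct OneLine;

impl StatusFormatter for OneLine {
    fn format(&self, status: &Status) -> String {
        let n: usize = status.0.iter().map(|(_, c)| c.len()).sum();
        format!("{} change(s) across {} path(s)", n, status.0.len())
    }
}

fn main() {
    let status = Status(vec![
        (
            "/temperature".to_string(),
            vec![
                Change::ArrayCreated(ZarrMetadata { shape: vec![10, 10] }),
                Change::ChunksWritten(vec![vec![0, 0], vec![0, 1]]),
            ],
        ),
        ("/old".to_string(), vec![Change::GroupDeleted]),
    ]);
    print!("{}", PlainText.format(&status));
    println!("{}", OneLine.format(&status));
}
```

Adding another output format means adding another implementation of the trait. The `Status` value itself stays unchanged.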
@dcherian I realize my proposal is very similar to work I'll have to do for transaction log support. That structured status looks a lot like a transaction log. Maybe we shouldn't try to do both things in parallel, but we can discuss more.
Ah nice, yes that makes sense.
I'm also thinking the formatting of the structure should live in Rust, and we may want to expose it through a CLI in the future?
We should probably design both status and transaction log as abstract data structures and then build ways to transform them into something for display in different contexts, e.g. `__repr__`.
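The following sketch shows one reading of this suggestion. It assumes a single hypothetical `Changes` structure shared by `status` (uncommitted changes) and a transaction log (committed changes). A hypothetical `Context` enum then selects the rendering: a one-line form suitable for a Python `__repr__`, or a verbose multi-line form for a possible CLI. All type names are illustrative.

```rust
use std::collections::BTreeMap;
use std::fmt;

// Hypothetical change record, shared by status and transaction log.
#[derive(Debug)]
enum Change {
    NewGroup,
    NewArray,
    DeletedNode,
    ChunksWritten(usize),
}

// Abstract data: node path -> changes. BTreeMap keeps output ordered.
#[derive(Debug, Default)]
struct Changes(BTreeMap<String, Vec<Change>>);

// Where the data is being displayed.
enum Context {
    Repr,
    Cli,
}

impl Changes {
    fn render(&self, ctx: Context) -> String {
        match ctx {
            // Short single-line form, e.g. for a Python `__repr__`.
            Context::Repr => format!("<Changes: {} path(s) modified>", self.0.len()),
            // Verbose multi-line form, e.g. for a future CLI.
            Context::Cli => self.to_string(),
        }
    }
}

impl fmt::Display for Changes {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for (path, changes) in &self.0 {
            writeln!(f, "{path}: {changes:?}")?;
        }
        Ok(())
    }
}

// Both views wrap the same structure.
struct Status {
    uncommitted: Changes,
}
struct TransactionLog {
    committed: Changes,
}

fn main() {
    let mut changes = Changes::default();
    changes.0.insert(
        "/weather/temperature".to_string(),
        vec![Change::NewArray, Change::ChunksWritten(4)],
    );
    let status = Status { uncommitted: changes };
    println!("{}", status.uncommitted.render(Context::Repr));
    print!("{}", status.uncommitted.render(Context::Cli));

    let log = TransactionLog { committed: Changes::default() };
    println!("{}", log.committed.render(Context::Repr));
}
```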
Design Doc
Goal
Our goal is to have `.status()` surface useful information to the user. Since `status` is inherently backward-looking, I expect users will approach it with four questions in mind:
Information to surface
For Q1, we should surface VCS history information:
a. Repo bucket
b. Base Snapshot ID and commit time
c. Do we need to surface a truncated commit message too?
d. Current branch
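The Q1 information could be grouped into one value and rendered as a short header. This sketch uses hypothetical names (`VcsInfo`, `truncate`, `header`). It keeps the commit time as preformatted text to avoid a date library. Truncating to the first line at 50 characters is only one possible answer to item c.

```rust
// Hypothetical container for the Q1 "where am I?" information.
struct VcsInfo {
    bucket: String,
    branch: String,
    base_snapshot_id: String,
    commit_time: String, // preformatted, to stay dependency-free
    message: String,
}

// Keep only the first line of the message; cut on char boundaries
// (not bytes) so multi-byte UTF-8 text cannot cause a panic.
fn truncate(msg: &str, max_chars: usize) -> String {
    let first_line = msg.lines().next().unwrap_or("");
    if first_line.chars().count() <= max_chars {
        first_line.to_string()
    } else {
        let kept: String = first_line.chars().take(max_chars.saturating_sub(3)).collect();
        format!("{kept}...")
    }
}

impl VcsInfo {
    fn header(&self) -> String {
        format!(
            "Repository: {}\nOn branch: {}\nBase snapshot: {} (committed {})\n    {}\n",
            self.bucket,
            self.branch,
            self.base_snapshot_id,
            self.commit_time,
            truncate(&self.message, 50),
        )
    }
}

fn main() {
    // Illustrative values only.
    let info = VcsInfo {
        bucket: "s3://example-bucket/repo".to_string(),
        branch: "main".to_string(),
        base_snapshot_id: "EXAMPLESNAPSHOTID".to_string(),
        commit_time: "2024-01-01 12:00 UTC".to_string(),
        message: "Add temperature array\n\nLonger description here.".to_string(),
    };
    print!("{}", info.header());
}
```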
For Q2, we should surface information in the current ChangeSet:
- `new_groups`, `new_arrays`
- `deleted_groups`, `deleted_arrays`
- `updated_arrays` -> zarr array metadata
- `updated_attributes` -> group/array user attributes
- `set_chunks` -> modified chunks (create/delete/overwrite)

For Q3, we punt to later :)
For Q4, we should encourage the user to make a commit by saying loud and clear that these are "uncommitted changes" and will be lost (?).
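This sketch combines Q2 and Q4. It lists the ChangeSet fields above by category and prefixes the output with a loud "uncommitted" banner. The `ChangeSet` here is a simplified, hypothetical stand-in keyed by node path, with `set_chunks` reduced to a per-path chunk count; it is not the real type. The banner wording is also an assumption.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical, simplified stand-in for the ChangeSet fields listed above.
#[derive(Default)]
struct ChangeSet {
    new_groups: BTreeSet<String>,
    new_arrays: BTreeSet<String>,
    deleted_groups: BTreeSet<String>,
    deleted_arrays: BTreeSet<String>,
    updated_arrays: BTreeSet<String>,     // zarr array metadata changed
    updated_attributes: BTreeSet<String>, // group/array user attributes changed
    set_chunks: BTreeMap<String, usize>,  // path -> number of modified chunks
}

impl ChangeSet {
    fn is_empty(&self) -> bool {
        self.new_groups.is_empty()
            && self.new_arrays.is_empty()
            && self.deleted_groups.is_empty()
            && self.deleted_arrays.is_empty()
            && self.updated_arrays.is_empty()
            && self.updated_attributes.is_empty()
            && self.set_chunks.is_empty()
    }
}

fn render_status(cs: &ChangeSet) -> String {
    if cs.is_empty() {
        return "No uncommitted changes.\n".to_string();
    }
    // Q4: make it obvious that nothing below has been committed yet.
    let mut out = String::from("UNCOMMITTED CHANGES:\n");
    let sections = [
        ("New groups", &cs.new_groups),
        ("New arrays", &cs.new_arrays),
        ("Deleted groups", &cs.deleted_groups),
        ("Deleted arrays", &cs.deleted_arrays),
        ("Updated array metadata", &cs.updated_arrays),
        ("Updated attributes", &cs.updated_attributes),
    ];
    // Only print categories that actually contain something.
    for (title, paths) in sections {
        if !paths.is_empty() {
            out.push_str(&format!("  {title}:\n"));
            for p in paths {
                out.push_str(&format!("    {p}\n"));
            }
        }
    }
    if !cs.set_chunks.is_empty() {
        out.push_str("  Modified chunks:\n");
        for (p, n) in &cs.set_chunks {
            out.push_str(&format!("    {p}: {n} chunk(s)\n"));
        }
    }
    out
}

fn main() {
    let mut cs = ChangeSet::default();
    cs.new_arrays.insert("/weather/temperature".to_string());
    cs.set_chunks.insert("/weather/temperature".to_string(), 2);
    cs.deleted_groups.insert("/old".to_string());
    print!("{}", render_status(&cs));
}
```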
How to surface
As a tree?
One particularly neat way to surface this information would be to show the hierarchy as a tree. We could construct a tree for the snapshot + changeset (`new-tree`), then iterate through the changeset and add information to the appropriate node of the tree. During diffing, if two nodes are in the same place, we examine the changeset for metadata and chunk modifications and annotate the `new-tree` appropriately. The `diffed-tree` structure could then be rendered to text in Rust, and transferred to Python for rendering with `rich` (for example).
As plain text?
We could simply output formatted lists of created/updated/deleted groups/arrays.
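The same annotated data can drive either option. The sketch below uses a hypothetical `Node` type and helper functions. It builds a tree from slash-separated paths and attaches notes to nodes, then renders the result either as an indented tree or as flat `path: note` lines.

As a simplification, it annotates nodes directly from a list of changes rather than diffing a snapshot tree against a snapshot + changeset tree. Unchanged nodes therefore appear only when they are ancestors of a changed one.

```rust
use std::collections::BTreeMap;

// Hypothetical tree node: children keyed by name (sorted), plus notes
// such as "new array" or "2 chunk(s) written".
#[derive(Default)]
struct Node {
    children: BTreeMap<String, Node>,
    notes: Vec<String>,
}

impl Node {
    // Walk to `path`, creating intermediate nodes, and attach a note.
    fn annotate(&mut self, path: &str, note: &str) {
        let mut node = self;
        for part in path.split('/').filter(|p| !p.is_empty()) {
            node = node.children.entry(part.to_string()).or_default();
        }
        node.notes.push(note.to_string());
    }

    // Tree option: indent two spaces per level, notes in brackets.
    fn render(&self, name: &str, depth: usize, out: &mut String) {
        let notes = if self.notes.is_empty() {
            String::new()
        } else {
            format!("  [{}]", self.notes.join(", "))
        };
        out.push_str(&format!("{}{name}{notes}\n", "  ".repeat(depth)));
        for (child_name, child) in &self.children {
            child.render(child_name, depth + 1, out);
        }
    }
}

// Plain-text option: one "path: note" line per note.
fn flat(node: &Node, prefix: &str, out: &mut Vec<String>) {
    for (name, child) in &node.children {
        let path = format!("{prefix}/{name}");
        for n in &child.notes {
            out.push(format!("{path}: {n}"));
        }
        flat(child, &path, out);
    }
}

fn main() {
    let mut root = Node::default();
    root.annotate("/weather/temperature", "new array");
    root.annotate("/weather/temperature", "2 chunk(s) written");
    root.annotate("/old", "group deleted");

    let mut tree = String::new();
    root.render("/", 0, &mut tree);
    print!("{tree}");

    let mut lines = Vec::new();
    flat(&root, "", &mut lines);
    println!("{}", lines.join("\n"));
}
```

A front end such as Python's `rich` could consume the same tree structure instead of the pre-rendered text.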
Misc
Some things to be careful about:
- `status` outputs should be easy