Open dyaffe opened 1 month ago
quick thoughts:
Bucket life-cycle policies remove data -- that means, if I create a materialization for a collection after the fact, I will never see as many documents read as have been written, and a % completion metric can never be accurate. This seems a likely potential source of confusion.
Is this really just a larger grain of time than "month", e.g. "year"?
If we introduce a "compaction" feature for a collection, that also could reduce the number of docs / bytes I need to actually read -- though compaction can likely be framed as a truncation, which makes them the same problem.
These smell a bit like gauges (rather than counters) that are tracked and reported by tasks -- "I've captured this many docs / bytes since the binding was last truncated" or "I've read this many docs / bytes since I last saw a truncation for this binding".
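To make the gauge idea concrete, here's a rough sketch of what a per-binding gauge that resets on truncation might look like. Everything in it is hypothetical and for illustration only (`BindingGauge`, `record`, `truncate`, `fraction_read` are made-up names, not existing APIs), and it assumes a capture and a materialization can each tell which truncation "epoch" they're in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BindingGauge:
    """Hypothetical gauge: docs / bytes seen since the last truncation."""
    docs: int = 0
    nbytes: int = 0
    epoch: int = 0  # assumption: incremented once per observed truncation

    def record(self, docs: int, nbytes: int) -> None:
        # Accumulate progress within the current truncation epoch.
        self.docs += docs
        self.nbytes += nbytes

    def truncate(self) -> None:
        # A truncation resets the gauge and starts a new epoch.
        self.docs = 0
        self.nbytes = 0
        self.epoch += 1


def fraction_read(captured: BindingGauge, read: BindingGauge) -> Optional[float]:
    """Fraction of captured docs that the reader has read, or None when
    the two gauges are in different epochs and can't be compared."""
    if captured.epoch != read.epoch:
        return None
    if captured.docs == 0:
        return 1.0  # nothing captured since the truncation, so nothing to read
    return read.docs / captured.docs


captured = BindingGauge()
read = BindingGauge()
captured.record(docs=100, nbytes=2000)
read.record(docs=40, nbytes=800)
print(fraction_read(captured, read))

captured.truncate()  # the reader has not seen this truncation yet
print(fraction_read(captured, read))
```

The point of returning `None` is that a ratio is only meaningful when both sides are counting from the same truncation. Whether a life-cycle deletion can actually be surfaced as a truncation event is an open question here, not something this sketch answers.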
Discussed options:
Goal: Understand how a data flow is progressing and whether / how much a materialization is behind a capture.
Proposal