Revise Lenses into special operators (like they are in Adaptive Schemas)

UBOdin / mimir

Lens/Model Roles

1. Tag data values as uncertain/problematic.
2. Provide a human-readable description of the reason that the value is uncertain/problematic.
3. Produce a "best guess" value (that is problematic).
4. Provide an identifier to allow uncertain values to be acknowledged (un-tagged).
5. Identify a procedure by which the user can address the error / repair the value.
6. Optionally provide a facility for sampling alternative values.

1-3 are the critical features.
4 and 5 are useful support, although the current (heavy-weight) implementation of overriding the model value isn't the only way to implement repairs.
6 hasn't been exploited in a long time. It's also generally one of the weakest points whenever I present Mimir. However, the feature is useful for research purposes, and might provide some benefits for applying Mimir to simulation workloads (as opposed to data cleaning).
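
To make the split between the critical and the optional roles concrete, here is a minimal sketch in Python. It is not Mimir's actual API: `Caveat`, `Lens`, and `missing_value_tag` are hypothetical names invented for illustration, and the cell key format is an assumption. The critical and supporting roles travel together in one small record, while sampling is an optional hook that a lens may simply leave out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Hashable, Iterator, Optional


@dataclass(frozen=True)
class Caveat:
    """Hypothetical record attached to one uncertain value."""
    key: Hashable      # identifier used to acknowledge (un-tag) the value
    reason: str        # human-readable description of the problem
    best_guess: Any    # the value that downstream queries actually see
    repair_hint: str   # how the user could address the error


@dataclass(frozen=True)
class Lens:
    """Hypothetical lens: `tag` is required, `sample` is optional."""
    # Returns a Caveat when the value is uncertain, None otherwise.
    tag: Callable[[Hashable, Any], Optional[Caveat]]
    # Optional facility for sampling alternative values.
    sample: Optional[Callable[[Hashable], Iterator[Any]]] = None


def missing_value_tag(default: Any) -> Callable[[Hashable, Any], Optional[Caveat]]:
    """Build a tagger that flags None values and guesses `default`."""
    def tag(key: Hashable, value: Any) -> Optional[Caveat]:
        if value is not None:
            return None  # value is present, so it is not tagged
        return Caveat(
            key=key,
            reason=f"value at {key!r} was missing",
            best_guess=default,
            repair_hint="supply the real value in the source data",
        )
    return tag


# A lens with no sampler: sampling is simply not offered.
lens = Lens(tag=missing_value_tag(default=0))
caveat = lens.tag(("R", 3, "price"), None)
```

A lens without a well-defined distribution leaves `sample` as `None`, so callers can check for the capability instead of assuming every lens supports it.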

Limitations of the Existing Approach

A single opaque blob (the model) handles everything, so it needs to be serialized and passed around.

Query structure is fixed at the time of lens creation (no option-driven rewrites).

Feedback has to go through the model, when often a lighter-weight fix (e.g., manually replacing null values) exists.
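
As a rough illustration of the lighter-weight alternative, the sketch below keeps user replacements in a plain mapping that is consulted before the model is ever asked. The function `resolve`, the `overrides` mapping, and the `best_guess` callback are hypothetical names, not part of Mimir.

```python
from typing import Any, Callable, Dict, Hashable


def resolve(
    key: Hashable,
    value: Any,
    overrides: Dict[Hashable, Any],
    best_guess: Callable[[Hashable], Any],
) -> Any:
    """Pick the value to show for one cell.

    Order of preference: the stored value, then a manual replacement
    supplied by the user, then the model's best guess.
    """
    if value is not None:
        return value
    if key in overrides:
        return overrides[key]   # manual fix; the model is never consulted
    return best_guess(key)      # fall back to the model


# The user replaces one null by hand; the other still uses the guess.
overrides = {("R", 1, "price"): 9.99}
fixed = resolve(("R", 1, "price"), None, overrides, lambda k: 0.0)
guessed = resolve(("R", 2, "price"), None, overrides, lambda k: 0.0)
```

With this shape, a repair is a single entry in `overrides` and leaves the model untouched.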

Sampling is not always feasible (e.g., no well-defined distribution exists, or the domain is infinite).

Acknowledgements are handled by the big opaque blob, adding more junk to be serialized and passed around.
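
One possible way to keep acknowledgements out of the blob, sketched under the assumption that each uncertain value has a string key per lens: store them as ordinary rows in a table. The table name `acknowledged` and the helper functions are hypothetical, and SQLite stands in for whatever backend is actually in use.

```python
import sqlite3


def open_ack_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a table holding one row per acknowledged value."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS acknowledged ("
        " lens TEXT NOT NULL,"
        " key TEXT NOT NULL,"
        " PRIMARY KEY (lens, key))"
    )
    return conn


def acknowledge(conn: sqlite3.Connection, lens: str, key: str) -> None:
    """Record an acknowledgement; repeating it is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO acknowledged (lens, key) VALUES (?, ?)",
        (lens, key),
    )
    conn.commit()


def is_acknowledged(conn: sqlite3.Connection, lens: str, key: str) -> bool:
    """Check whether a value has already been acknowledged."""
    row = conn.execute(
        "SELECT 1 FROM acknowledged WHERE lens = ? AND key = ?",
        (lens, key),
    ).fetchone()
    return row is not None


conn = open_ack_store()
before = is_acknowledged(conn, "missing_price", "R:3:price")
acknowledge(conn, "missing_price", "R:3:price")
acknowledge(conn, "missing_price", "R:3:price")  # duplicate is ignored
after = is_acknowledged(conn, "missing_price", "R:3:price")
```

Because acknowledgements are plain rows, the model no longer has to serialize and carry them.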