Introduce groupings / labels into observations and responses

xjules commented 3 years ago

While working on misfit visualization, an issue came up: how to visualize univariate misfits for a large number of observations. Currently each observation provides its own misfit statistics, which would be unreadable with tens of thousands of observations. One potential solution is to group observations into clusters (groups) and visualize the misfit statistics of each group instead (by aggregating misfits over the group).

Here we can, for instance, employ the clustering results from the misfit_preprocessor job. The labels are not persisted yet, though.
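
A minimal sketch of what aggregating over a group could look like, assuming per-observation misfits are available per realization and that every observation carries exactly one label. The observation names, the label names and `aggregate_misfits` are hypothetical and only for illustration; they are not part of ERT or webviz-ert.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical input: observation name -> one misfit value per realization.
misfits = {
    "WOPR_1": [1.0, 3.0],
    "WOPR_2": [2.0, 4.0],
    "WGOR_1": [10.0, 20.0],
}

# Hypothetical labels, e.g. as a clustering job might assign them.
labels = {"WOPR_1": "cluster_0", "WOPR_2": "cluster_0", "WGOR_1": "cluster_1"}


def aggregate_misfits(misfits, labels):
    """Sum misfits per realization within each group, then summarize."""
    per_group = defaultdict(list)
    for obs_name, values in misfits.items():
        per_group[labels[obs_name]].append(values)

    summary = {}
    for group, rows in per_group.items():
        # One total per realization, summed over the observations in the group.
        totals = [sum(column) for column in zip(*rows)]
        summary[group] = {
            "n_obs": len(rows),
            "mean": mean(totals),
            "min": min(totals),
            "max": max(totals),
        }
    return summary


group_stats = aggregate_misfits(misfits, labels)
```

With this, the plot would show one entry per group rather than one per observation. Summing within a group is only one possible choice of aggregation.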

Another discussion relates to grouping of responses, which would provide an additional hierarchical aspect, e.g. showing cumulative misfit statistics for a given response group. This also ties in with the upcoming MultiResponseView plugin, which would automatically unroll all response plots from a given response group.

oysteoh commented 3 years ago

@markusdregi do you have any thoughts on this - would be nice to get the ball rolling on this one and come up with some kind of solution 👍

markusdregi commented 3 years ago

Sorry for being late to the discussion here!

First, I think grouping of observations makes sense. Notice that we already have a grouping of the observations when they are loaded from the configuration. I think this grouping and others should be represented in the same way, such that under the hood there is a single way to represent a collection of observations; that keeps the code complexity down. Also, I foresee this as something one would like to play around with, and because of the potential amounts of data I think it makes sense to separate the raw observations from a group/filter/perspective/what-ever-we-want-to-call-it. One would probably also quickly want to attach operations to these "groups" and stack them on top of each other, so you should design for that...
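
To make the idea concrete, here is a rough sketch of a group as a lightweight view that only references raw observations by name and carries a stack of operations. All names here (`Observation`, `ObservationGroup`, `scale_std`, `disable`) are hypothetical and not existing ERT classes; this illustrates the separation, it is not a proposed final design.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Observation:
    name: str
    value: float
    std: float
    active: bool = True


# An operation maps one observation to a (possibly modified) copy.
Operation = Callable[[Observation], Observation]


@dataclass(frozen=True)
class ObservationGroup:
    """A named view on raw observations; the raw data is never mutated."""

    name: str
    members: FrozenSet[str]
    operations: Tuple[Operation, ...] = ()

    def with_operation(self, op: Operation) -> "ObservationGroup":
        # Stacking returns a new group, so earlier views stay valid.
        return replace(self, operations=self.operations + (op,))

    def apply(self, raw: Dict[str, Observation]) -> Dict[str, Observation]:
        result = dict(raw)
        for key in self.members:
            if key in raw:
                obs = raw[key]
                for op in self.operations:  # applied in stacking order
                    obs = op(obs)
                result[key] = obs
        return result


def scale_std(factor: float) -> Operation:
    return lambda obs: replace(obs, std=obs.std * factor)


def disable(obs: Observation) -> Observation:
    return replace(obs, active=False)


raw = {
    "WOPR_1": Observation("WOPR_1", 100.0, 10.0),
    "WGOR_1": Observation("WGOR_1", 5.0, 1.0),
}
group = ObservationGroup("oil_rates", frozenset({"WOPR_1"}))
group = group.with_operation(scale_std(2.0)).with_operation(disable)
view = group.apply(raw)
```

Because the group stores only names and operations, the same representation would work whether the members come from the configuration, a clustering job or the GUI.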

On responses, I'm not really sure who should produce groupings beyond what is already in the data format in the configuration. But besides that, my comments above apply here as well.

I think the combination of the misfit_preprocessor, the groupings from the configuration, the actions (disable, scale) that the history matching algorithms carry out on the observations, and the user requests to toggle and scale observations interactively from the GUI should give a solid set of different use cases to design for ;)

But, I think this requires some concepts that must be thought through, written down and then iterated in code. So I would urge you to write an introduction to the first intended approach here, discuss it and only then start implementation 👍

xjules commented 3 years ago

Minutes from the groupings meeting:

- There are already use cases for observation groupings besides visualization (e.g. ESMDA scales the observation uncertainty with each iteration).
- Groupings might come from different sources, e.g. user interaction, config files, workflow jobs, etc. Therefore we need to introduce:
  - meaningful semantics for introducing them in ERT
  - a way to apply or merge several grouping schemes into one (maybe given implicitly by the order of workflow jobs?)
  - a way to internalize them (if at all) and expose them via the API
- A typical use case might be to disable / enable observations in a given group.
- Grouping of responses currently links only to observation labels, as we don't have any real use for such response clusters yet (besides visualization, that is).
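
As an illustration of the merge question, here is a small sketch in which grouping schemes are applied in order and a later source overrides an earlier one. The "last one wins" rule is only an assumption for the sake of the example (the meeting did not settle on it), and the function names, labels and observation names are hypothetical.

```python
def merge_groupings(*schemes):
    """Merge label mappings in the given order; later schemes win on conflicts."""
    merged = {}
    for scheme in schemes:
        merged.update(scheme)
    return merged


def observations_in(group, labels):
    """Return the names of all observations carrying the given label."""
    return {name for name, label in labels.items() if label == group}


# Hypothetical sources, in the order they would be applied.
from_config = {"WOPR_1": "WOPR", "WOPR_2": "WOPR", "WGOR_1": "WGOR"}
from_workflow_job = {"WOPR_2": "cluster_1", "WGOR_1": "cluster_1"}
from_user = {"WGOR_1": "ignored"}

labels = merge_groupings(from_config, from_workflow_job, from_user)

# Typical use case: disable every observation in one group.
active = set(labels) - observations_in("ignored", labels)
```

A single label per observation keeps the sketch simple. If overlapping groups are needed, the mapping would have to hold a set of labels per observation instead.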

oysteoh commented 3 years ago

Initially we will try to solve it using a metadata column, then revise if performance is bad or another implementation is needed.

xjules commented 3 years ago

Before we can proceed with this issue:

- merge the metadata PR https://github.com/equinor/ert-storage/pull/57
- create an artificial dataset that can be easily configured (yml?) and used for testing
- migrate webviz-ert to use the newnew storage

mortalisk commented 2 years ago

Is everything ready to proceed with this issue?

markusdregi commented 2 years ago

I think what this issue needs the most is a clear definition of done based on an actual user story... Until then I would be reluctant to get started on it.

equinor / webviz-ert

Introduce groupings / labels into observations and responses #60