equinor / semeio

Semeio is a collection of jobs and workflow jobs used in ert (https://github.com/equinor/ert).
https://github.com/equinor/semeio
GNU General Public License v3.0

Evaluate strategies for getting the scaling job to function with ES MDA #58

Closed markusdregi closed 3 years ago

markusdregi commented 4 years ago

The initially expected problem of mixing scaling factors seems to not be an issue

Today both the scaling job and the ES MDA algorithm write to the same scaling constant in storage. In particular, the algorithm overwrites the scaling from the scaling job. This should be resolved. Investigate whether a "quick fix" is possible, where we either change the operator or introduce a second scaling constant in storage. Or possibly something completely different?

Run scaling job only once

The ACO workflow should only be executed after the first iteration to ensure algorithmic stability. This can probably be resolved as a hack (for now) by passing the iteration number to the workflow under the hood. We need to check whether the scaling factors survive from one iteration to the next.

jondequinor commented 4 years ago

Why are they writing to the same scaling constant? The decision was made quite recently in semeio, to use the same scaling constant. What was the decision-making process like?

If I can scale pre-simulation, post-simulation, pre-update, post-update, and at arbitrary times outside the hooking system, it seems we need to discuss whether the scaling constant is to intersect or not in these cases.

markusdregi commented 4 years ago

Note that both the change of operator and the second constant would induce changes in libres. If we change the operator, then we need to change it both in the scaling job and in the ES MDA algorithm. In addition, we would have to make sure that each iteration resets the scaling factor to 1.

jondequinor commented 4 years ago

A deep dive into the ES_MDA code in libres reveals that the scaling constant is seemingly only updated in an obs_data struct that is later discarded. So the scaling constant is not updated on the real observation. @markusdregi @sondreso could probably ack/nack this.

lars-petter-hauge commented 4 years ago

Why are they writing to the same scaling constant? The decision was made quite recently in semeio, to use the same scaling constant. What was the decision-making process like?

Note that there already exists a correlation scaling job in libres: enkf_main_std_scale_correlated_obs_JOB

This is the job that will be superseded by SC from semeio. So the design decision might have been influenced a bit ;)

If I understand correctly, for every iteration in one of the updates, enkf_main will copy the observations from the previous iteration. This is done in static void enkf_main_update__. The allocation of the observations in this function uses the global_std_scaling from the analysis_config:

double global_std_scaling = analysis_config_get_global_std_scaling(analysis_config);
obs_data_type * obs_data = obs_data_alloc(global_std_scaling);

It is the global_std_scaling factor in the analysis_config that has been set when running es_mda from ert:

https://github.com/equinor/ert/blob/64e0f3c379e77fe1a082e8a819f88306bec6398f/ert_shared/models/multiple_data_assimilation.py#L87
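
For context, a minimal sketch of how I read this: the ES MDA run model normalizes the configured weights so that the sum of their inverse squares is one, and then pushes the current iteration's weight into the analysis config before each update. The setGlobalStdScaling call mentioned in the comments below is my assumption about the exact API; the normalization itself can be reproduced stand-alone:

import math

def normalize_weights(weights):
    # Scale the ES MDA weights so that sum(1 / w**2) == 1,
    # keeping their relative proportions.
    scale = math.sqrt(sum(1.0 / w ** 2 for w in weights))
    return [w * scale for w in weights]

print([round(w, 3) for w in normalize_weights([4.0, 2.0, 1.0])])
# [4.583, 2.291, 1.146], i.e. the "weights normalized" values printed
# by the MDA ES run further down in this thread.

# Before each update the run model then does, roughly (assumed call):
#   ert.analysisConfig().setGlobalStdScaling(weights[iteration])
# so the weight ends up as the global_std_scaling read by obs_data_alloc above.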

lars-petter-hauge commented 4 years ago

In particular, the algorithm overwrites the scaling from the scaling job

This is true because the update section in the RunModel sets the global_std_scaling, I suppose, while the workflow with the job itself is run in advance?

ert/ert_shared/models/multiple_data_assimilation.py:

EnkfSimulationRunner.runWorkflows(HookRuntime.PRE_UPDATE, ert=ERT.ert)
self.update( run_context , weights[iteration])
EnkfSimulationRunner.runWorkflows(HookRuntime.POST_UPDATE, ert=ERT.ert)
lars-petter-hauge commented 4 years ago

There is an enkf_obs_scale_correlated_std (https://github.com/equinor/libres/blob/4e90285db47057096d958decb1f76a126d6bbbfb/lib/enkf/enkf_obs.cpp#L1166)

But this is only run if std_scale_correlated_obs in the analysis_config is True (False by default), or when run through the workflow job enkf_main_std_scale_correlated_obs_JOB.

Was wondering about enkf_obs_scale_std as well, as this calls enkf_obs_local_scale_std, which seems to do some scaling. But enkf_obs_scale_std is only called in a test (and I searched across all relevant GitHub repos).

Otherwise, enkf_obs_local_scale_std is only called from the two aforementioned functions, enkf_obs_scale_std and enkf_obs_scale_correlated_std. Although it is exposed in the Python API as localScaleStd, I can't find it being called anywhere (again, searched across relevant repositories). Could it be that it is called within a user-made job?

lars-petter-hauge commented 4 years ago

introduce a second scaling constant in storage is possible.

Is the std_scaling written to storage? Not even the observations hit disk if I recall correctly?

In addition, we would have to make sure that each iteration resets the scaling factor to 1.

From what I've seen so far, it doesn't. Rather, the scaling is set explicitly in both code paths and never reset. But it shouldn't be much of a problem to change (given that none of our users expect the behaviour).

oyvindeide commented 4 years ago

Was wondering about enkf_obs_scale_std as well, as this calls enkf_obs_local_scale_std, which seems to do some scaling. But enkf_obs_scale_std is only called in a test (and I searched across all relevant GitHub repos).

enkf_obs_local_scale_std is the one you want. I'm assuming enkf_obs_scale_std is the result of append-only development, with the intent of expanding the scaling functionality.

Otherwise, enkf_obs_local_scale_std is only called from the two aforementioned functions, enkf_obs_scale_std and enkf_obs_scale_correlated_std. Although it is exposed in the Python API as localScaleStd, I can't find it being called anywhere (again, searched across relevant repositories). Could it be that it is called within a user-made job?

There is a Python test calling the Python API, IIRC, but my opinion is that these functions should be deprecated as soon as the new jobs are past the testing phase. The way the job was called in the past is through enkf_main_std_scale_correlated_obs_JOB.

lars-petter-hauge commented 4 years ago

enkf_obs_local_scale_std is the one you want

This is used neither by the es_mda approach, nor with the CS job we have made in semeio, correct? The former sets the global_std_scaling for the observation, which is used on allocation. While the latter uses the updateStdScaling which sets the scaling factors directly on the observation node. Unless I'm missing something?

xjules commented 4 years ago

enkf_obs_local_scale_std is the one you want

This is used neither by the es_mda approach, nor with the CS job we have made in semeio, correct? The former sets the global_std_scaling for the observation, which is used on allocation. While the latter uses the updateStdScaling which sets the scaling factors directly on the observation node. Unless I'm missing something?

While studying the code I came to the same conclusion, i.e.: final_obs = global_std_scaling * (std_scaling * obs), where global_std_scaling is set in the update step of es_mda, and std_scaling in the CS job via updateStdScaling in the pre-update step: EnkfSimulationRunner.runWorkflows(HookRuntime.PRE_UPDATE, ert=ERT.ert). Therefore, shouldn't it work as is currently, internally?

oyvindeide commented 4 years ago

enkf_obs_local_scale_std is the one you want

This is used neither by the es_mda approach, nor with the CS job we have made in semeio, correct? The former sets the global_std_scaling for the observation, which is used on allocation. While the latter uses the updateStdScaling which sets the scaling factors directly on the observation node. Unless I'm missing something?

Right, a bit unclear on my part: it's the one previously used, not in use now, no. But of the two, that's the one that was actually used. All of that should (hopefully) be deprecated now.

lars-petter-hauge commented 4 years ago

While studying the code I came to the same conclusion, i.e.: final_obs = global_std_scaling * (std_scaling * obs)

Yeah, that was kind of what I got the feeling of as well. But I couldn't find the place where this is returned/calculated..? Is there a line that actually does this part?

What I did find was that the only place the global_std_scaling is used after it is applied to an observation is in obs_data.cpp:L179:

double obs_block_iget_std( const obs_block_type * obs_block , int iobs) {
  return obs_block->std[ iobs ] * obs_block->global_std_scaling;
}

Given that obs_block_iget_std is the function that is used further down in the analysis, of course...

The updateStdScaling sets the observation vector's std_scaling, and from what I can see, the std_scaling is used in each of the getter functions for summary_obs, gen_obs and block_obs

summary_obs.cpp: summary_obs_get_observations ... obs_block_iset( obs_block , 0 , summary_obs->value , summary_obs->std * summary_obs->std_scaling);

block_obs.cpp: block_obs_get_observations ... obs_block_iset(obs_block , i , point_obs->value , point_obs->std * point_obs->std_scaling );

gen_obs.cpp: gen_obs_get_observations ... obs_block_iset( obs_block , iobs , gen_obs->obs_data[iobs] , IGET_SCALED_STD( gen_obs , iobs ));

where

static double IGET_SCALED_STD(const gen_obs_type * gen_obs, int index) {
  return gen_obs->obs_std[index] * gen_obs->std_scaling[index];
}

So it kind of looks like, depending on which getter function is used, either the std_scaling or the global_std_scaling is applied... But I do have a feeling that this is not completely true...

lars-petter-hauge commented 4 years ago

..This is used neither by the es_mda approach, nor with the CS job we have made in semeio, correct

Right, a bit unclear on my part: it's the one previously used, not in use now, no. But of the two, that's the one that was actually used. All of that should (hopefully) be deprecated now.

We agree, the old job uses it:

enkf_main_std_scale_correlated_obs_JOB calls enkf_obs_scale_correlated_std which then uses enkf_obs_local_scale_std

All of that should (hopefully) be deprecated now.

As in deprecation announcements/warnings have been made?

The old job has not been removed though..

Either way, the std_scaling vs global_std_scaling still confuses me a bit

oyvindeide commented 4 years ago

As in deprecation announcements/warnings have been made?

The old job has not been removed though..

No, no deprecation warnings have been made. While the new job is in stable, I think it's still considered under testing, so when that is done, the old one should be deprecated.

xjules commented 4 years ago

Yeah, that was kind of what I got the feeling of as well. But I couldn't find the place where this is returned/calculated..? Is there a line that actually does this part?

What I did find was that the only place the global_std_scaling is used after it is applied to an observation is in obs_data.cpp:L179:

double obs_block_iget_std( const obs_block_type * obs_block , int iobs) {
  return obs_block->std[ iobs ] * obs_block->global_std_scaling;
}

After some digging starting from enkf_main.update, the global_std_scaling is applied as you said, via obs_block_iget_std.
That is used in enkf_analysis_deactivate_outliers (https://github.com/equinor/libres/blob/b80fac1230db7bdb64511f4d5af364135929fa06/lib/enkf/enkf_main.cpp#L995)

When it comes to obs_node->std_scaling, this one is actually applied in enkf_obs_get_obs_and_measure_data when loading the obs_data vector. Such scaled obs_std data is then sent into the aforementioned function enkf_analysis_deactivate_outliers: https://github.com/equinor/libres/blob/4e90285db47057096d958decb1f76a126d6bbbfb/lib/enkf/enkf_obs.cpp#L467

So it kind of looks like, depending on which getter function is used, either the std_scaling or the global_std_scaling is applied... But I do have a feeling that this is not completely true...

The getter function, used also in the enkf_obs_get_obs_and_measure_data, is then determined here: https://github.com/equinor/libres/blob/4e90285db47057096d958decb1f76a126d6bbbfb/lib/enkf/obs_vector.cpp#L151

where vector->get_obs = either summary_obs_get_observations, block_obs_get_observations or gen_obs_get_observations.

To summarize:

  1. std_scaling (set by scaling job) is applied by the following chain: enkf_obs_get_obs_and_measure_data -> enkf_obs_get_obs_and_measure_node -> obs_vector_iget_observations -> obs_vector->get_obs
  2. global_std_scaling (modified by ES MDA) is applied by: enkf_analysis_deactivate_outliers -> obs_block_iget_std

Meaning this should work as is even for ES MDA, but I might have got it wrong too, of course.
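
To make the composition concrete, here is a small stand-alone sketch (illustration only, not libres code) of how I read those two chains; both factors simply multiply onto the observation std:

def load_observation_std(obs_std, std_scaling):
    # Chain 1 (scaling job): the obs_vector getters apply the per-observation
    # std_scaling when filling the obs block, cf. obs_block_iset(..., std * std_scaling).
    return obs_std * std_scaling

def obs_block_iget_std(block_std, global_std_scaling):
    # Chain 2 (ES MDA): obs_block_iget_std multiplies by global_std_scaling
    # when the std is read back, e.g. in enkf_analysis_deactivate_outliers.
    return block_std * global_std_scaling

# Example with made-up numbers: std 2.0, job scaling 3.0, ES MDA weight 4.0.
print(obs_block_iget_std(load_observation_std(2.0, 3.0), 4.0))  # 24.0, both applied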

lars-petter-hauge commented 4 years ago

In the enkf_analysis_deactivate_outliers, the std is only used to calculate whether or not an observation is too far away from the prediction (and then deactivate if so). It isn't used any further. Do you agree?

The obs_block_iget_std is used when assembling the R (obs_block_initR) and E (obs_block_initE) matrices; I think this is where it comes into use.

I agree with you on how the std_scaling is applied.

But yes, I agree that they don't seem to overwrite directly. But this could be done implicitly somehow?

xjules commented 4 years ago

In the enkf_analysis_deactivate_outliers, the std is only used to calculate whether or not an observation is too far away from the prediction (and then deactivate if so). It isn't used any further. Do you agree?

The obs_block_iget_std is used when assembling the R (obs_block_initR) and E (obs_block_initE) matrices; I think this is where it comes into use.

I agree with you on how the std_scaling is applied.

But yes, I agree that they don't seem to overwrite directly. But this could be done implicitly somehow?

Right, I thought enkf_analysis_deactivate_outliers was something central in the process, but it only deactivates outliers. Anyway, when assembling the R and E matrices using obs_block_iget_std with obs_data that has already been scaled by std_scaling as described before, I don't see an issue here. As you said, this can happen if they really get overwritten by one another; this needs to be tested first.

lars-petter-hauge commented 4 years ago

@dotfloat, @xjules and I have continued looking at this. We agree that the two scaling values are applied in different sections of the code:

So we set up the poly case now. We split the observations in order for the COS to run, and we set the stds to 1, 2, 3, 4, 5.

Observations: POLY_OBS:

2.8532509308 1
7.20311703432 2
21.3864899107 3

POLY_OBS_TWO:

31.5145559347 4
53.5676660405 5

We also hard-coded the scaling from the COS to be 50 for all.

And then we run the MDA ES:

Running MDA ES on (weights normalized)  4.583, 2.291, 1.146
All 6 active jobs complete and data loaded.
Scaling factor calculated from (named_dict(index=None, key='POLY_OBS'), named_dict(index=None, key='POLY_OBS_TWO'))
Calculation scaling factor, nr of primary components: 1, number of observations: 5
Keys: ['POLY_OBS', 'POLY_OBS_TWO'] scaled with scaling factor: 50
===============================================================================================================================
Report step...: 0000
Ministep......: ALL_ACTIVE   
-------------------------------------------------------------------------------------------------------------------------------
                                                         Observed history               |             Simulated data        
-------------------------------------------------------------------------------------------------------------------------------
  1   : POLY_OBS                                   2.853 +/-          229.129  Active   |             2.880 +/-           0.830  
  2   :   ...                                      7.203 +/-          458.258  Active   |             5.751 +/-           1.497  
  3   :   ...                                     21.386 +/-          687.386  Active   |            10.808 +/-           3.563  
  4   : POLY_OBS_TWO                              31.515 +/-          916.515  Active   |            18.050 +/-           7.112  
  5   :   ...                                     53.568 +/-         1145.644  Active   |            27.479 +/-          12.092  
===============================================================================================================================

Note that the uncertainty printed is 229.129 (1 * 4.583 * 50). The printing is performed in enkf_analysis_fprintf_obs_summary (enkf_analysis.cpp).
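
As a quick sanity check of that column (a stand-alone calculation, not libres code, and assuming the default 4, 2, 1 weights normalized as sketched earlier in the thread), the printed uncertainties are just the configured stds times the first normalized weight times the hard-coded COS factor:

import math

stds = [1.0, 2.0, 3.0, 4.0, 5.0]                      # stds set on the poly observations
weight = 4.0 * math.sqrt(1.0 / 16 + 1.0 / 4 + 1.0)    # first normalized MDA ES weight, ~4.583
cos_factor = 50.0                                     # hard-coded scaling from the COS job

print([round(s * weight * cos_factor, 3) for s in stds])
# [229.129, 458.258, 687.386, 916.515, 1145.644], matching the
# "Observed history" column in the report above.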

So it looks like both scalings have in fact been applied, and we can't find anywhere that one is overwritten by the other.

lars-petter-hauge commented 4 years ago

Do we have a user that has a setup they believe does not include both scalings?

We are going to create a test in libres that demonstrates the different outputs given that each of the scalings are applied.

markusdregi commented 4 years ago

Perfect! I've never investigated this; I've only been told numerous times that the existing scaling job (the one implemented in libres) and ES MDA did not mix at all and overwrote each other's scaling factors. If this does not seem to be a problem anymore, then all is good 🚀

We should of course investigate this further to be sure, but it could be that this is a non-issue.

lars-petter-hauge commented 4 years ago

that the existing scaling job (the one implemented in libres) and ES MDA did not mix at all and overwrote each other's scaling factors.

I can't see that the way it is implemented in libres is any different when it comes to the scaling variables used and how they are applied. There could have been other differences in the past, who knows. But yes, it would be good if this is a non-issue! And I also think it makes sense as is, with a "global std scaling" set from the es_mda (as that really should be global) and a per-obs scaling used by the COS job.

What might not make as much sense is how difficult it was to see when they were applied :yum:

Nevertheless, I agree this should be tested as well.

xjules commented 4 years ago

We are going to create a test in libres that demonstrates the different outputs given that each of the scalings are applied.

Shouldn't this test be part of semeio.cos_job, as an integration test that exercises both ES MDA and COS?

oyvindeide commented 3 years ago

I am a bit stuck on this issue because, as far as I can tell, there is no easy way for the workflow to know it is running an iterative algorithm.

First, the scaling factor is preserved between iterations, so if it is possible for the workflow to identify that it is running an iterative algorithm, the problem is solved.

However, that is the current problem:

  1. The run context resides in ert and is not available to the workflow.
  2. It is not possible to pass <ITER> to the config, as that only works for forward models (and also, ARGLIST is not recognized by the internal job config).
  3. Libres does not know if it is running iteratively, and that is the information the job has access to.
  4. Caching is not possible, as we can't differentiate between running multiple times from different types of experiment.
  5. Parsing RUNPATH seems like a really error-prone approach.

oyvindeide commented 3 years ago

I already discussed this with @lars-petter-hauge. @markusdregi, any thoughts?

markusdregi commented 3 years ago

@oyvindeide Did you come up with a brilliant conclusion when discussing? :) I'm a bit surprised that it is difficult to get the iteration number into a workflow...

oyvindeide commented 3 years ago

No, I laid out the results of my experimentation and we tried to brainstorm other ideas; I have summed up all the things we discussed above. According to both my experimentation and the docs (https://github.com/equinor/ert/blob/b380ec5cca37918320405a93781e483868fb172f/docs/rst/manual/reference/configuration/magic_strings.rst), the magic strings will not work with workflows. We can of course change this on the ert side of things, which is the only solution I can think of at the moment.

oyvindeide commented 3 years ago

Another option is to add additional hooks to ert; I know that has been talked about in the past? It seems like a natural way to solve this problem.

lars-petter-hauge commented 3 years ago

I'm a bit surprised that it is difficult to get the iteration number into a workflow...

So were we..