Based on some initial thoughts I've compiled the following with the time available:
I want to be able to create an evaluator like this:
evaluator = EnsembleEvaluator(environment, ensemble, event_hooks)
after which I would like to be able to do:
evaluator.evaluate(ensid=None) -> evalid
evaluator.evaluations -> [evalid]
stop(evalid) -> status_code
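As a minimal sketch, assuming purely hypothetical internals, such an interface could look like this in Python:

import uuid


class EnsembleEvaluator:
    def __init__(self, environment, ensemble, event_hooks):
        self._environment = environment
        self._ensemble = ensemble
        self._event_hooks = event_hooks
        self._evaluations = {}  # evalid -> status

    def evaluate(self, ensid=None):
        """Start an evaluation (of the whole ensemble, or of ensid) and return its evalid."""
        evalid = str(uuid.uuid4())
        self._evaluations[evalid] = "running"
        return evalid

    @property
    def evaluations(self):
        """All evalids started by this evaluator."""
        return list(self._evaluations)

    def stop(self, evalid):
        """Request that a running evaluation stops; return a status code."""
        self._evaluations[evalid] = "stopped"
        return 0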
In addition, I would like the creation data to be something of the form:
# environment
script_jobs:
-
name: sim_prep
exec: sim_prep.py
config:
sim_prep_mode: 42
active_good_stuff: True
scalars: [1, 2, 3]
-
name: simulator
exec: /path/to/my_sim.py
config:
fast: True
versions:
- 2019
- 2056
-
name: sim2csv
exec: sim2csv.py
steps:
-
name: simulate
type: script
content:
- sim_prep --do-fancy-stuff
- simulator stuff_to_simulate
- sim2csv summary-data.csv
queues:
-
name: lsf
scaling: infinite
-
name: local
scaling: 3
resources:
-
name: datafile
location: /path/to/datafile
(opaque data, loaded from files in first iteration)
# ensemble
forward model:
-
step: simulate
environment:
queue: lsf
container: res-komodo-stable
resources:
- datafile
responses:
- summary-data
[realization]:
variables (N-dim matrices with indices)
# event_hook
data_hook -> realization_response (N-dim matrices with indices)
status_hook -> waiting, running (progress), success, failure
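A minimal sketch, with signatures I am assuming rather than taking from the proposal, of what these two hooks could look like:

from enum import Enum


class Status(Enum):
    WAITING = "waiting"
    RUNNING = "running"  # carries progress information
    SUCCESS = "success"
    FAILURE = "failure"


def data_hook(realization, response):
    """Receives a realization response (an N-dim matrix with indices)."""
    print(f"realization {realization} produced data with shape {getattr(response, 'shape', None)}")


def status_hook(realization, status, progress=None):
    """Receives status transitions; progress is only meaningful while RUNNING."""
    print(f"realization {realization}: {status.value} ({progress})")


event_hooks = {"data": data_hook, "status": status_hook}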
I will argue that the Ensemble Evaluator API (EEAPI) should be a mini domain-specific language (mDSL) in Python. This is a fancy way of saying that the EEAPI will consist of a nice and elegant class library, as you'd expect, but also offer a higher-level scripting wrapper around that class library. In my examples, this mDSL is a glorified builder API, but the point is that the consumer does not consume the class library directly, which means the underlying implementation could expand or change dramatically without impacting the mDSL too much. An additional benefit is that we can capture intent, which may be lost with a class library, and which in turn enables more sophisticated validation. More on intent later.
An example:
# Pre-made constituents of the evaluator domain
onprem_hpc_executor = create_executor()
.prefer_hpc() # maps to a list of hpc hosts, defined by us
.only_onprem() # filters hpc list to only onprem (no cloud for you)
local_executor = create_executor()
local_storage = create_storage()
.connection_string("sqlite://{{project_dir}}/db.sqlite")
fmu_azure_blob_storage = create_storage()
.base_url("azure-blob://equinor/fmu/{{project_dir}}/")
.credentials(…)
We dogfood the mDSL and provide the user with ready-made constituents of the domain. This, of course, also enables the user/library author to create their own constituents. For example, if I only want Azure HPC:
azure_hpc_executor = create_executor()
.hosts(["azure10.equinor.com", "azure11.equinor.com"])
These executors are basically processes that run on some machine or other. They can and must be defined either on the evaluation as a whole, or on individual forward models (FMs). This reflects how the cloud works: we should not assume anything about the underlying metal.
Without domain functionality like .prefer_hpc() and .only_onprem(), some validation may go away, but we allow the user a lot of flexibility within the domain (i.e., we can't say that azure_hpc_executor is actually HPC without .prefer_hpc()). However, we could make a guess, and then do validation. So after some time, if we find that those user-defined hosts are strictly a subset of azure/hpc, we could deduce it. In the end, the implementation is insulated and allowed to vary, because the consumer isn't accessing the class library directly.
Add some more constituents:
# User then defines project specific resources, normally via config/GUI
spe1_resources = create_resource()
.path("opm-data/spe1")
.source(fmu_azure_blob_storage)
.credentials(…)
eclipse_forward_model = create_forward_model()
.is_eclipse_simulation() # -> EclipseForwardModel
.with_eclipse_version(ECLIPSE.v2013) # -> Eclipse2013ForwardModel
.with_resource(spe1_resources)
.with_executor(onprem_hpc_executor)
.with_retry_on_failure(max_retries=3)
    .on_success(lambda r: logging.info(r.status))
.sink(local_storage)
npv_forward_model = create_forward_model()
.depends(eclipse_forward_model)
    .executable("npv.py")
.sink(fmu_azure_blob_storage)
The npv_forward_model will only run after eclipse_forward_model has run. Implicitly, the NPV FM will run source(eclipse_forward_model) and the eclipse simulation will run sink(npv_forward_model). In other words, the eclipse data will end up both in local storage and in Azure, but in Azure its sole purpose is to provide the NPV FM with data.
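A toy sketch, with invented names, of how depends() could imply that source/sink wiring:

class ForwardModelBuilder:
    def __init__(self, name):
        self.name = name
        self._sources = []  # forward models/storages this FM reads from
        self._sinks = []    # storages/forward models this FM's output is delivered to

    def depends(self, other):
        # Depending on `other` implicitly makes it a source for this FM
        # and registers this FM as one of `other`'s sinks.
        self._sources.append(other)
        other._sinks.append(self)
        return self

    def sink(self, storage):
        self._sinks.append(storage)
        return self


eclipse = ForwardModelBuilder("eclipse").sink("local_storage")
npv = ForwardModelBuilder("npv").depends(eclipse).sink("fmu_azure_blob_storage")
print([getattr(s, "name", s) for s in eclipse._sinks])  # ['local_storage', 'npv']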
evaluator = create_evaluator() # EmptyEvaluator
.with_realizations(10)
.set_variables(…)
.set_parameters(…)
.add_forward_model(npv_forward_model) # other jobs are derived from the dep tree of the npv job
evaluator.run()
The most compelling argument for an mDSL is that there already is a domain language for the EE, built carefully by the FMU gang/ERT developers, but its constituents and functions are seldom defined explicitly in the code (e.g. the EE is itself not a class, but a set of C and Python codes), and are often ill-defined (e.g. the FORWARD_MODEL configuration keyword spans janitorial tasks such as creating folders, but also three weeks of Eclipse simulations). Explicitly defining all of these constituents and functions in a class library is a good start, but, as I mentioned, intent may not be captured.
Imagine that I run $ ert run_test_experiment. My ensemble is evaluated. I would like to add a modelling step or analysis step to my ensemble, e.g. massage and dump some data for creation of a tornado plot at the end of the evaluation.
$ ert evaluator add_forward_model tornado.yml --as-last-thing
I'd really like this to mean that only the tornado job is run if I re-run ert run_test_experiment. I'd like ERT to know what it can and cannot do, by capturing user intent. But this is based on extrapolating from my incomplete data on how users use ERT. I just know from my own experiences that this kind of analytical/highly speculative business has to be done iteratively. Changes to a large config do not really enable that iterative style. An mDSL might cater for tooling that does.
(As you see, I've not focused on config, because to me, a config is a contract between ERT and the user, and shouldn't really impact the API design of the evaluator.)
Contrary to @jondequinor, I have mainly focused on the configuration, but I agree on the point that this should not impact the API design. Anyway, I will post my idea for the configuration, so the ideas are not lost. (I have rewritten it to use the same "examples" as @markusdregi to make it easier to compare.)
queue_system:
default_driver: local
local:
max_submit: 50
lsf:
queue: mr
max_submit: 1
jobs:
-
name: sim_prep
task:
type: executable
executable: sim_prep.py
settings:
sim_prep_mode: 42
active_good_stuff: True
scalars: [1, 2, 3]
-
name: simulator
task:
type: executable
executable: /path/to/my_sim.py
settings:
fast: True
versions:
- 2019
- 2056
-
name: sim2csv
task:
type: executable
executable: sim2csv.py
steps:
-
name: simulate
type: script
content:
- sim_prep --do-fancy-stuff
- simulator stuff_to_simulate
- sim2csv summary-data.csv
input:
- name: data_file
type: file/binary
- name: magic_csv_file
type: file/csv
- name: poro_csv_file
type: file/csv
output:
- name: summary
type: csv
-
name: mock_evaluation
type: python_function
module: some.python.module
function: polynomial_mock
input: # Will be given as open streams to the function
- name: data_file
type: file/binary
- name: magic_csv_file
type: file/csv
output: # return open stream
- name: summary
type: csv
resources:
-
name: datafile
location: /path/to/datafile
-
name: my_csv_file
location: /path/to/some_csv_file
variables:
-
name: param_a
dimensions: [5, 6]
distribution:
type: triangular
params:
min: 4
max: 10
mode: 5
ensemble:
realizations: 150
forward model:
-
step: simulate
environment:
queue: lsf
container: res-komodo-stable
inputs: # Link expected inputs with resources/variables
-
input_name: data_file
resource: datafile
-
input_name: magic_csv_file
resource: my_csv_file
-
input_name: poro_csv_file
variable: param_a
type: csv
responses: # Link expected outputs with storage
-
name: summary-data
type: csv
source: summary-data.csv
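As a small, hypothetical illustration of how the linking above could be validated, a function could check every forward-model input against the declared resources and variables (I am assuming the forward model list is nested under the ensemble key):

import yaml


def validate_input_links(config):
    """Return error messages for input links that refer to undeclared resources or variables."""
    resources = {r["name"] for r in config.get("resources", [])}
    variables = {v["name"] for v in config.get("variables", [])}
    errors = []
    for fm in config["ensemble"]["forward model"]:
        for link in fm.get("inputs", []):
            if "resource" in link and link["resource"] not in resources:
                errors.append(f"{fm['step']}: unknown resource {link['resource']}")
            if "variable" in link and link["variable"] not in variables:
                errors.append(f"{fm['step']}: unknown variable {link['variable']}")
    return errors


# usage, assuming the configuration above is stored as evaluator.yml:
# with open("evaluator.yml") as f:
#     print(validate_input_links(yaml.safe_load(f)))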
I also made an attempt at an API, but it's far from a finished solution. (And after reading @jondequinor 's proposal I'm far from convinced that this is the correct approach)
from enum import Enum
class RealizationStatus(Enum):
UNINITIALIZED = 1
INITIALIZED = 2
EVALUATING = 3
EVALUATION_FAILURE = 4
EVALUATION_SUCCESS = 5
class EventType(Enum):
DATA = 1
STATUS = 2
class Step:
# Very unfinished, struggling with how to define input/output
def __init__(self):
pass
def run(self):
pass
class Ensemble:
def __init__(self, variables, resources, forward_model, num_realizations):
pass
# Point illustrated by this function is that an ensemble might
# have some realizations which are not executed yet,
# but will be if the ensemble is evaluated again
def realization_status(self):
return [] # Return list of RealizationStatus'
class Evaluator:
def __init__(self, runtime_environments):
pass
def evaluate(self, ensemble, event_hook, realizations=None):
pass
def my_hook(event):
if event.type == EventType.DATA:
# some data stuff
pass
elif event.type == EventType.STATUS:
# some status stuff
pass
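To make the intended flow concrete, a hypothetical usage of this sketch (the construction arguments are made up) could be:

ensemble = Ensemble(
    variables={"param_a": [1.0, 2.0]},
    resources=["/path/to/datafile"],
    forward_model=[Step()],
    num_realizations=150,
)
evaluator = Evaluator(runtime_environments=["lsf", "local"])
evaluator.evaluate(ensemble, my_hook)

# re-evaluate only realizations that have not yet succeeded
pending = [
    i for i, status in enumerate(ensemble.realization_status())
    if status is not RealizationStatus.EVALUATION_SUCCESS
]
evaluator.evaluate(ensemble, my_hook, realizations=pending)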
On the topic of input/output:
One thing that I struggle to address, both in the configuration and the API, is how to link expected input(s) of a step (we know that an Eclipse job needs a data file) with resources/variables/output from other jobs (we want to give some_model/HELLO.DATA as input to an Eclipse simulation), and how to link expected output(s) with outputs that should be used in e.g. the update step. This gets especially complicated if we are running more than one instance of a job, or a job has multiple inputs of the same type (a-b is not the same as b-a, so we need to link outputs from one job to the correct input "port" of the next job). It's difficult to make these relations explicit without growing the configuration file significantly.
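One possible direction, sketched here with invented names, is to make each link between an output port and an input port an explicit object instead of relying on matching file names:

from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    step: str  # the step that owns the port
    name: str  # the input or output name on that step


# explicit edges: output port -> input port, so a-b and b-a cannot be confused
links = [
    (Port("eclipse", "summary"), Port("npv", "simulation_summary")),
    (Port("eclipse", "summary"), Port("sim2csv", "summary")),
]


def inputs_for(step_name, links):
    """All (input name, producing port) pairs for a given step."""
    return [(dst.name, src) for src, dst in links if dst.step == step_name]


print(inputs_for("npv", links))
# [('simulation_summary', Port(step='eclipse', name='summary'))]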
@jondequinor, I like the flexibility and expressiveness an mDSL introduces!
Some questions:
I just know from my own experiences that this kind of analytical/highly speculative business has to be done iteratively. Changes to a large config do not really enable that iterative style. An mDSL might cater for tooling that does.
In this scenario, are the users still expected to interact with the config file, or directly with the mDSL? I think the latter will be hard to sell to the average user, but I might be wrong.
eclipse_forward_model = create_forward_model() .is_eclipse_simulation() # -> EclipseForwardModel .with_eclipse_version(ECLIPSE.v2013) # -> Eclipse2013ForwardModel
It's not entirely clear to me how this will integrate with our current (and future) plugin system. The creation of an Eclipse forward model would need to be encapsulated within the plugin, which I think (?) implicitly gives the same constraints as the class approach. (The assumption here is that the user interacts with the configuration file and not the mDSL.)
.with_resource(spe1_resources) .with_executor(onprem_hpc_executor) .with_retry_on_failure(max_retries=3) .on_success(lambda r: logging.INFO(r.status)) .sink(local_storage)
I think we need to refine how we do input/output to jobs, ref the "On the topic of input/output"-section of my suggestions. We need to be more explicit about what input we are linking to what output from the previous job, especially in the case of multiple inputs and outputs.
Imagine that I run $ ert run_test_experiment. My ensemble is evaluated. I would like to add a modelling step or analysis step to my ensemble, e.g. massage and dump some data for creation of a tornado plot at the end of the evaluation.
$ ert evaluator add_forward_model tornado.yml --as-last-thing
I'd really like this to mean that only the tornado job is run if I re-run ert run_test_experiment. I'd like ERT to know what it can and cannot do, by capturing user intent.
I think this touches a very difficult subject: What do we allow to change before an ensemble is considered too different to another ensemble, such that it's no longer an extension/addition of the original ensemble, but instead represents an edit? I like the idea of an immutable ensemble, and I'm not sure the best way to balance this with interactive development of your ensemble.
My thought here was that for everything that is created, a handle to that resource is returned. If we want to rerun, I am thinking we need to make a new ensemble, but this can probably be hidden from the user. I am in favor of an append-only solution, if that can work. So we would create a new ensemble based on the failed one, and be smart about not repeating steps that had already completed successfully.
def test_ensemble_api(environment, storage):
project = environment.create_project("my_project")
res1 = project.add_resource("file:///home/ole/eclipse/project1/file1")
res2 = project.add_resource("https://blobby-things/project1/file2")
job1 = project.create_job(
name="run_eclipse",
executable="/usr/bin/eclipse",
provided="true",
parameters=["param1", "param2"],
std_out_capture="eclipse_out3")
job2 = project.create_job(
name="sim2csv",
executable="sim2csv.py"
)
# each part of a step is defined separately
# inputs and outputs are identified, optionally with a separate name if file-name is long
part1 = project.create_step_part(
name="simulate",
cmd=["run_eclipse --in=input1 --params=param1,param2 --out eclipse/output/SOME_FILE1"],
inputs=["input1",
"input2",
"param1",
"param2"],
outputs=[("eclipse_out1", "eclipse/output/SOME_FILE1"),
("eclipse_out2", "eclipse/output/SOME_FILE2")],
)
    part2 = project.create_step_part(
        name="sim2csv1",
        cmd=["sim2csv eclipse_out1 out1"],
        inputs=["eclipse_out1"],
        outputs=["out1"],
    )
    part3 = project.create_step_part(
        name="sim2csv2",
        cmd=["sim2csv eclipse_out2 out2"],
        inputs=["eclipse_out2"],
        outputs=["out2"],
    )
# a step is made from several parts
step = project.create_step([part1, part2, part3])
# when making an ensemble, the resources that are the same across all realizations are given
ensemble = project.create_ensemble(
realizations=2,
jobs=[job1, job2],
steps=[step],
common_resources=[(res1, "input1"), (res2, "input2")])
param1_1 = storage.get_parameter_ref("myproject","param1", 0)
param1_2 = storage.get_parameter_ref("myproject", "param2", 0)
param2_1 = storage.get_parameter_ref("myproject","param1", 1)
param2_2 = storage.get_parameter_ref("myproject", "param2", 1)
# parameters are specific to each realization, so they are added separately
ensemble.add_node_resource(realization=0, ref=param1_1, name="param1")
ensemble.add_node_resource(realization=0, ref=param1_2, name="param2")
ensemble.add_node_resource(realization=1, ref=param2_1, name="param1")
ensemble.add_node_resource(realization=1, ref=param2_2, name="param2")
count_start = 0
count_finished = 0
    def callback(event):
        nonlocal count_start, count_finished
        if event.job_started:
            count_start += 1
        if event.job_finished:
            count_finished += 1
ensid = ensemble.evaluate(callback)
ensemble.wait()
assert count_start == 2
assert count_finished == 2
# getting a resource out as a handle, that can be sent to other parts of the application, like storage, assuming it can consume it
    out_ref1 = ensemble.get_node_resource_ref(realization=0, name="out1")
    out_ref2 = ensemble.get_node_resource_ref(realization=0, name="out2")
    storage.store_result("myproject", "result1", out_ref1, realization=0)
    storage.store_result("myproject", "result2", out_ref2, realization=0)
An experiment in defining a step with a special-purpose DSL to make it more concise. The idea is that each part of a step declares which inputs it consumes and which outputs it produces. Inputs are in [] and outputs in {}. If an input or output is used directly in the command, it appears in the position of its use. If it is not explicit in the command, it is declared after a %.
steps:
-
type: script
definition:
name: simulate
contents:
- run_eclipse --in=[input1] --params=[param1],[param2] --out {eclipse_out1} % [input2] {eclipse_out2}
- sim2csv [eclipse_out1] {out1:csv}
- sim2csv [eclipse_out2] {out2:csv}
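A small sketch of how such a command line could be parsed under my reading of the syntax (inputs in [], outputs in {}, extra declarations after %):

import re


def parse_part(command):
    """Split a part into its command, its declared inputs and its declared outputs."""
    cmd, _, _extra = command.partition("%")
    inputs = re.findall(r"\[([^\]]+)\]", command)
    outputs = [out.split(":")[0] for out in re.findall(r"\{([^}]+)\}", command)]
    return cmd.strip(), inputs, outputs


line = "run_eclipse --in=[input1] --params=[param1],[param2] --out {eclipse_out1} % [input2] {eclipse_out2}"
print(parse_part(line))
# ('run_eclipse --in=[input1] --params=[param1],[param2] --out {eclipse_out1}',
#  ['input1', 'param1', 'param2', 'input2'],
#  ['eclipse_out1', 'eclipse_out2'])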
In this scenario, are the users still expected to interact with the config file, or directly with the mDSL? I think the latter will be hard to sell to the average user, but I might be wrong.
No, the users will probably not write these. However, whether the business could write them is one thing; whether they can read them is quite another. To quote Martin Fowler:
I do think that the greatest potential benefit of DSLs comes when business people participate directly in the writing of the DSL code. The sweet spot, however is in making DSLs business-readable rather than business-writeable. If business people are able to look at the DSL code and understand it, then we can build a deep and rich communication channel between software development and the underlying domain.
[code instantiating things that come from a plugin]
It's not entirely clear to me how this will integrate with our current (and future) plugin system. The creation of a eclipse forward model would need to be encapsulated within the plugin, which I think (?) implicitly gives the same constraints as the Class approach. (The assumption here is that the user interacts with the configuration file and not the mDSL)
Good point. The EclipseForwardModel would not be the basis for this part of the DSL; it would have to be more abstract.
[code that hand wavy defines a sink]
I think we need to refine how we do input/output to jobs, ref the "On the topic of input/output"-section of my suggestions. We need to be more explicit about what input we are linking to what output from the previous job, especially in the case of multiple inputs and outputs.
I totally agree. This isn't a new problem. I've done some research, and I haven't yet found good ways to organize this. The only way forward that I can think of is to at least accept that it is truly complex and try to find a way. The current situation is, of course, intolerable, as it is implicit coordination (job a produces x, job b expects x to exist; other than that, both jobs are completely disjoint in every way).
So; yes, let's continue talking about this. There is no good solution to this problem in any of my comments thus far.
[imagine mutable, "smart" ensembles]
I think this touches a very difficult subject: What do we allow to change before an ensemble is considered too different to another ensemble, such that it's no longer an extension/addition of the original ensemble, but instead represents an edit? I like the idea of an immutable ensemble, and I'm not sure the best way to balance this with interactive development of your ensemble.
I concede that this is a bad idea.
However, I still think we should adopt the mindset that changes over time (within a domain like this) are best modelled as transactional logs.
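As a rough illustration of that mindset, with invented names, changes to an ensemble could be recorded as an append-only log and the remaining work derived by replaying it:

from dataclasses import dataclass, field


@dataclass(frozen=True)
class AddForwardModel:
    name: str


@dataclass(frozen=True)
class EvaluationSucceeded:
    forward_model: str


@dataclass
class EnsembleLog:
    events: list = field(default_factory=list)

    def append(self, event):
        self.events.append(event)

    def pending_forward_models(self):
        """Replay the log: forward models added but not yet successfully evaluated."""
        added = [e.name for e in self.events if isinstance(e, AddForwardModel)]
        done = {e.forward_model for e in self.events if isinstance(e, EvaluationSucceeded)}
        return [name for name in added if name not in done]


log = EnsembleLog()
log.append(AddForwardModel("eclipse"))
log.append(AddForwardModel("tornado"))
log.append(EvaluationSucceeded("eclipse"))
print(log.pending_forward_models())  # ['tornado'] -- only the newly added work remains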
I think this discussion is very good! I'll try to persist some reflections I've had:
Separation of construction and consumption
I think it is a good idea to clearly separate the responsibility of constructing and consuming an EnsembleEvaluator, and furthermore to put the responsibility of defining the runtime environment, the forward model and the input data entirely on the one constructing it. Hence, the responsibility of the consumer is only to initiate evaluations (for parts of or the entire ensemble), monitor evaluations, and respond to the status as well as the results of the evaluations. This clearly separates the responsibility of defining what is to be evaluated from that of running evaluations and gathering the results.
Note that the above approach has the benefit of making the consumption API the one that is widely used, while the construction DSL will be rather contained. This is quite beneficial in my opinion, as we then isolate the difficult part that will need many iterations, while widely spreading something that we might get mostly right from the start 🤷
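A minimal sketch, with invented names, of that split: one class that owns construction, and one that the consumer uses only to run, monitor and collect evaluations:

class EvaluatorBuilder:
    """Construction side: defines the runtime environment, forward model and input data."""

    def __init__(self):
        self._environment = None
        self._forward_models = []
        self._input_data = {}

    def with_environment(self, environment):
        self._environment = environment
        return self

    def add_forward_model(self, forward_model):
        self._forward_models.append(forward_model)
        return self

    def with_input_data(self, **data):
        self._input_data.update(data)
        return self

    def build(self):
        return EvaluatorConsumer(self._environment, self._forward_models, self._input_data)


class EvaluatorConsumer:
    """Consumption side: can only initiate, monitor and gather results of evaluations."""

    def __init__(self, environment, forward_models, input_data):
        self._environment = environment
        self._forward_models = forward_models
        self._input_data = input_data

    def evaluate(self, realizations=None, on_event=None):
        ...  # evaluate parts of or the entire ensemble

    def status(self, evalid):
        ...  # monitor a running evaluation

    def results(self, evalid):
        ...  # gather the results of a finished evaluation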
Input and output
I think the responsibility of defining input and output should reside with the various steps. Furthermore, output can, depending on the step type, be defined either within the step (by utilising the reporter) or by the configuration of the step itself (please pick up this file after I'm done). The motivation for defining input is to allow for parallel execution and up-front validation of your data pipeline. If input is not defined, it will be assumed that all output generated so far is input, and hence only sequential execution is possible.
On the note of input and output, I think that part of the workflow manager's responsibility is to facilitate the data flow. Hence, although a step might produce a, the next step might expect that data as b. I think we need a data manipulation step type that allows us to describe thin layers of glue that make the pipeline run. It should not be so that a result name from one step leaks into another step as input.
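A tiny sketch, with hypothetical names, of such a data manipulation step; here it only renames a result from one step before the next step consumes it:

def rename_step(mapping):
    """A 'data manipulation' glue step: renames results before they become inputs."""
    def run(results):
        # results: dict of result name -> data produced by the previous step
        return {mapping.get(name, name): data for name, data in results.items()}
    return run


# the previous step produced "a", but the next step expects it as "b"
glue = rename_step({"a": "b"})
print(glue({"a": [1, 2, 3]}))  # {'b': [1, 2, 3]}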
Workshop has ended, and the discussion is summarized here: https://github.com/equinor/ert/issues/1032
Create pseudo code/playground for how we want to use the Ensemble Evaluator API