equinor / sunbeam

GNU General Public License v3.0

A good Python API #40

Closed joakim-hove closed 4 years ago

joakim-hove commented 7 years ago

While discussing the sunbeam project I have several times been challenged:

Make sure it becomes a *good Python API* - just wrapping all 
the C++ classes will not give a good Python API.

Now - I am the first to agree that I want Sunbeam to be a good Python API, but exactly what constitutes a good API is less than clear to me, and we probably do not fully agree either. Anyway - I am creating this issue to serve as a focal point for discussion. Questions I have thought of include:

What should the package/module/class hierarchy look like in Python?

Here comes the start of a suggestion - please fight:

sunbeam
|-- parser
|   |-- parser.py
|   |-- parser_keyword.py
|
|-- deck
|   |-- deck.py
|   |-- deck_keyword.py
|
|-- state
    |-- PVT
    |   |-- tables
    |       |-- tablemanager.py
    |       |-- pvxx.py
    |-- grid
    |   |-- properties
    |   |   |-- int_properties
    |   |   |-- float_properties
    |   |-- eclipse_grid
    |   |-- faults
    |-- schedule
    |   |-- wells
    |   |-- groups
    |   |-- ??
    |-- config
        |-- simulation
        |-- init

How tight should the coupling between Python and C++ be?

Case in point: in the current sunbeam code the Parser class is not exposed at all; instead EclipseState/Deck is created by a C++ library function which instantiates a Parser (and also a ParseContext) internally, performs the parsing, and returns the result. Alternatively the Parser object could be exposed all the way to Python and used something like this:

from sunbeam import Parser

p = Parser( )
ecl_state = p.parse_file("TEST_DATA")

Personally I like this better, but maybe I am too bound to the C++ classes? In the now removed ctypes based approach there was a Parser class and some decorator magic which facilitated calling the parse_file( ) method both as a bound instance method and as a class method - which would create a temporary parser: https://github.com/OPM/opm-parser/blob/9d3d43b8982f5fff0ab74f5153f67802255ec35b/lib/eclipse/python/python/opm/parser/parser.py#L49 - I kind of liked that.
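The "decorator magic" from the removed ctypes approach can be sketched with a small descriptor. This is a hypothetical reconstruction for discussion, not the actual opm-parser code; all names (`hybrid_method`, the placeholder return value) are illustrative:

```python
# Hypothetical sketch: a descriptor that lets parse_file() be called both as
# a bound method on a Parser instance and directly on the Parser class, in
# which case a temporary Parser is created behind the scenes.

class hybrid_method:
    """Bind the wrapped function to the instance if one exists,
    otherwise to a freshly constructed temporary instance."""
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        target = obj if obj is not None else objtype()
        def bound(*args, **kwargs):
            return self.func(target, *args, **kwargs)
        return bound

class Parser:
    @hybrid_method
    def parse_file(self, filename):
        return "parsed:%s" % filename  # placeholder for the real parsing

# Both call styles work:
p = Parser()
assert p.parse_file("TEST.DATA") == Parser.parse_file("TEST.DATA")
```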

Please contribute with thoughts and ideas.

JensGM commented 7 years ago

I don't agree on the point of exposing the Parser class rather than the function which is used now. The function, as is, already enables parsing to EclipseState objects with error handling, and to Deck objects with dynamic parser extensions without the need of constructing a Parser object. See this test.

In the package/module/class hierarchy above you've added a parser_keyword.py, what would the role of this object be? If it represents a keyword extension for the parser, I think we are better off using dictionaries (as is the case now, see the test linked above). On another note the syntax for defining dictionaries (or lists of dictionaries) is identical to JSON. This, I think, would be helpful for sunbeam users who are familiar with opm-parser, as they could look at the existing keywords for examples, or even the JSON keyword specification documentation.
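The JSON-compatibility point above can be made concrete. The field names below mirror the opm-parser JSON keyword format in spirit, but the exact schema here is an assumption for illustration, not the actual sunbeam API:

```python
# Illustrative sketch: a dynamic keyword extension expressed as a plain
# Python dictionary. Because this dict-literal subset is also valid JSON,
# users can crib directly from the existing JSON keyword specifications.
import json

my_keyword = {
    "name": "TESTKW",
    "size": 1,
    "items": [
        {"name": "VALUE", "value_type": "DOUBLE", "default": 0.0},
    ],
}

# The same definition round-trips through JSON unchanged:
assert json.loads(json.dumps(my_keyword)) == my_keyword
```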

joakim-hove commented 7 years ago

The function, as is, already enables parsing to EclipseState objects with error handling, and to Deck objects with dynamic parser extensions without the need of constructing a Parser object.

Yes - I know it is possible; and actually works right now. I just think it is quite opaque and convoluted; i.e. I feel it is difficult to maintain in the current form.

In the package/module/class hierarchy above you've added a parser_keyword.py, what would the role of this object be?

The hierarchy was very much a suggestion; adding parser_keyword.py was a loose idea. The reason I added it was that an exposed ParserKeyword could be a source of introspection; i.e. you could ask a keyword what items it needs, what their defaults are, and so on.
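The introspection idea could look something like the sketch below. All class and method names here are hypothetical, invented for this discussion:

```python
# Hypothetical sketch of a ParserKeyword introspection API: ask a keyword
# which items it takes and what their defaults are.

class ParserItem:
    def __init__(self, name, value_type, default=None):
        self.name = name
        self.value_type = value_type
        self.default = default

class ParserKeyword:
    def __init__(self, name, items):
        self.name = name
        self.items = items

    def item_names(self):
        return [item.name for item in self.items]

    def defaults(self):
        return {item.name: item.default for item in self.items}

kw = ParserKeyword("PORO", [ParserItem("data", "DOUBLE", default=0.0)])
assert kw.item_names() == ["data"]
assert kw.defaults() == {"data": 0.0}
```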

When it comes to adding new keywords dynamically I fully agree that this should be done by using a plain Python dictionary.

pgdr commented 7 years ago

I completely agree with Jens; it seems strange to expose a class for Parser.

Just look at how e.g. pandas would do it: pandas.read_csv('myfile.csv') will return a dataframe.

pgdr commented 7 years ago

The distinction between int_properties and float_properties should be irrelevant in Python, don't you think?

pgdr commented 7 years ago

I don't think I understand the PVT→tables→tablemanager hierarchy. Shouldn't we just have state.table['PVT'], state.table[SWOF], both of which output functions? We could even monkeypatch table to support state.table.pvt(x) being a function.
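Rather than literal monkeypatching, the dual access pattern suggested above (`state.table['PVT']` and `state.table.pvt(x)`) could be supported with `__getattr__`. A minimal sketch, with all names hypothetical:

```python
# Sketch: a table container supporting both item lookup (state.table['PVT'])
# and attribute-style access (state.table.pvt), where each entry is callable.

class TableContainer:
    def __init__(self, tables):
        self._tables = tables  # dict: keyword name -> callable

    def __getitem__(self, name):
        return self._tables[name.upper()]

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        try:
            return self._tables[name.upper()]
        except KeyError:
            raise AttributeError(name)

table = TableContainer({"PVT": lambda x: 2 * x})
assert table["PVT"](3) == 6
assert table.pvt(3) == 6
```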

pgdr commented 7 years ago

I mean, how would we use sunbeam?

import sunbeam
es = sunbeam.parse('mydeck')
plot(es.tables.swof)
# modify deck
es.to_deck('mynewdeck')
joakim-hove commented 7 years ago

I completely agree with Jens; it seems strange to expose a class for Parser.

Well - we just disagree here; but I guess I will have to give in.

Just look at how e.g. pandas would do it: pandas.read_csv('myfile.csv') will return a dataframe.

Well that is in my opinion a good example, which illustrates why I disagree:

  1. The csv parser in Pandas has no internal structure which is interesting to the end user; there are really only so many ways to parse a csv file. In the case of opm-parser the parser object has an internal structure which is interesting to the user: a) does it support keyword 'KEYWORD', and what is the structure of keyword 'XXX'? b) I want to add KEYWORDY to the parser. Nevertheless, it can be noted that the csv parser in the standard library can actually be returned as an object instance.
  2. I have never really liked the numpy / matplotlib / pandas APIs; I think too much "magic" happens implicitly, and for anything but the most trivial cases I find them quite non-intuitive. In my opinion exposing the parser would be more explicit, and explicit is also supposed to be a Python virtue.

But I realize I might be seriously in the minority here.

joakim-hove commented 7 years ago

The distinction between int_properties and float_properties should be irrelevant in Python, don't you think?

yes - I agree.

joakim-hove commented 7 years ago

I don't think I understand the PVT→tables→tablemanager hierarchy. Shouldn't we just have state.table['PVT'], state.table[SWOF], both of which output functions?

Well - the C++ code has two extra layers, which in my opinion need to be supported in some way.

  1. You can have many different instances of each table, which apply to different regions of the reservoir - that is handled by the table manager.

  2. The tables typically contain one argument column and several dependent columns - so each table could be said to be a function collection I guess.

pgdr commented 7 years ago

Joakim, please, if you have time, write out a couple of "scripts" that "use" sunbeam as I attempted above.

Don't focus on what things are, but how you want to use them.

joakim-hove commented 7 years ago

Joakim, please, if you have time, write out a couple of "scripts" that "use" sunbeam as I attempted above.

OK; I agree:

  1. This is a constructive approach.
  2. I have problems pulling my head out of the underlying classes and functionality.

but how you want to use them.

Of course the really honest answer is that I do not want to use them myself - I mainly enjoy tinkering with the code, and unfortunately I do not know enough about how real users would like to use sunbeam, apart from my general feeling that it will be a valuable tool.

The parts of sunbeam which I think will be most interesting to users are everything PVT related and the Schedule section, because these two sections are not well represented (or represented at all) in the binary output files.

When it comes to PVT I don't know enough about the domain to come up with interesting examples, but when it comes to the Schedule section I have the following - inspired by everest:

# Create an updated Schedule section
from datetime import datetime

import sunbeam

schedule = sunbeam.load_schedule("TEST.DATA")
schedule.open_well("WellA", datetime(2018, 1, 1))
schedule.inject("WellB", datetime(2018, 6, 1), "water", 100000)
schedule.close_well("WellC", datetime(2020, 1, 1))
schedule.export("update_schedule_file")

I have envisioned this done by iterating over Deck - but of course doing it on the Schedule object is more elegant.

# Do initial water modelling - this is currently done in RMS;
# this is an important task performed in all FMU jobs.
import sunbeam

state = sunbeam.create_state("TEST.DATA")
poro = state["PORO"]
permz = state["PERMZ"]
region = state["FIPNUM"]

# Where mathematical_formula is a formula involving poro and permz
# which is looped over all cells. Depending on the region value,
# different formulas are chosen.
swat = mathematical_formula(poro, permz, region)

swat.export_keyword("initial_swat.grdecl")
pgdr commented 7 years ago

I think that was a very good idea! Thanks, Joakim!

What about reordering and making well come before open, inject, and close. So that schedule.well[·] gives a well object that has the functionality, instead of schedule having the functionality?

import sunbeam

schedule = sunbeam.load_schedule("TEST.DATA")
schedule.well['WellA'].open(datetime(2018, 1, 1))
schedule.well['WellB'].inject(datetime(2018, 6, 1), 'water', 10e6)
schedule.well['WellC'].close(datetime(2020, 1, 1))
schedule.export("UPDATED.DATA")
pgdr commented 7 years ago

As I see in your top comment, you also suggest schedule.wells, so perhaps schedule.wells['Well A'].open(·) and schedule.groups['G1'].children could be two of the top level schedule functions?

pgdr commented 7 years ago

What about tables? I also imagine we could write schedule.table['PVT'].evaluate(x) for x a real number?

numpy.some_interpolation(schedule.table['PVT'].evaluate, schedule.table['RELPERM'].evaluate)

This is a copy of Joakim's hierarchy, with close, inject, and open added.

sunbeam
├── deck
│   ├── deck
│   └── deck_keyword
├── parser
│   ├── parse
│   └── parse_keyword
└── state
    ├── config
    │   ├── init
    │   ├── io
    │   └── simulation
    ├── grid
    │   ├── eclipse_grid
    │   ├── faults
    │   └── properties
    ├── schedule
    │   ├── groups
    │   └── wells
    │       ├── close
    │       ├── inject
    │       └── open
    └── table
joakim-hove commented 7 years ago

What about reordering and making well come before open, inject, and close. So that schedule.well[·] gives a well object that has the functionality, instead of schedule having the functionality?

First of all: this was a loose rapid-fire suggestion, not very well thought out. My initial reaction to your suggestion was: "Yes - we should certainly go through a well object in the way you suggest". However, if the ability to export an updated Schedule section is an important goal, there are some copy/reference semantics we should at least consider. When calling the final schedule.export( ) you would expect the updates in well operating conditions to be reflected in the resulting output file, which implies that the well object must be a reference to an instance owned by the Schedule object; that ownership might be clearer if the operations were methods on the Schedule object. But sure - I also think your suggestion is more elegant.
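The reference semantics concern can be made concrete with a minimal sketch (all names hypothetical): as long as `schedule.well[...]` hands out the Schedule-owned object itself rather than a copy, mutations through the well object are visible when the schedule is later exported.

```python
# Sketch: schedule.well[...] returns a reference to a Well owned by the
# Schedule, so changes made through it survive until export().

class Well:
    def __init__(self, name):
        self.name = name
        self.is_open = False

    def open(self, date):
        self.is_open = True

class Schedule:
    def __init__(self, wells):
        # The Schedule owns the Well instances; lookups return them directly.
        self.well = {w.name: w for w in wells}

    def export(self):
        # Stand-in for writing an updated Schedule section to file.
        return {name: w.is_open for name, w in self.well.items()}

sched = Schedule([Well("WellA")])
sched.well["WellA"].open("2018-01-01")
assert sched.export() == {"WellA": True}
```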

pgdr commented 7 years ago

I agree that if the user manages to make a change to the deck/schedule then that change should be exported.

If we stick to that philosophy ruthlessly, then I think that the users will understand, regardless of how the API looks.

markusdregi commented 4 years ago

The development of Sunbeam stopped a long time back. The development of the Python bindings is continued in opm-common and our recommendation is to use these. Closing all issues.