equinor / sunbeam

GNU General Public License v3.0
7 stars 14 forks source link

Sunbeam API - continued #63

Closed joakim-hove closed 4 years ago

joakim-hove commented 6 years ago

When #62 - or something similar is merged Sunbeam will again be compatible with the API in opm-parser/master. Here i discuss/suggest some changes to the API:

Entry points There are currently four constructors in opm-parser which take user-input - i.e. a stream of eclipse formatted keywords, which are exposed in sunbeam to create Deck, EclipseState, Schedule and SummaryConfig[1]. These are currently exposed in sunbeam as:

import sunbeam
# Deck:
deck = sunbeam.parse_deck(filename_or_string, extra_keywords=[], error_action=[])

# EclipseState:
ecl_state = sunbeam.parse( filename_or_string, error_action=[])

# Schedule:
schedule = sunbeam.create_schedule( filename, error_action=[])

# SummaryConfig:
summary_config = sunbeam.create_smry_config(filename)

Names I would suggest that these were all renamed to descriptive nouns - i.e.

deck = sunbeam.deck( )
ecl_state = sunbeam.state( ) / sunbeam.ecl_state( )
schedule = sunbeam.schedule( )
summary_config = sunbeam.summary_config( )

The input argument: filename, string or Deck instance? The sunbeam.parse_deck( ) / sunbeam.deck( ) function must take a string or a filename (also a string ...) all the others could in principle also take a Deck instance as argument. Currently the sunbeam.parse() function has the (schematic) code:

if os.path.isfile( input_arg ):
    parse_file( input_arg )
else:
   parse_string( input_arg )

The disadvantage with this that we can not reliably raise IOError() when people pass a not-existing-file; which is quite unfortunate. For developers the ability to pass a string for parsing is very convenient, but for end-users I think it is considerably less relevant - end users will pass a filename. As I see it there are three possibilities:

  1. We can keep the multiplexing.
  2. We can have use optional arguments: def deck(filename=None, input_str=None, ...)
  3. We can have two entry points - where the default def deck(filename, ...) takes a filename, and then there is an extra entry point which take a string: def deck_from_string(input_string, ...)

I would prefer to go with alternative 3.

The C++ EclipseState, Schedule and SummaryConfig constructors take a Deckinstance as argument, so the current Python API involves creating a temporary Deckinstance and then forwarding that. Parsing the input file and creating a deck is the clearly most time consuming part of the sunbeam operation, it would therefor be beneficial if the Deckobject could be reused. Instead of multiplexing the input argument on filename / string I suggest we switch on Deck versus filename, i.e. the sunbeam.ecl_state() function could look like:

def deck(filename, ...):
    if not os.path.isfile(filename):
        raise IOError("No such file")
   return Deck(.. )

def ecl_state(deck_input, ...):
     if not isinstance(Deck, deck_input):
            deck_input = deck( deck_input, ....)

     return EclipseState( deck_input, ....)

Exposing the Parser and ParseContext The current Python API does not at all expose the Parser or ParseContext classes at all, temporary instances are created used and discarded.The use of temporary Parserand ParseContext objects works fine in the default case, but when you want to augment the parser with extra keywords (something which is quite interesting in Python!), or configure non-default error handling with the ParseContextobject, it becomes a bit weird - in particular you probably need to pass the same arguments repeatedly if you are using more than one of the entry points in your application.

My suggestion is to expose a Parser and ParseContext class in Python, and then the entry points get these as optional arguments:

def deck(filename, parser=None, context=None):
    if parser is None:
        parser = Parser( )

    if context is None:
       context = ParseContext()

   return parser.parse_file( filename, context )

[1]: The SummaryConfigobject is not very important - and could probably be skipped from the discussion.

JensGM commented 6 years ago

I feel like most of these things are OPM-Parser details leaking into sunbeam unnecessarily. For example the fact that you need to parse the same file multiple times to get the state, the schedule and the config. It makes sense to separate the deck and state as they are on two very different abstraction levels, but those three are on the same level and originating from the same file. Internally we might need to parse the file three times to cope with the parser API changes, but that does not mean a user has to see that in sunbeam.

I don't really understand the wish for the name changes either. sunbeam.state(...) would look like a constructor, but is in reality a function that parses and returns a state. Isn't it then more sensible to call it sunbeam.parse(...) or even sunbeam.state.parse(...)?

joakim-hove commented 6 years ago

[ My backtick button does not work ... :-( ]

I feel like most of these things are OPM-Parser details leaking into sunbeam unnecessarily. For example the fact that you need to parse the same file multiple times to get the state, the schedule and the config. [....]

I don't really understand comment. The PR: #62 is a minimal change to sunbeam to work the master version of opm-parser; something along those lines +/- must be done to be able to to follow the opm-parser C++ project. The api following #62 is IMHO not very nice, and here I try to suggest some changes which I think will improve it.

Entry points In opm-parser there are now four different objects/classes which can only be created by explicit construction by the user, the four classes are:

  1. Deck
  2. EclipseState
  3. Schedule
  4. SummaryConfig (the least important - with a wide margin ...)

All other instances we can get from one of these four. That implies that to be able to get hold of everything which opm-parser can generate we need sunbeam symbols (constructor or function) for these four:

In current master we have sunbeam.parse() which creates an EclipseState and sunbeam.parse_deck() which creates a Deck. With #62 we get the additional functions sunbeam.create_schedule() and sunbeam.create_smry_config(). Now all of these four functions do the following:

  1. Parse a file
  2. Return an object.

The can obviously not all be named parse; that is the reason I suggested that they are all named after the product type the return, i.e.

sunbeam.state() -> EclipseState sunbeam.schedule() -> Schedule sunbeam.deck() -> Deck sunbeam.summary_config() -> SummaryConfig.

To me that is reasonably clean and logical. I do not like the current names sunbeam.parse() sunbeam.parse_deck() - it is impossible to infer the return value (i.e. an EclipseState and a Deck) from the names.

Multiple parsings The Deck constructor obviously needs to read a file, all the other constructors will in C++ take a Deck instance and not a file/string. It makes very much sense to facilitate for Deck reusage also in sunbeam because approx 90% of the time is spent in the Deck construction phase, and at least for realistic field models this can take quite long time (in an interactive setting) i.e. on the order of 10 seconds.

markusdregi commented 4 years ago

The development of Sunbeam stopped a long time back. The development of the Python bindings are continued in opm-common and our recommendation is to use these. Closing all issues.