OSeMOSYS / otoole

OSeMOSYS Tools for Energy
https://otoole.readthedocs.io
MIT License

Missing datafile definitions not caught #160

Open trevorb1 opened 1 year ago

trevorb1 commented 1 year ago

Description

When reading a MathProg datafile, every parameter/set definition in the datafile must also be defined in the config.yaml file. If a definition is present in the datafile but not in the config.yaml file, an AmplyError is raised. The reverse case is not checked.

If the config.yaml file has parameter definitions not present in the datafile, then I would expect a warning/error to be raised. Instead, the parameter is added to the internal datastore with the default value defined in the config.yaml file.

Other read strategies raise an OtooleNameMismatchError in these instances.
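To make the expected behaviour concrete, here is a minimal sketch of a check that works in both directions. It is not otoole's implementation: `find_name_mismatches`, `check_names` and the `NameMismatchError` class are hypothetical names used only for illustration, and the inputs are plain collections of names rather than otoole or amply objects.

```python
class NameMismatchError(Exception):
    """Hypothetical stand-in; not otoole's actual exception class."""


def find_name_mismatches(config_names, datafile_names):
    """Return (names only in the config, names only in the datafile)."""
    config_names = set(config_names)
    datafile_names = set(datafile_names)
    only_in_config = sorted(config_names - datafile_names)
    only_in_datafile = sorted(datafile_names - config_names)
    return only_in_config, only_in_datafile


def check_names(config_names, datafile_names):
    """Raise if either side defines a name the other side lacks."""
    only_in_config, only_in_datafile = find_name_mismatches(
        config_names, datafile_names
    )
    if only_in_config or only_in_datafile:
        raise NameMismatchError(
            f"In config but not in datafile: {only_in_config}; "
            f"in datafile but not in config: {only_in_datafile}"
        )
```

With this shape, a config that defines AccumulatedAnnualDemand while the datafile omits it would be reported in the first list instead of silently receiving its default.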

How to replicate

Remove the parameter AccumulatedAnnualDemand from a MathProg datafile, and ensure the config.yaml file has the definition:

AccumulatedAnnualDemand:
    indices: [REGION,FUEL,YEAR]
    type: param
    dtype: float
    default: 0

Thoughts on Solution

We use the config.yaml file to first determine what parameters to search for in the datafile, then pass that into the Amply object. Therefore, we either need to reformulate this logic, or change how amply deals with missing parameters.

https://github.com/OSeMOSYS/otoole/blob/3c6f04e03b5ad77d0f938ceba546d1079b82c377/src/otoole/read_strategies.py#L297-L325
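One way to picture the "reformulate this logic" option is a reader that still iterates over the config, but records when it falls back to a default. The sketch below is a toy model under that assumption: `apply_defaults` is a hypothetical helper, and `config` and `parsed` are plain dictionaries standing in for the config.yaml contents and for whatever the parser found in the datafile.

```python
import warnings


def apply_defaults(config, parsed):
    """Return a value for every config entry, warning when a default is used.

    config: maps a name to its definition, e.g. {"default": 0, "type": "param"}
    parsed: maps each name actually found in the datafile to its data
    """
    data = {}
    for name, spec in config.items():
        if name in parsed:
            data[name] = parsed[name]
        else:
            default = spec.get("default")
            # Surface the fallback instead of filling the value silently
            warnings.warn(
                f"{name} not found in datafile; using default {default!r}"
            )
            data[name] = default
    return data
```

Whether a warning or an error is the better response is left open here; swapping `warnings.warn` for a raised exception would give the stricter behaviour.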

Related issues/PR

This is an edge case of issue #151, with the rest of the issue addressed in PR #157.

willu47 commented 1 year ago

One option is to use a regex to parse the datafile for parameter and set definitions and then check these against the config file prior to reading in the data with the amply parser.

Something like the script below can be used to extract lists of sets, parameters and variables from a file. It has significant performance issues, though, and is likely to be slow on a large datafile.

import re

def parse_gmpl_code(gmpl_code):
    # Initialize the variables to store the sets, parameters, and variables
    sets = {}
    parameters = {}
    variables = {}

    # Define regular expressions to match the different GMPL components
    set_regex = re.compile(r'set\s+(?P<set_name>[^\s;]+)\s*;')
    # Parameter names may contain digits and underscores, not only letters
    param_regex = re.compile(r'param\s+(?P<param_name>[A-Za-z_]\w*)\s*(?P<symbolic>symbolic)?(?P<indices>\s*\{[^\}]*\})?\s*(?P<default>default\s+[^;]+)?\s*(?P<binary>binary)?[;:=]')

    var_regex = re.compile(r'var\s+(?P<var_name>[^\s;,]+)(?P<indices>\s*\{[^\}]*\})?\s*(?P<bounds>>=\s*[^\s;]+)?\s*;')

    # Parse the sets
    for match in set_regex.finditer(gmpl_code):
        set_name = match.group('set_name')
        sets[set_name] = []

    # Parse the parameters
    for match in param_regex.finditer(gmpl_code):
        param_name = match.group('param_name')
        indices = match.group('indices')
        default = match.group('default')

        if indices:
            # Parse indices
            indices = re.findall(r'\{([^\}]*)\}', indices)[0]
            indices = [i.strip() for i in indices.split(',')]
            parameters[param_name] = {'indices': indices}
        else:
            parameters[param_name] = {}

        if default:
            # Parse default value
            default = default.strip().split()[-1]
            parameters[param_name]['default'] = default

    # Parse the variables
    for match in var_regex.finditer(gmpl_code):
        var_name = match.group('var_name')
        indices = match.group('indices')
        bounds = match.group('bounds')

        if indices:
            # Parse indices
            indices = re.findall(r'\{([^\}]*)\}', indices)[0]
            indices = [i.strip() for i in indices.split(',')]
            variables[var_name] = {'indices': indices}
        else:
            variables[var_name] = {}

        if bounds:
            # Parse variable bounds
            # bounds = bounds.strip().split()[-1]
            variables[var_name]['bounds'] = bounds

    # Return the parsed sets, parameters, and variables
    return sets, parameters, variables

# Read the whole model file in one call
with open('OSeMOSYS.txt', 'r') as textfile:
    osemosys = textfile.read()

# Avoid shadowing the built-in vars()
sets, params, variables = parse_gmpl_code(osemosys)
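For the pre-check against the config file, only the names are needed, so a lighter scan may be enough. The sketch below reads a datafile line by line and keeps only the set and parameter names, which avoids running large regexes over the whole file at once. It rests on several assumptions: each definition starts on its own line, a parameter is written either as `param Name default 0 :=` or as `param default 0 : Name :=`, and each config entry carries a `type` of `set` or `param`. The names `scan_definitions` and `missing_from_datafile` are hypothetical.

```python
import re

# Matches the start of a definition, in either of the two layouts assumed above
DEFINITION = re.compile(
    r"^\s*(?P<kind>set|param)\s+"
    r"(?:default\s+\S+\s*:\s*)?"  # optional leading "default <value> :"
    r"(?P<name>[A-Za-z_]\w*)"
)


def scan_definitions(lines):
    """Collect set and parameter names from an iterable of datafile lines."""
    found = {"set": set(), "param": set()}
    for line in lines:
        match = DEFINITION.match(line)
        if match:
            found[match.group("kind")].add(match.group("name"))
    return found


def missing_from_datafile(config, lines):
    """List config sets/params that have no definition in the datafile."""
    found = scan_definitions(lines)
    names = found["set"] | found["param"]
    return sorted(
        name
        for name, spec in config.items()
        if spec.get("type") in ("set", "param") and name not in names
    )
```

Because an open file object is itself an iterable of lines, `missing_from_datafile(config, open_file)` would stream the file rather than load it into memory. Data rows are skipped cheaply, since they do not begin with `set` or `param`.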