Add GLPK solver reading

trevorb1 commented 1 year ago

Description

In this PR I add a the class ReadGlpk(...) which allows a user to process GLPK solution files. Associated unit tests have also been added. This PR addresses one of the comments in the JOSS review about being able to use only a single solver with otoole.

Issue Ticket Number

Closes #19

Documentation

A sample use case has been added to the examples page on the documentation site. Notably, the user must use the --wglp and --write flags when creating the *.lp and solution files

glpsol -m model.txt -d data.txt --wglp model.lp --write model.sol

Then when processing the solution file with otoole, the user must also pass in the new optional flag --glpk_model added to otoole.

otoole results glpk csv model.sol results config.yaml --glpk_model model.lp

As elaborated in issue #19, and the GLPK wiki, while GLPK can write out a human readable format file (using the --output flag), it is suggested to use the --wglp and --write flags for machine readable files. Both these files are needed to extract out all solution values.

Questions

Does this process of adding in the optional --glpk_model flag to otoole make sense? I couldn't really come up with a better way to do this. I don't know if this is making it too confusing though?
The new ReadGlpk() class inherits from the ReadResultsCBC() to use its functionality to convert between wide and long formatted dataframes. This means that now ReadCbc(), ReadGurobi(), and ReadGlpk() all inherit from the ReadResultsCBC() class. The name ReadResultsCBC() may be a little outdated now. Should we change it to something more general like ConvertReadResults() or something like that, since its main function is to convert wide data to long data?

trevorb1 commented 1 year ago

Also, I have no idea what is up with this typing issue! Does anyone have any thoughts, else I will look at it again next week :)

willu47 commented 1 year ago

Hi @trevorb1. The typing issue is raised by the static type checker mypy. You can use it as part of the pre-commit configuration bundled in the otoole package (perhaps you already are). The typing error raised is

src/otoole/results/results.py:453: error: Incompatible types in assignment (expression has type "float", target has type "str")
src/otoole/results/results.py:461: error: Incompatible return value type (got "Tuple[Dict[str, str], Any]", expected "Tuple[Dict[str, Union[str, float]], Any]")

Fix by adding a type hint inline to line 433 of src/otoole/results/results.py

        status: Dict[str, Union[str, float]] = {}

trevorb1 commented 1 year ago

Thanks so much for the feedback, @willu47! I guess I always thought directly reading in a text/csv file was more efficient then using pandas (since it uses built in datatypes) at the expense of being less useful to work with the data. So it is good to know read_csv() doesn't necessarily compromise performance.

I updated the ReadGlpk logic to use read_csv() as you suggested! The reading of the model file was fairly straightforward. The reading of the solution file was a little more awkward, as the number of columns would change if I just parse on whitespace characters. This would cause pandas to throw an error. There is the on_bad_lines argument to pass into read_csv(), but this would just drop the lines causing errors altogether.

I landed on reading everything in all at once, then parsing out the header information from the solution information via slicing and splitting the strings. Alternatively, we could read in the csv in parts (similar to shown below) to have read_csv() do the parsing for us. But this seemed more cumbersome as then we will have to type check for file_path if it is a string or TextIO object (which isnt shown in the code below).

def read_solution(self, file_path: Union[str, TextIO]) -> Tuple[Dict[str, Union[str, float]], pd.DataFrame]:
    # get status information
    df_status = pd.read_csv(file_path, header=None, nrows=7, sep=":", index_col=0)
    status["name"] = df_status.loc["c Problem", 1].strip()
    status["status"] = df_status.loc["c Status", 1].strip()
    status["objective"] = float(df_status.loc["c Objective", 1].split()[2])
    file_path.seek(0) # would need to type check this 

    # get solution infromation
    df = pd.read_csv(
        file_path,
        header=None,
        skiprows=[x for x in range(0, 8)],
        sep=r"\s+",
        names=["ID", "NUM", "STATUS", "PRIM", "DUAL"],
    )

    return status, df

I also slightly changed the logic of ReadGlpk to require the model file upon instantiation. It seemed pointless to read in a GLPK solution file without the model file since it doesn't have any information about what the variables are.

Please let me know if you have any other thoughts! Else I think we should be good to merge :)

willu47 commented 1 year ago

Hi @trevorb1 - given the tweak I made to the otoole results cli and the addition of the read_results function in #180, it would be worth updating this branch to reflect those changes, and merging this after that PR.

willu47 commented 1 year ago

Regarding renaming of classes, how about changing ReadResultsCBC to ReadWideResults?

trevorb1 commented 1 year ago

Hi @willu47, thanks for the pointers! All functionality to close this issue has been implemented, including the integration of reading GLPK with the Python API update. So I am going to go ahead and merge this.

Possibly a new issue will be how the glpk_model argument is passed into the getter function for the read result strategy. This is needed as the model file is required to instantiate the ReadGlpk class. However, Im not sure if this clutters the function (see below).

If you think refactoring how this works is needed, please just let me know or create a new issue and assign me to it. Thanks!

https://github.com/OSeMOSYS/otoole/blob/5a194de72a15e3defb292b4273b765a74c174b90/src/otoole/convert.py#L146-L180

OSeMOSYS / otoole