OSeMOSYS / otoole

OSeMOSYS Tools for Energy
https://otoole.readthedocs.io
MIT License

Read in CPLEX Results #190

Closed. trevorb1 closed this 1 year ago

trevorb1 commented 1 year ago

Description

In this PR I implement logic to read in raw CPLEX solution files. The ReadCplex class has been updated to directly read in the solution file via the pandas method read_xml() (as CPLEX solution files are written out in XML format). Tests have not been added yet as I am unsure if this is the logic we want. If we do decide to implement this logic, I will first need to add tests.

Issue Ticket Number

Closes #2 and Closes #20 and Closes #29

Questions

The logic has been changed so the transformation and sorting of the CPLEX solution files are no longer needed. Instead, the pandas.read_xml() method is used to directly read in only the variables from the solution file (via the xpath argument).

My question is: is there a reason we are not currently using the read_xml() method to read in CPLEX solution files? I see that read_xml() was released in pandas 1.3.0 (July 2021), after these issues were created. So I am not sure if it's as simple as read_xml() not being available at the time, if there are performance issues associated with read_xml() that I am not aware of, or if there is some other reason.

A notable advantage of using read_xml() is that the ReadCplex class can be significantly simplified to be very similar to the ReadGurobi class (as implemented in this PR). Moreover, this solution reuses the logic in the ReadWideResults class to convert data. Finally, as this solution works with the etree parser, we do not need the optional lxml dependency.
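For readers unfamiliar with read_xml(), the idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the trimmed solution file, its values, the constraint name, and the output column names (Variable, Index, Value) are all assumptions made for the example.

```python
from io import StringIO

import pandas as pd

# Hypothetical, heavily trimmed CPLEX solution file. Real files contain more
# sections and attributes; this layout and these values are assumptions.
SOLUTION = """<CPLEXSolution version="1.2">
  <linearConstraints>
    <constraint name="SomeConstraint(SIMPLICITY,2014)" index="0" slack="0" dual="1.5"/>
  </linearConstraints>
  <variables>
    <variable name="NewCapacity(SIMPLICITY,GAS_EXTRACTION,2014)" index="0" value="0.5"/>
    <variable name="TotalDiscountedCost(SIMPLICITY,2014)" index="1" value="187.25"/>
  </variables>
</CPLEXSolution>"""

# Select only the <variable> elements; parser="etree" avoids needing lxml.
df = pd.read_xml(StringIO(SOLUTION), xpath=".//variable", parser="etree")

# Split "Name(idx1,idx2,...)" into the variable name and its index string.
df[["Variable", "Index"]] = df["name"].str.split("(", n=1, expand=True)
df["Index"] = df["Index"].str.replace(")", "", regex=False)

# Keep a wide-results style layout (column names assumed for illustration).
df = df[["Variable", "Index", "value"]].rename(columns={"value": "Value"})
print(df)
```

With a real solution file, the StringIO object would be replaced by the path to the file, for example "cplex.sol".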

Documentation

To be done if this suggestion is implemented. But using the following commands on the Simplicity repository worked fine with this change:

glpsol -m osemosys_fast.txt -d data.txt --wlp model.lp --check
cplex -c "read model.lp" "optimize" "write cplex.sol"
otoole results cplex csv cplex.sol results csv data config.yaml

Example

For clarity, below is the data structure returned by the ReadCplex._convert_to_dataframe("cplex.sol") method based on the simplicity example

[image: dataframe returned by ReadCplex._convert_to_dataframe("cplex.sol")]

willu47 commented 1 year ago

This seems like a much simpler solution to parsing the CPLEX solution file. I suspect that we haven't used read_xml because i) we didn't think to look for it, ii) the existing CPLEX script worked well enough.

So, I'm fully supportive of this approach. The code is considerably simpler, and I am sure that the pandas implementation will be faster than what we had before anyway.

My only remaining question is how easily can we extract dual values from the CPLEX solution file using this approach?

trevorb1 commented 1 year ago

Great, thanks @willu47!

To summarize the changes in this PR:

If you have any remaining comments on this implementation, please just let me know. Else, I will go ahead and merge this branch tomorrow. Thanks!!

trevorb1 commented 1 year ago

Hi @willu47! Regarding extracting dual values from CPLEX solution files: I'm not sure what process people currently use to extract dual values, so I am not sure if this process will be any better. But if we wanted to use similar logic to what is implemented in this PR, I think we could use what is shown below! (Note: this snippet removes all dual values that are zero.)

[image: raw CPLEX solution]

[image: dual values read in]

A notable advantage of this approach is that the data structure returned here is very similar to what is returned from the ReadWideResults._convert_to_dataframe( .. ) method. Therefore, otoole already has logic to process dataframes similar to this :) A notable disadvantage of this approach is that we read the solution file a second time, separately from reading in the variables. But given that we filter for only the constraints when reading, this may not be a huge performance hit?
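If the second read did turn out to matter, one possible alternative is to parse the file once and pull out both the variables and the non-zero duals from the same tree. The sketch below uses the standard library's xml.etree.ElementTree rather than read_xml(). It is only an illustration under assumptions: the function name read_solution is hypothetical, and the trimmed solution file, its names, and its values are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed CPLEX solution file (layout and values assumed).
SOLUTION = """<CPLEXSolution version="1.2">
  <linearConstraints>
    <constraint name="SomeConstraint(SIMPLICITY,2014)" index="0" slack="0" dual="1.5"/>
    <constraint name="OtherConstraint(SIMPLICITY,2015)" index="1" slack="2" dual="0"/>
  </linearConstraints>
  <variables>
    <variable name="NewCapacity(SIMPLICITY,GAS_EXTRACTION,2014)" index="0" value="0.5"/>
    <variable name="TotalDiscountedCost(SIMPLICITY,2014)" index="1" value="187.25"/>
  </variables>
</CPLEXSolution>"""


def read_solution(xml_text):
    """Parse the XML once and return (variables, duals) as lists of dicts."""
    root = ET.fromstring(xml_text)
    # Every <variable> element, with its value converted to a float.
    variables = [
        {"name": el.get("name"), "value": float(el.get("value"))}
        for el in root.iter("variable")
    ]
    # Only <constraint> elements whose dual is non-zero.
    duals = [
        {"name": el.get("name"), "dual": float(el.get("dual"))}
        for el in root.iter("constraint")
        if float(el.get("dual")) != 0
    ]
    return variables, duals


variables, duals = read_solution(SOLUTION)
print(variables)
print(duals)
```

For a file on disk, ET.parse(path).getroot() would take the place of ET.fromstring(). Either list of records can be passed to pandas.DataFrame() if a dataframe is needed afterwards. Whether this is actually faster than two filtered read_xml() calls would need to be measured.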

trevorb1 commented 1 year ago
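import pandas as pd

# Read only the <constraint> elements from the CPLEX solution file
df = pd.read_xml("sol.sol", xpath=".//constraint", parser="etree")
# Split "Name(idx1,idx2,...)" into the constraint name and its index string
df[["Constraint", "Index"]] = df["name"].str.split("(", expand=True)
df["Index"] = df["Index"].str.replace(")", "", regex=False)
# Drop zero duals, then keep only the columns of interest
df = df[df["dual"] != 0].reset_index(drop=True).rename(columns={"dual": "Dual"})
df = df[["Constraint", "Index", "Dual"]]
df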