Read in CPLEX Results

trevorb1 commented 1 year ago

Description

In this PR I implement logic to read in raw CPLEX solution files. The ReadCplex class has been updated to directly read in the solution file via the pandas method read_xml() (as CPLEX solution files are written out in XML format). Tests have not been added yet as I am unsure if this is the logic we want. If we do decide to implement this logic, I will first need to add tests.

Issue Ticket Number

Closes #2 and Closes #20 and Closes #29

Questions

The logic has been changed so that transforming and sorting the CPLEX solution file is no longer needed. Instead, pandas.read_xml() reads only the variables directly from the solution file (selected via the xpath argument).
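
As a rough illustration of reading only the variables through the xpath argument, the sketch below runs pandas.read_xml() over a small, hand-written CPLEX-style XML string. The sample content, the names in it and the helper read_cplex_variables are hypothetical stand-ins for a real cplex.sol file. Splitting at the first "(" assumes names of the form Name(index,...).

```python
from io import StringIO

import pandas as pd

# Hypothetical, heavily trimmed CPLEX-style solution used only for illustration;
# a real solution file contains more sections and attributes.
SAMPLE_SOL = """<CPLEXSolution version="1.2">
 <header problemName="model.lp" solutionStatusString="optimal"/>
 <linearConstraints>
  <constraint name="EBa11_EnergyBalanceEachTS5(SIMPLICITY,ID,ELC,2015)" index="0" status="LL" slack="0" dual="1.5"/>
 </linearConstraints>
 <variables>
  <variable name="NewCapacity(SIMPLICITY,GAS_PLANT,2015)" index="0" status="BS" value="2.5" reducedCost="0"/>
  <variable name="NewCapacity(SIMPLICITY,COAL_PLANT,2015)" index="1" status="LL" value="0" reducedCost="0.3"/>
 </variables>
</CPLEXSolution>"""


def read_cplex_variables(source) -> pd.DataFrame:
    """Return one row per <variable> element with Variable, Index and Value columns."""
    # xpath keeps only <variable> elements, so the header and constraints are skipped;
    # parser="etree" uses the standard library, so lxml does not need to be installed.
    df = pd.read_xml(source, xpath=".//variable", parser="etree")
    # Split e.g. "NewCapacity(SIMPLICITY,GAS_PLANT,2015)" at the first "(".
    df[["Variable", "Index"]] = df["name"].str.split("(", n=1, expand=True)
    df["Index"] = df["Index"].str.rstrip(")")
    return df[["Variable", "Index", "value"]].rename(columns={"value": "Value"})


print(read_cplex_variables(StringIO(SAMPLE_SOL)))
```

Wrapping the string in StringIO keeps the example self-contained; a path such as "cplex.sol" could be passed instead.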

My question is, is there a reason we are not currently using the read_xml() method to read in CPLEX solution files? I see that read_xml() was released in pandas 1.3.0 (July 2021), after these issues were created. So I am not sure if it's as simple as read_xml() not being available at the time? Or if there are performance issues associated with read_xml() I am not aware of? Or for some other reason?

A notable advantage of using read_xml() is that the ReadCplex class can be significantly simplified to be very similar to the ReadGurobi class (as implemented in this PR). Moreover, this solution reuses the logic in the ReadWideResults class to convert data. Finally, as this solution works with the etree parser, we do not need the optional lxml dependency.
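
To make the "convert data" step more concrete, here is a minimal sketch of turning a (Variable, Index, Value) table into one table per result with one column per index. It is not otoole's ReadWideResults code: the INDICES mapping, the wide_to_long helper and the sample rows are hypothetical, and index values are left as strings rather than being cast to the types a real reader would use.

```python
import pandas as pd

# Hypothetical mapping from result name to its index names; in otoole this kind of
# information comes from the user's config.yaml, the dict here is illustrative only.
INDICES = {"NewCapacity": ["REGION", "TECHNOLOGY", "YEAR"]}

# Wide-style input: one row per variable, with all indices packed into one string.
wide = pd.DataFrame(
    {
        "Variable": ["NewCapacity", "NewCapacity"],
        "Index": ["SIMPLICITY,GAS_PLANT,2015", "SIMPLICITY,COAL_PLANT,2016"],
        "Value": [2.5, 1.0],
    }
)


def wide_to_long(df: pd.DataFrame, indices: dict) -> dict:
    """Return one dataframe per variable, with each index in its own column."""
    results = {}
    for name, group in df.groupby("Variable"):
        # "SIMPLICITY,GAS_PLANT,2015" -> three columns, named from the mapping.
        parts = group["Index"].str.split(",", expand=True)
        parts.columns = indices[name]
        parts["VALUE"] = group["Value"].to_numpy()
        results[name] = parts.reset_index(drop=True)
    return results


for name, frame in wide_to_long(wide, INDICES).items():
    print(name)
    print(frame)
```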

Documentation

To be done if this suggestion is implemented. However, the following commands, run on the Simplicity repository, worked fine with this change:

glpsol -m osemosys_fast.txt -d data.txt --wlp model.lp --check
cplex -c "read model.lp" "optimize" "write cplex.sol"
otoole results cplex csv cplex.sol results csv data config.yaml

Example

For clarity, below is the data structure returned by the ReadCplex._convert_to_dataframe("cplex.sol") method for the Simplicity example:

willu47 commented 1 year ago

This seems like a much simpler solution to parsing the CPLEX solution file. I suspect that we haven't used read_xml because (i) we didn't think to look for it, and (ii) the existing CPLEX script worked well enough.

So, I'm fully supportive of this approach: the code is considerably simpler, and I am sure that the pandas implementation will be faster than what we had before anyway.

My only remaining question is how easily can we extract dual values from the CPLEX solution file using this approach?

trevorb1 commented 1 year ago

Great, thanks @willu47!

To summarize the changes in this PR:

- ReadCplex now inherits from the ReadWideResults class and calls pandas.read_xml(...) in ReadCplex._convert_to_dataframe(...)
- TestReadCplex has been updated to reflect reading in raw CPLEX solution files
- The CLI Examples documentation page has been updated to show how to process CPLEX results
- The CLI Examples documentation page has been restructured so that the convert and results examples are now separate
- Tests have been added for the preprocess.longify_data.check_datatypes() function. I didn't change anything in this function; I just noticed that it didn't have tests.
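
The ReadCplex/ReadWideResults relationship summarized above is roughly a template-method pattern: the base class owns the shared conversion and each solver reader only implements the parsing. The sketch below illustrates that split with hypothetical stand-ins (WideResultsSketch, CplexSketch) rather than otoole's actual ReadWideResults and ReadCplex classes; the sample XML is made up.

```python
from abc import ABC, abstractmethod
from io import StringIO

import pandas as pd


class WideResultsSketch(ABC):
    """Hypothetical base class: shared conversion lives here, parsing in subclasses."""

    def read(self, source) -> pd.DataFrame:
        raw = self._convert_to_dataframe(source)  # expects "name" and "value" columns
        # Shared step: split "Var(A,B,C)" into a Variable and an Index column.
        out = raw["name"].str.split("(", n=1, expand=True)
        out.columns = ["Variable", "Index"]
        out["Index"] = out["Index"].str.rstrip(")")
        out["Value"] = raw["value"].to_numpy()
        return out

    @abstractmethod
    def _convert_to_dataframe(self, source) -> pd.DataFrame:
        """Solver-specific parsing; must return "name" and "value" columns."""


class CplexSketch(WideResultsSketch):
    """Hypothetical CPLEX reader: only the XML parsing is solver specific."""

    def _convert_to_dataframe(self, source) -> pd.DataFrame:
        df = pd.read_xml(source, xpath=".//variable", parser="etree")
        return df[["name", "value"]]


# Made-up, trimmed solution file used only to exercise the sketch.
SAMPLE_SOL = """<CPLEXSolution version="1.2">
 <variables>
  <variable name="NewCapacity(SIMPLICITY,GAS_PLANT,2015)" index="0" status="BS" value="2.5" reducedCost="0"/>
 </variables>
</CPLEXSolution>"""

print(CplexSketch().read(StringIO(SAMPLE_SOL)))
```

A Gurobi-style reader would then only need its own _convert_to_dataframe, which is the kind of similarity between the solver classes this PR aims for.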

If you have any remaining comments on this implementation, please just let me know. Otherwise, I will go ahead and merge this branch tomorrow. Thanks!!

trevorb1 commented 1 year ago

Hi @willu47! Regarding extracting dual values from CPLEX solution files: I'm not sure what process people currently use to extract dual values, so I can't say whether this process would be any better. But if we wanted to use logic similar to what is implemented in this PR, I think we could use what is shown below! (Note: this snippet removes all dual values that are zero.)

Raw CPLEX Solution

Read in Dual Values

A notable advantage of this approach is that the data structure returned here is very similar to what the ReadWideResults._convert_to_dataframe(...) method returns, so otoole already has logic to process dataframes like this :) A notable disadvantage is that we read the solution file a second time, separately from reading in the variables. But given that we only select the constraints when reading, this may not be a huge performance hit?
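
If reading the file twice ever did become a bottleneck, one alternative is to parse it once and pull out both the variables and the non-zero duals in the same pass. The sketch below swaps in the standard library's xml.etree.ElementTree instead of pandas.read_xml(); as far as I understand, read_xml() with parser="etree" parses the whole document before applying the xpath filter, so restricting the xpath mainly reduces the rows kept rather than the parsing work. The sample XML, the helper names and the choice to treat a missing dual attribute as zero are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from io import StringIO

import pandas as pd

# Hypothetical, trimmed CPLEX-style solution used only for illustration.
SAMPLE_SOL = """<CPLEXSolution version="1.2">
 <linearConstraints>
  <constraint name="EBa11_EnergyBalanceEachTS5(SIMPLICITY,ID,ELC,2015)" index="0" status="LL" slack="0" dual="1.5"/>
  <constraint name="EBa11_EnergyBalanceEachTS5(SIMPLICITY,IN,ELC,2015)" index="1" status="BS" slack="4" dual="0"/>
 </linearConstraints>
 <variables>
  <variable name="NewCapacity(SIMPLICITY,GAS_PLANT,2015)" index="0" status="BS" value="2.5" reducedCost="0"/>
 </variables>
</CPLEXSolution>"""


def split_name(name):
    """Split 'Name(A,B,C)' into ('Name', 'A,B,C'); a name without '(' gets an empty index."""
    base, _, rest = name.partition("(")
    return base, rest.rstrip(")")


def read_solution_once(source):
    """Parse the XML a single time and return (variables, duals) dataframes."""
    root = ET.parse(source).getroot()
    variables = pd.DataFrame(
        [(*split_name(v.get("name")), float(v.get("value"))) for v in root.iter("variable")],
        columns=["Variable", "Index", "Value"],
    )
    # A missing dual attribute is treated as zero, so such rows are dropped below.
    constraints = [
        (*split_name(c.get("name")), float(c.get("dual", 0))) for c in root.iter("constraint")
    ]
    duals = pd.DataFrame(
        [row for row in constraints if row[2] != 0],
        columns=["Constraint", "Index", "Dual"],
    )
    return variables, duals


variables, duals = read_solution_once(StringIO(SAMPLE_SOL))
print(variables)
print(duals)
```

Building the dataframes from plain tuples with explicit column names also keeps the result well-formed when a file has no constraints at all.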

trevorb1 commented 1 year ago

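import pandas as pd

# Read only the <constraint> elements; each XML attribute becomes a column
df = pd.read_xml("sol.sol", xpath=".//constraint", parser="etree")
# Split e.g. "Name(A,B,C)" at the first "(" into the constraint name and its index
df[["Constraint", "Index"]] = df["name"].str.split("(", n=1, expand=True)
df["Index"] = df["Index"].str.replace(")", "", regex=False)
# Drop zero duals and keep only the columns of interest
df = df[df["dual"] != 0].reset_index(drop=True).rename(columns={"dual": "Dual"})
df = df[["Constraint", "Index", "Dual"]]
print(df)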

OSeMOSYS / otoole

Read in CPLEX Results #190
