It's worth noting that the file's hash will not be identical even if the compare function thinks the lines are all the same:
This is because lines in a "chunk" or a "paragraph" may be in a different order, even if the sorted chunks end up equivalent. Here's an example of the bounds being written in a different order:
I don't think this ordering affects the solution of the optimization, although I haven't tested it to be certain (it takes over an hour to solve the 9n case).
This change in order is not deterministic, i.e. different `.lp` files are written each time. However, once the model is built, the same `.lp` file is always written when writing out the file multiple times. The hashes are the same, so it is the exact same file that is being written out each time.

This makes me believe there's an issue with one of the sets being initialized from a dictionary in a different order across different runs. Once it is initialized, it always gives the same results, i.e. an identical `.lp` file is written out to disk each time if `model.write` is called multiple times with the same `model`. But if the model is created from scratch, a new `.lp` file may be created.

This may also explain why `= 0` vs `= 1` occurs (as shown above). With a different order, a different subset of generators would be initialized to `1`.
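To make the hashing point concrete, here's a toy demonstration (the constraint text is made up): two files whose chunks hold the same lines in a different order hash differently, yet compare equal once each chunk is sorted.

```python
import hashlib

# two toy "files": same terms in each chunk, but c1's terms reordered
a = "c1:\n+1 x\n+2 y\n<= 3\n\nc2:\n+4 z\n<= 5\n"
b = "c1:\n+2 y\n+1 x\n<= 3\n\nc2:\n+4 z\n<= 5\n"

# the whole-file hashes differ...
print(hashlib.sha256(a.encode()).hexdigest() ==
      hashlib.sha256(b.encode()).hexdigest())         # False

# ...but a chunk-wise comparison with sorted lines says they are equivalent
same = all(sorted(x.splitlines()) == sorted(y.splitlines())
           for x, y in zip(a.split('\n\n'), b.split('\n\n')))
print(same)                                           # True
```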
Hi @kdheepak,
Thanks a lot for your detailed explanation and for providing the files and Python snippet.
We will review the code as soon as possible and look into the issue of non-deterministic behaviour when re-running the model. Your insight into the possible cause, especially with the initialisation of pInitialUC, is very helpful and will guide our review.
We'll keep you posted on our progress. Thank you again for your patience.
Best regards,
Hi @kdheepak, I have run the same case study several times, and indeed the lp files are not ordered in the same way across the two runs. However, if I take the lp files, sort them in Excel, and compare them row by row, they have equal content, although perhaps in a different order. I solve the 9n case for only 168 hours to make the runs easier, by deleting the duration after hour 169 and also deleting the value of RESEnergy to avoid infeasibilities. The order of the lp file may affect the optimal solution. I have used the simplex method and disabled the presolve to rule things out. Still trying to understand the issue, but working on it. Regards, and many thanks for discovering it. Andrés
Can you share the log or the objective function value for solving?
Also, how do you disable presolve? And are you using HiGHS? If not, which solver are you using? It'd be great to be able to converge on results using the same version of a solver, using a test case that covers as much of the codebase as possible, and add that as a test in the repo. Ideally this test would run as fast as possible.
fwiw, I tried generating 20 different lp files in a loop (with a `1/0` before problem solving and a try/except in the loop) and it ends up with the same content (in a different order) every time. So this problem doesn't seem to happen often. It seems to happen just once in a while.
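Roughly what I mean, as a sketch (the tiny `build_model` below is a hypothetical stand-in for however the openTEPES model actually gets constructed):

```python
import pyomo.environ as pyo

def build_model():
    # hypothetical stand-in for openTEPES model construction
    m = pyo.ConcreteModel()
    m.x = pyo.Var(['a', 'b'], bounds=(0, 1))
    m.obj = pyo.Objective(expr=m.x['a'] + m.x['b'])
    return m

for i in range(20):
    try:
        model = build_model()
        model.write(f'run{i}.lp', io_options={'symbolic_solver_labels': True})
        1 / 0   # abort before problem solving
        # ... the solve call would go here ...
    except ZeroDivisionError:
        pass
```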
These are the log files of two runs with two solvers (Gurobi and HiGHS). The runs with Gurobi were with simplex and without presolve. The runs with HiGHS were with default parameters. There are minor differences in the log files. Attaching also the duration and RESEnergy files for reducing the size of the 9n case.
oT_Data_Duration_9n.csv oT_Data_RESEnergy_9n.csv
openTEPES_highs_9n.log openTEPES_highs_9n_h.log openTEPES_gurobi_9n.log openTEPES_gurobi9n.log
Thanks for sharing!
The `oT_Data_Duration_9n.csv` you attached has all 8736 hours. I'm assuming you are deleting all but the first 168 hours when you run it locally, right?
When I only run with HiGHS, just 7 days (i.e. just 168 hours), and remove RESEnergy to prevent infeasibilities, I get the following:
```
Presolve : Reductions: rows 12669(-6486); columns 16041(-10171); elements 61751(-13066)
Solving the presolved LP
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Ph1: 0(0) 0s
       8742     6.1634091957e+00 Pr: 0(0); Du: 0(3.88578e-16) 1s
       8742     6.1634091957e+00 Pr: 0(0); Du: 0(3.88578e-16) 1s
Solving the original LP from the solution after postsolve
Model   status      : Optimal
Simplex   iterations: 8742
Objective value     : 6.1634091957e+00
HiGHS run time      : 0.54
Termination condition: optimal
# ==========================================================
# = Solver Results                                         =
# ==========================================================
# ----------------------------------------------------------
#   Problem Information
# ----------------------------------------------------------
Problem:
- Lower bound: 6.163409195706693
  Upper bound: 6.163409195706693
  Number of objectives: 1
  Number of constraints: 0
  Number of variables: 0
  Sense: minimize
# ----------------------------------------------------------
#   Solver Information
# ----------------------------------------------------------
Solver:
- Status: ok
  Termination condition: optimal
  Termination message: TerminationCondition.optimal
# ----------------------------------------------------------
#   Solution Information
# ----------------------------------------------------------
Solution:
- number of solutions: 0
  number of solutions displayed: 0

Problem size    ... 19155 constraints, 30330 variables
Solution time   ... 1 s
Total system cost [MEUR] 6.163409195706693
```
This seems different from your objective function.
I was considering only the first 24 h and emptying the remaining ones: oT_Data_Duration_9n.csv
I made a PR adding a 9n7d case and added a test as well: https://github.com/IIT-EnergySystemModels/openTEPES/pull/65
It runs with HiGHS in a few seconds (I'm happy to change the PR to use just 24 hours if you'd like).
I noticed when running the test multiple times, I get a completely different objective function value every alternate run. Here's a screenshot of the test failing:
Here's a screenshot of the same test succeeding immediately afterwards:
This is without changing the data on disk and no changes to the source code. I'm just re-running the test. It seems to happen fairly often. Maybe this can help identify the issue discussed here?
> I was considering only the first 24 h and emptying the remaining ones
I see. I didn't notice that it was emptied. That makes sense.
I have updated the openTEPES version, fixing the computation of the initial UC (version 4.17.5), but the ordering of the lp file still remains. I wrote an issue to the pyomo forum and it is being addressed by John Siirola
Thanks for the reply! Would you be able to point to the commit where this change was made in openTEPES?
I made a mistake and didn't update the InputData file. Now it is updated in GitHub.
I will update the PyPI version, because we generated it this morning with this mistake.
Thanks for the update! I'll have to check in more detail later and report back.
I just noticed that there is `ordered=False` in a number of places. For example, in your most recent commit fix in https://github.com/IIT-EnergySystemModels/openTEPES/commit/036d996bb5ca15bf6cda5f8c4ac3218877485815, you changed `mTEPES.g` to be ordered, but `mTEPES.re` a couple of lines below is still `ordered=False`.

This causes a non-deterministic order of the sets in `mTEPES.nr`:
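To illustrate, here's a minimal sketch (not openTEPES's actual set definitions; the member names are made up). With `ordered=False` there is no iteration-order guarantee, and as far as I can tell the order can then differ between interpreter runs because Python randomizes string hashing per process:

```python
import pyomo.environ as pyo

m = pyo.ConcreteModel()
m.g  = pyo.Set(initialize=['gen1', 'gen2', 'sol1', 'wind1'], ordered=True)
m.re = pyo.Set(initialize=['sol1', 'wind1'], ordered=False)

# analogous to "non-renewable = all generators minus renewables"
m.nr = m.g - m.re

# ordered=False gives no iteration-order guarantee; with string members
# the underlying hashing is randomized per Python process, so separate
# runs of this script can iterate m.re (and anything built from it) in
# different orders.
print(list(m.re))
print(list(m.nr))
```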
I did just remove all `ordered=True` and all `ordered=False` from the code, and now I'm getting deterministic results for the test in https://github.com/IIT-EnergySystemModels/openTEPES/pull/65.

Is there a reason `ordered=False` was used to begin with? Would you consider removing all `ordered=False` and all `ordered=True` and defaulting to pyomo's default? I'm happy to submit a PR for that. I believe that may close this issue.
> I have updated the openTEPES version, fixing the computation of the initial UC (version 4.17.5), but the ordering of the lp file still remains. I wrote an issue to the pyomo forum and it is being addressed by John Siirola
Thanks for posting to the pyomo forum. For future reference, this is the issue: https://groups.google.com/g/pyomo-forum/c/Lbt2EDm5VzI
OK, I will change all the sets to ordered (which is the pyomo default) in a PR.
Hi,
Thanks for maintaining openTEPES.
When re-running the model without any changes to the source code or the input data, I get different `.lp` files and hence different objective function values. Here's a `tar.gz` file that contains a `zip` file that contains two lp files, `run1.lp` and `run2.lp`: run1-run2-lp-files.tar.gz

I had to double-compress to get around github's file size limit; the original lp files with symbolic variable names are 425 MB each:
It's not easy to compare these files directly. Most `diff` tools will fail for files this size. Also, sometimes the model writes the same equation out but with the terms in a different order (which might be related to the cause of this issue, see below).

I wrote a small python snippet to compare the `.lp` files:
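A sketch of such a comparison, assuming SHA-256 for the hash and blank-line-separated chunks (details are my reconstruction of what the description below implies):

```python
import hashlib

def file_hash(path):
    with open(path, 'rb') as f:
        return hashlib.sha256(f.read()).hexdigest()

def chunks(path):
    # "paragraphs": chunks separated by a blank line (2 newlines)
    with open(path, encoding='utf-8') as f:
        return f.read().split('\n\n')

run1, run2 = 'run1.lp', 'run2.lp'
print(run1, file_hash(run1))
print(run2, file_hash(run2))

c1, c2 = chunks(run1), chunks(run2)

# number-of-lines check
assert sum(c.count('\n') + 1 for c in c1) == sum(c.count('\n') + 1 for c in c2), \
    'files differ in number of lines'

# per-chunk comparison: sort the lines inside each chunk before comparing;
# print the original chunks on a mismatch
for a, b in zip(c1, c2):
    if sorted(a.splitlines()) != sorted(b.splitlines()):
        print('mismatched chunk:')
        print(a)
        print('---')
        print(b)
```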
With this snippet, you can check that the hash of the `.lp` file I used is the same as the hash of the `.lp` file in the compressed folder (the hash is based on the content of the file; if the content changes, the hash changes). I printed the hashes below for reference.

The snippet compares the files by breaking them into "paragraphs", i.e. chunks separated by 2 newlines. This allows a per-equation comparison. The snippet assumes that in each equation every term is on a new line, so within a chunk it sorts the lines before comparing. If there's a mismatch, it prints the original chunk.
Here's the output of the code:
You can see that the snippet did not fail on the number of lines check. Both files have the same number of lines. I verified that pyomo thinks these have the same number of variables and constraints too.
The mismatch occurs in the constant in the equation:
I believe that is because of this term `pInitialUC` in this constraint:

All comparisons shown in this issue were done on this commit: https://github.com/IIT-EnergySystemModels/openTEPES/tree/9dd774b3bf6f3b0f9da2f3e7c7a7efe209fb5522
`pInitialUC` is initialized to all zeroes here:
https://github.com/IIT-EnergySystemModels/openTEPES/blob/9dd774b3bf6f3b0f9da2f3e7c7a7efe209fb5522/openTEPES/openTEPES_InputData.py#L1490-L1494

And then it is updated in this block:
https://github.com/IIT-EnergySystemModels/openTEPES/blob/9dd774b3bf6f3b0f9da2f3e7c7a7efe209fb5522/openTEPES/openTEPES_InputData.py#L1836-L1850

Maybe it is possible that the code that updates it is not deterministic (i.e. with the same input data and same source code, different indices of `pInitialUC` are updated)? It's hard for me to tell the exact reason for this, and I figured I'd post here to see if you had any thoughts.
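As a toy illustration of how that could happen (made-up numbers, not the openTEPES logic): if the update loop commits units in set-iteration order until some initial condition is met, which units end up at `1` depends entirely on that order.

```python
# Toy illustration only (made-up capacities and load, not openTEPES code):
# commit units in iteration order until an initial load is covered. A plain
# Python set of strings has no guaranteed iteration order across interpreter
# runs, so the subset ending up with pInitialUC == 1 can change between runs.
units = {'gen1': 50.0, 'gen2': 50.0, 'gen3': 50.0}   # unit -> capacity
load = 80.0

pInitialUC = {u: 0 for u in units}
covered = 0.0
for u in set(units):          # unordered iteration over unit names
    if covered >= load:
        break
    pInitialUC[u] = 1
    covered += units[u]

print(pInitialUC)             # which units are 1 depends on iteration order
```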