IIT-EnergySystemModels / openTEPES

Open Generation, Storage, and Transmission Operation and Expansion Planning Model with RES and ESS (openTEPES)
https://opentepes.readthedocs.io/en/latest/index.html
GNU Affero General Public License v3.0

Model writes different `.lp` file when re-running without any changes to source code or input data #63

Open · kdheepak opened 3 days ago

kdheepak commented 3 days ago

Hi,

Thanks for maintaining openTEPES.

When re-running the model without any changes to the source code or the input data, I get different .lp files and hence different objective function values.

Here's a tar.gz file that contains a zip file with two .lp files, run1.lp and run2.lp:

run1-run2-lp-files.tar.gz

I had to double-compress to get around GitHub's file size limit; the original .lp files with symbolic variable names are 425 MB each:

[screenshot]

It's not easy to compare these files directly; most diff tools struggle with files this size. Also, the model sometimes writes the same equation with its terms in a different order (which might be related to the cause of this issue, see below).

I wrote a small python snippet to compare the .lp files:

import hashlib

def compute_file_hash(filepath, hash_type="sha256"):
    """Compute the hash of a file using the specified hash type (default: sha256)."""
    hash_function = hashlib.new(hash_type)

    # Read the file in chunks to avoid loading large files into memory
    with open(filepath, "rb") as file:
        for chunk in iter(lambda: file.read(8192), b""):
            hash_function.update(chunk)

    return hash_function.hexdigest()

def compare(filepath1, filepath2):
    # Print the hash of each file to quickly check for differences
    print(f"{filepath1=}", f"{compute_file_hash(filepath1) = }")
    print(f"{filepath2 =}", f"{compute_file_hash(filepath2) = }")

    # Load the contents of each file into memory as a list of lines
    with open(filepath1) as file_run1:
        lines_run1 = file_run1.readlines()

    with open(filepath2) as file_run2:
        lines_run2 = file_run2.readlines()

    # Ensure both files have the same number of lines
    assert len(lines_run1) == len(lines_run2), "The files have different numbers of lines."

    # Compare the files by breaking them into chunks (separated by double newlines)
    # and comparing sorted lines within those chunks
    for chunk1, chunk2 in zip("".join(lines_run1).split("\n\n"), "".join(lines_run2).split("\n\n")):
        if sorted(chunk1.split("\n")) != sorted(chunk2.split("\n")):
            # Report the first mismatch found between the two files
            print("First mismatched section:")
            print(f"{filepath1} section:\n", chunk1)
            print(f"{filepath2} section:\n", chunk2)
            break
    else:
        print("All lines match")

With this snippet, you can check that the hash of the .lp file I used matches the hash of the .lp file in the compressed folder (the hash is based on the file's content; if the content changes, the hash changes). I printed the hashes below for reference.

The snippet compares the files by breaking them into "paragraphs", i.e. chunks separated by two newlines, which allows a per-equation comparison. It assumes that within each equation every term is on its own line, so it sorts the lines of each chunk before comparing. If there's a mismatch, it prints the original chunks.
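
For completeness, this is how I call it on the two extracted files (assuming they sit in the working directory):

# Compare the two .lp files extracted from the attached archive:
compare("run1.lp", "run2.lp")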

Here's the output of the code:

[screenshot]

You can see that the snippet did not fail on the number of lines check. Both files have the same number of lines. I verified that pyomo thinks these have the same number of variables and constraints too.
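
For reference, this is roughly how I checked those counts, using Pyomo's built-in counters on the constructed model object (called OptModel in openTEPES):

# nvariables() and nconstraints() are standard Pyomo Block methods that count
# the variable and constraint data objects on the built model:
print(OptModel.nvariables(), OptModel.nconstraints())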

The mismatch occurs in the constant term of the equation:

[screenshot]

I believe that is because of the pInitialUC term in this constraint:

    def eUCStrShut(OptModel,n,nr):
        if mTEPES.pMustRun[nr] == 0 and (mTEPES.pMinPowerElec[p,sc,n,nr] or mTEPES.pConstantVarCost[p,sc,n,nr]) and nr not in mTEPES.eh and (p,nr) in mTEPES.pnr:
            if n == mTEPES.n.first():
                return OptModel.vCommitment[p,sc,n,nr] - mTEPES.pInitialUC[p,sc,n,nr]()                 == OptModel.vStartUp[p,sc,n,nr] - OptModel.vShutDown[p,sc,n,nr]
            else:
                return OptModel.vCommitment[p,sc,n,nr] - OptModel.vCommitment[p,sc,mTEPES.n.prev(n),nr] == OptModel.vStartUp[p,sc,n,nr] - OptModel.vShutDown[p,sc,n,nr]
        else:
            return Constraint.Skip
    setattr(OptModel, 'eUCStrShut_'+str(p)+'_'+str(sc)+'_'+str(st), Constraint(mTEPES.n, mTEPES.nr, rule=eUCStrShut, doc='relation among commitment startup and shutdown [p.u.]'))

All comparisons shown in this issue were done on this commit:

https://github.com/IIT-EnergySystemModels/openTEPES/tree/9dd774b3bf6f3b0f9da2f3e7c7a7efe209fb5522

pInitialUC is initialized to all zeroes here:

https://github.com/IIT-EnergySystemModels/openTEPES/blob/9dd774b3bf6f3b0f9da2f3e7c7a7efe209fb5522/openTEPES/openTEPES_InputData.py#L1490-L1494

And then is updated in this block:

https://github.com/IIT-EnergySystemModels/openTEPES/blob/9dd774b3bf6f3b0f9da2f3e7c7a7efe209fb5522/openTEPES/openTEPES_InputData.py#L1836-L1850

Could it be that the code updating it is not deterministic (i.e. with the same input data and the same source code, different indices of pInitialUC get updated)?

It's hard for me to tell the exact reason for this, so I figured I'd post here to see if you had any thoughts.

kdheepak commented 3 days ago

It's worth noting that the file's hash will not be identical even if the compare function thinks the lines are all the same:

[screenshot]

This is because the lines within a "chunk" or "paragraph" may be in a different order, even though the sorted chunks are equivalent. Here's an example of the bounds being written in a different order:

[screenshot]

I don't think this ordering affects the solution of the optimization, although I haven't tested it to be certain (it takes over an hour to solve the 9n case).

This change in order is not deterministic, i.e. different .lp files are written each time. However, once the model is built, the same .lp file is always written when writing out the file multiple times.

[screenshot]

The hashes are the same, so it is the exact same file being written out each time.
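
For reference, a minimal sketch of that check, reusing compute_file_hash from the snippet above and assuming the built Pyomo model object is available as OptModel:

# Write the same built model out twice and compare the file hashes.
# symbolic_solver_labels keeps the human-readable variable/constraint names.
OptModel.write("write1.lp", io_options={"symbolic_solver_labels": True})
OptModel.write("write2.lp", io_options={"symbolic_solver_labels": True})
print(compute_file_hash("write1.lp") == compute_file_hash("write2.lp"))  # True: identical files from the same model object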

This makes me believe there's an issue with one of the sets being initialized from a dictionary in a different order across runs. Once initialized, it always gives the same results, i.e. an identical .lp file is written to disk each time model.write is called on the same model. But if the model is created from scratch, a different .lp file may be produced.

This may also explain why = 0 vs = 1 occurs (as shown above). With a different order, a different subset of generators would be initialized to 1.
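
To illustrate the kind of mechanism I suspect (this is not openTEPES code, just a plain-Python sketch of the failure mode):

# Iterating an unordered set of generator names: Python randomizes string
# hashing per process (PYTHONHASHSEED), so the iteration order, and therefore
# which units end up with a 1, can change between interpreter runs even though
# the inputs are identical.
units = {"gen1", "gen2", "gen3", "gen4"}
needed = 2  # pretend only two units are needed for the initial commitment

initial_uc = {}
for g in units:  # iteration order may differ on every run
    initial_uc[g] = 1 if needed > 0 else 0
    needed -= 1

print(initial_uc)  # which two units get a 1 can vary from run to run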

erikfilias commented 3 days ago

Hi @kdheepak,

Thanks a lot for your detailed explanation and for providing the files and Python snippet.

We will review the code as soon as possible and look into the issue of non-deterministic behaviour when re-running the model. Your insight into the possible cause, especially with the initialisation of pInitialUC, is very helpful and will guide our review.

We'll keep you posted on our progress. Thank you again for your patience.

Best regards,

arght commented 2 days ago

Hi @kdheepak, I have run the same case study several times and indeed the lp files are not ordered in the same way in the two runs. However, if I take the lp files, order them in Excel, and compare them row by row, they have equal content, although perhaps in a different order. I solve the 9n case for only 168 hours to make the runs easier, by deleting the duration after hour 169 and also deleting the value of RESEnergy to avoid infeasibilities. The order of the lp file may affect the optimal solution. I have used the simplex method and disabled the presolve to rule things out. Still trying to understand the issue but working on it. Regards and many thanks for discovering it. Andrés

kdheepak commented 2 days ago

Can you share the log or the objective function value from your solve?

Also, how do you disable presolve? And are you using HiGHS? If not, which solver are you using? It'd be great to converge on results using the same solver version, with a test case that covers as much of the codebase as possible, and to add that as a test in the repo. Ideally the test would run as fast as possible.

FWIW, I tried generating 20 different .lp files in a loop (with a 1/0 inserted before the problem is solved and a try/except in the loop), and the content ends up the same (in a different order) every time; see the comparison sketch below. So this problem doesn't seem to happen often; it seems to happen just once in a while.
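
For reference, the comparison step of that experiment looks roughly like this (the 20 files themselves are produced by the modified openTEPES runs described above, and compare() is the function from my earlier snippet):

import glob

# After the loop has produced run0.lp ... run19.lp, compare every file
# against the first one:
lp_files = sorted(glob.glob("run*.lp"))
for path in lp_files[1:]:
    compare(lp_files[0], path)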

arght commented 2 days ago

These are the log files of two runs with two solvers (Gurobi and HiGHS). The runs with Gurobi used simplex without presolve. The runs with HiGHS used default parameters. There are minor differences in the log files. Also attaching the duration and RESEnergy files used to reduce the size of the 9n case.

oT_Data_Duration_9n.csv oT_Data_RESEnergy_9n.csv

openTEPES_highs_9n.log openTEPES_highs_9n_h.log openTEPES_gurobi_9n.log openTEPES_gurobi9n.log

kdheepak commented 2 days ago

Thanks for sharing!

The oT_Data_Duration_9n.csv you attached has all 8736 hours. I'm assuming you are deleting all but the first 168 hours when you run it locally, right?

When I run with HiGHS for just 7 days (i.e. 168 hours) and remove RESEnergy to prevent infeasibilities, I get the following:

Presolve : Reductions: rows 12669(-6486); columns 16041(-10171); elements 61751(-13066)
Solving the presolved LP
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Ph1: 0(0) 0s
       8742     6.1634091957e+00 Pr: 0(0); Du: 0(3.88578e-16) 1s
       8742     6.1634091957e+00 Pr: 0(0); Du: 0(3.88578e-16) 1s
Solving the original LP from the solution after postsolve
Model   status      : Optimal
Simplex   iterations: 8742
Objective value     :  6.1634091957e+00
HiGHS run time      :          0.54
Termination condition:  optimal
# ==========================================================
# = Solver Results                                         =
# ==========================================================
# ----------------------------------------------------------
#   Problem Information
# ----------------------------------------------------------
Problem:
- Lower bound: 6.163409195706693
  Upper bound: 6.163409195706693
  Number of objectives: 1
  Number of constraints: 0
  Number of variables: 0
  Sense: minimize
# ----------------------------------------------------------
#   Solver Information
# ----------------------------------------------------------
Solver:
- Status: ok
  Termination condition: optimal
  Termination message: TerminationCondition.optimal
# ----------------------------------------------------------
#   Solution Information
# ----------------------------------------------------------
Solution:
- number of solutions: 0
  number of solutions displayed: 0
  Problem size                         ...  19155 constraints,  30330 variables
  Solution time                        ...  1 s
  Total system                 cost [MEUR]  6.163409195706693

This seems different from your objective function.

arght commented 2 days ago

I was considering only the first 24 h and emptying the remaining ones: oT_Data_Duration_9n.csv

kdheepak commented 2 days ago

I made a PR adding a 9n7d case and added a test as well: https://github.com/IIT-EnergySystemModels/openTEPES/pull/65

It runs with HiGHS in a few seconds (I'm happy to change the PR to use just 24 hours if you'd like).

I noticed that when running the test multiple times, I get a completely different objective function value on alternate runs. Here's a screenshot of the test failing:

[screenshot]

Here's a screenshot of the same test succeeding immediately afterwards:

[screenshot]

This is without changing the data on disk or the source code; I'm just re-running the test. It seems to happen fairly often. Maybe this can help identify the issue discussed here?

kdheepak commented 2 days ago

I was considering only the first 24 h and emptying the remaining ones

I see. I didn't notice that it was emptied. That makes sense.

arght commented 1 day ago

I have updated the openTEPES version, fixing the computation of the initial UC (version 4.17.5), but the ordering of the lp file still remains. I posted an issue to the Pyomo forum and it is being addressed by John Siirola.

kdheepak commented 1 day ago

Thanks for the reply! Would you be able to point to the commit where this change was made in openTEPES?

arght commented 1 day ago

I made a mistake and didn't update the InputData file. Now it is updated on GitHub.

arght commented 1 day ago

I will update the PyPI version because we generated it this morning with this mistake.

kdheepak commented 1 day ago

Thanks for the update! I'll have to check in more detail later and report back.

kdheepak commented 1 day ago

I just noticed that there is ordered=False in a number of places. For example, in your most recent fix, commit https://github.com/IIT-EnergySystemModels/openTEPES/commit/036d996bb5ca15bf6cda5f8c4ac3218877485815, you changed mTEPES.g to be ordered:

[screenshot]

But mTEPES.re, a couple of lines below, is still ordered=False.

This causes a non-deterministic element order in mTEPES.nr:

https://github.com/IIT-EnergySystemModels/openTEPES/blob/036d996bb5ca15bf6cda5f8c4ac3218877485815/openTEPES/openTEPES_InputData.py#L621-L622
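
To make the difference concrete, here's a minimal Pyomo sketch (the set members are made up, not taken from openTEPES):

from pyomo.environ import ConcreteModel, Set

m = ConcreteModel()
# Pyomo Sets are insertion-ordered by default, so iteration is deterministic:
m.g = Set(initialize=["CCGT_1", "Wind_1", "Solar_1"])
# With ordered=False the members are stored in an unordered container, so the
# iteration order (and anything built from it, such as derived sets or the
# order of constraints in the .lp file) can differ between interpreter runs:
m.re = Set(initialize=["Wind_1", "Solar_1"], ordered=False)

print(list(m.g), list(m.re))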

I just removed all ordered=True and ordered=False occurrences from the code, and now I'm getting deterministic results for the test in https://github.com/IIT-EnergySystemModels/openTEPES/pull/65.


Is there a reason ordered=False was used to begin with? Would you consider removing all ordered=False and ordered=True and falling back to Pyomo's default? I'm happy to submit a PR for that; I believe it may close this issue.

kdheepak commented 1 day ago

I have updated the openTEPES version, fixing the computation of the initial UC (version 4.17.5), but the ordering of the lp file still remains. I posted an issue to the Pyomo forum and it is being addressed by John Siirola.

Thanks for posting to the Pyomo forum. For future reference, this is the thread: https://groups.google.com/g/pyomo-forum/c/Lbt2EDm5VzI

arght commented 1 day ago

OK to changing all the sets to ordered (which is the Pyomo default) in a PR.