PyPSA / linopy

Linear optimization with N-D labeled arrays in Python
https://linopy.readthedocs.io
MIT License

Serializing and deserializing linopy.Model #340

Open tburandt opened 3 weeks ago

tburandt commented 3 weeks ago

Hi,

I am currently exploring setting up larger models in parallel (in individual processes) and passing them back to the main process. The individual models are fairly large, but they can be prepared largely independently of each other; later, specific instances are linked through a few additional constraints.

However, although serializing the model with pickle or dill works fine, deserializing it again throws a recursion error. As a consequence, ProcessPoolExecutor cannot be used to prepare models in parallel, since it relies on pickling to hand data from one process to another. This can easily be reproduced with this example:

import dill
import pandas as pd

import linopy

m = linopy.Model()
time = pd.Index(range(10), name="time")

x = m.add_variables(
    lower=0,
    coords=[time],
    name="x",
) # to be done in parallel process
y = m.add_variables(lower=0, coords=[time], name="y") # to be done in parallel process

factor = pd.Series(time, index=time) # to be done in parallel process

con1 = m.add_constraints(3 * x + 7 * y >= 10 * factor, name="con1") # to be done in parallel process
con2 = m.add_constraints(5 * x + 2 * y >= 3 * factor, name="con2") # to be done in parallel process

m.add_objective(x + 2 * y) # to be done in parallel process

with open("test.pkl", 'wb') as f:
    dill.dump(m, f)

with open("test.pkl", 'rb') as f:
    m2 = dill.load(f)  # raises RecursionError here

m2.variables["x"].lower = 1  # or add whatever additional constraint
m2.solve()

Which throws the following error:

Traceback (most recent call last):
  File "C:\github\test\linopy\test.py", line 29, in <module>
    m2 = dill.load(f)
         ^^^^^^^^^^^^
  File "C:\github\test\.venv\Lib\site-packages\dill\_dill.py", line 289, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\github\test\.venv\Lib\site-packages\dill\_dill.py", line 444, in load
    obj = StockUnpickler.load(self)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\github\test\.venv\Lib\site-packages\linopy\variables.py", line 1149, in __getattr__
    if name in self.data:
               ^^^^^^^^^
  File "C:\github\test\.venv\Lib\site-packages\linopy\variables.py", line 1149, in __getattr__
    if name in self.data:
               ^^^^^^^^^
  File "C:\github\test\.venv\Lib\site-packages\linopy\variables.py", line 1149, in __getattr__
    if name in self.data:
               ^^^^^^^^^
  [Previous line repeated 745 more times]
RecursionError: maximum recursion depth exceeded

FabianHofmann commented 3 weeks ago

@tburandt thanks for raising the issue, that's quite unfortunate. Pickling is not tested at the moment. How about storing the model as netCDF in the meanwhile? It should be about as fast as pickling.

lkstrp commented 3 weeks ago

This is most likely also the cause of the `deepcopy` issues within PyPSA on some networks. I looked into this a while ago; this issue is a better starting point, so I will check again.

FabianHofmann commented 3 weeks ago

I have the vague feeling that the `__getitem__` and `__getattr__` overrides could be related to this...
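For reference, here is a minimal, linopy-free sketch of how such an override can blow up, matching the traceback above: unpickling creates the instance with an empty `__dict__`, the unpickler's `getattr(obj, '__setstate__', ...)` lookup falls through to `__getattr__`, and the `self.data` access inside it re-enters `__getattr__` forever. The classes below are hypothetical, not linopy's actual code; guarding the lookup via `self.__dict__` avoids the recursion:

```python
import pickle


class Broken:
    def __init__(self, data):
        self.data = data

    def __getattr__(self, name):
        # Called for any attribute not found normally. During unpickling the
        # instance __dict__ is still empty, so "self.data" itself falls
        # through to __getattr__ again -> infinite recursion.
        if name in self.data:
            return self.data[name]
        raise AttributeError(name)


class Fixed:
    def __init__(self, data):
        self.data = data

    def __getattr__(self, name):
        # Look up "data" without re-entering __getattr__; if it is not set
        # yet (e.g. mid-unpickling), fail fast with AttributeError.
        data = self.__dict__.get("data")
        if data is not None and name in data:
            return data[name]
        raise AttributeError(name)


blob = pickle.dumps(Broken({"x": 1}))
try:
    pickle.loads(blob)
except RecursionError:
    print("Broken: RecursionError on load")

f = pickle.loads(pickle.dumps(Fixed({"x": 1})))
print(f.x)  # -> 1
```

The same pattern applies to `deepcopy`, which probes special methods like `__deepcopy__` via `getattr` and hits the identical loop.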

tburandt commented 3 weeks ago

@FabianHofmann the problem is that multiprocessing and, for example, ProcessPoolExecutor (from concurrent.futures) use pickle (or dill, I am not sure) to hand objects over from one process to another and back to the main process.

For storing the model manually, I can try netCDF. I might have an idea for solving my problem with that, at least :)