dssg / triage

General Purpose Risk Modeling and Prediction Toolkit for Policy and Social Good Problems

Support Postgres 13 #863

Closed thcrock closed 2 years ago

nanounanue commented 2 years ago

What's the problem with Postgres 13?

thcrock commented 2 years ago

Good question for @shaycrk and @rayidghani

shaycrk commented 2 years ago

@nanounanue -- I tried to stand up infrastructure for our class using postgres 13, and when running triage I got an error that ForkingPickler couldn't serialize the engine here (https://github.com/dssg/triage/blob/683cda06905abe364f4b13d6058c3eae54f2df3f/src/triage/experiments/multicore.py#L31), despite using create_engine() from triage. Downgrading to postgres 11 (both the database version and the client version) fixed the issue, though note I haven't tried with postgres 12.

The way SerializableDbEngine works, I can't see any good reason why the underlying postgres version/driver should cause problems here, so I think this definitely needs a bit more investigation. I believe @tweddielin is taking a look after our last triage meeting as well.
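
For context, the general pattern here (a rough sketch of the idea, not triage's exact implementation) is an engine wrapper whose pickled form carries only the connection URL, so the live engine, sockets and all, never crosses the process boundary:

import sqlalchemy

class ReconstructableEngine:
    """Sketch: the pickled form carries only the URL; the engine is rebuilt on load."""

    def __init__(self, url):
        self.url = url
        self._engine = sqlalchemy.create_engine(url)

    def __getattr__(self, name):
        # anything not found on the wrapper is delegated to the real engine
        return getattr(self._engine, name)

    def __reduce__(self):
        # unpickling calls ReconstructableEngine(url), building a fresh engine
        return (self.__class__, (self.url,))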

tweddielin commented 2 years ago

I haven't been able to replicate it, but I'll give it a try this weekend.

nanounanue commented 2 years ago

Maybe it's the version of psycopg2-binary, or maybe the version of libpq (the Postgres client library on Linux).
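
If it helps anyone debugging, psycopg2 exposes the libpq version it was built against, so a quick diagnostic sketch to see which client versions are actually in play:

import psycopg2

print(psycopg2.__version__)         # psycopg2 release and build flags
print(psycopg2.__libpq_version__)   # e.g. 130004 means libpq 13.4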


shaycrk commented 2 years ago

Yeah, my guess is probably libpq, as I didn't have to reinstall psycopg2 to get things working after downgrading (though I don't understand enough about the interaction between those two behind the scenes...)

tweddielin commented 2 years ago

@shaycrk sorry, a bit late. I tried to replicate the error but couldn't. I set up a dirtyduck db with postgres 13 in docker.

food=# show server_version;
-[ RECORD 1 ]--+-------------------------------
server_version | 13.4 (Debian 13.4-1.pgdg110+1)

First, I ran triage experiment --n-db-processes=2 example/dirtyduck/experiments/dirty-duckling.yaml all the way through without any errors and confirmed that it was using MultiCoreExperiment by the log message INFO Experiment will run in multi core mode using 1 processes and 2 db processes.

Then, I ran triage experiment --n-processes=2 --n-db-processes=2 example/dirtyduck/experiments/eis_01.yaml and confirmed that it was using MultiCoreExperiment by the log message INFO Experiment will run in multi core mode using 2 processes and 2 db processes. No errors occurred.

Could you share the details of the infrastructure you were using?

shaycrk commented 2 years ago

@tweddielin -- I just set up fresh EC2 (Ubuntu 20.04) and RDS (Postgres 13.3-R1) machines and was able to replicate the issue there, so it may be something particular to that environment (or at least to how I'm setting it up).

Here's the simplified thing I'm running:

#!/usr/bin/env python

import yaml

from sqlalchemy.engine.url import URL
from triage.util.db import create_engine

from multiprocessing.reduction import ForkingPickler

# load database credentials from the YAML config file
dbfile = 'database.yaml'

with open(dbfile) as f:
    dbconfig = yaml.safe_load(f)  # safe_load avoids PyYAML's unsafe-load warning
db_url = URL(
            'postgres',
            host=dbconfig['host'],
            username=dbconfig['user'],
            database=dbconfig['db'],
            password=dbconfig['pass'],
            port=dbconfig['port'],
        )

db_engine = create_engine(db_url)

# ForkingPickler is what multiprocessing uses to serialize objects sent to
# child processes, so this reproduces the failure without running a full experiment
ForkingPickler.dumps(db_engine)

Which is generating this error:

Traceback (most recent call last):
  File "test_pickle_engine.py", line 25, in <module>
    ForkingPickler.dumps(db_engine)
  File "/usr/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'SerializableDbEngine'>: import of module <property object at 0x7f3d10e1b2f0> failed

What are the best next steps here? If any of you want to send me a public key, I can give you access to that server so you can look at things there...

cc @thcrock and @nanounanue for their thoughts as well

tweddielin commented 2 years ago

Yeah, I'll send you my public key so I can use that environment setup.

tweddielin commented 2 years ago

Also, there might be another workaround: fork only the db credential information instead of the db_engine object itself, and create the engine inside the function we want to parallelize.
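
A minimal sketch of that workaround (the worker function, query, and URL below are illustrative placeholders, not triage internals): each worker receives only the pickleable URL string and builds its own engine, and the pool size still bounds the number of simultaneous connections.

from multiprocessing import Pool

from triage.util.db import create_engine  # same factory as in the repro script above

def run_query(args):
    db_url, sql = args
    engine = create_engine(db_url)  # the engine is built inside the worker process
    with engine.connect() as conn:
        return conn.execute(sql).scalar()

if __name__ == '__main__':
    db_url = 'postgresql://user:pass@host:5432/db'  # placeholder credentials
    tasks = [(db_url, 'select 1'), (db_url, 'select 2')]
    with Pool(2) as pool:  # pool size caps concurrent db connections
        print(pool.map(run_query, tasks))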

shaycrk commented 2 years ago

@tweddielin @thcrock @nanounanue -- Just moving the conversation back here from slack. Eddie has been working on the ubuntu instance I set up and is still encountering the same issues (did you hit them with postgres 12 as well?) without a very clear underlying cause, but I think he's still investigating a bit?

I guess one question for the group is how the alternative solution @tweddielin suggested (just forking the credentials themselves and creating a new engine in each process) would interact with the n_db_processes parameter. Would it still be possible to control the total number of connections to the database? If we do have to lose that functionality, how big of an issue is it?

shaycrk commented 2 years ago

@tweddielin -- Just wanted to check back in on this thread and see if you'd had any more luck digging into the postgres 12/13 stuff?

tweddielin commented 2 years ago

@shaycrk -- what's the best way to test new triage changes against postgres 13/12 on the server you launched? Can we load something into the postgres 13/12 database to run a toy triage example (dirty duckling?)

shaycrk commented 2 years ago

Hey @tweddielin -- I think the easiest might be to load up the same donors choose example I used for the colab notebook. There's a postgres dump file in a public S3 bucket that you can just curl and load, and you can use the same experiment config that's in the notebook:

https://colab.research.google.com/github/dssg/triage/blob/kit_colab_triage/example/colab/colab_triage.ipynb

Let me know if you run into any trouble or need me to do anything to set stuff up there!

tweddielin commented 2 years ago

Adding some slack discussion back here.

It turned out that what makes SerializableDbEngine unpicklable is wrapt.ObjectProxy. It's hard to figure out exactly which property is not serializable, but I know it's a property of wrapt.ObjectProxy from tracing the memory address in the error back to the python object like this (there might be a better way):

In [2]: from multiprocessing.reduction import ForkingPickler

In [3]: ForkingPickler.dumps(db_engine)
---------------------------------------------------------------------------
PicklingError                             Traceback (most recent call last)
<ipython-input-3-53b6f23ef378> in <module>
----> 1 ForkingPickler.dumps(db_engine)

/usr/lib/python3.7/multiprocessing/reduction.py in dumps(cls, obj, protocol)
     49     def dumps(cls, obj, protocol=None):
     50         buf = io.BytesIO()
---> 51         cls(buf, protocol).dump(obj)
     52         return buf.getbuffer()
     53 

PicklingError: Can't pickle <class 'SerializableDbEngine'>: import of module <property object at 0x7fa52c30def0> failed

In [4]: import ctypes

In [5]: obj = ctypes.cast(0x7fa52c30def0, ctypes.py_object).value

In [6]: obj
Out[6]: <property at 0x7fa52c30def0>

In [7]: obj.fget
Out[7]: <function wrapt.wrappers._ObjectProxyMethods.__module__(self)>

In [8]: obj.fset
Out[8]: <function wrapt.wrappers._ObjectProxyMethods.__module__(self, value)>

After I updated wrapt to the latest version, 1.13.3, it seems to work as usual on the sandbox machine where we found the error. However, the mystery is still there: I still can't reproduce the error on my local machine or in a docker instance with the current wrapt==1.12.1.
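
For anyone following along, the traced property is consistent with how wrapt's pure-Python proxy is built: its metaclass copies the properties of _ObjectProxyMethods (including __module__) into every subclass's __dict__, so looking up __module__ on the class returns the property descriptor instead of a module-name string, and pickle's save_global then fails trying to import it. A minimal sketch of the mechanism (the class name is a hypothetical stand-in, and this assumes wrapt's pure-Python implementation is active, e.g. because the C extension didn't compile in that environment, which might also explain why only some machines hit the error):

import pickle
import wrapt

class EngineProxy(wrapt.ObjectProxy):  # hypothetical stand-in for SerializableDbEngine
    pass

# Under pure-Python wrapt, the metaclass has copied wrapt's __module__ property
# into the class dict, shadowing the usual module-name string:
print(type(EngineProxy.__dict__.get('__module__')))  # property vs. str

# Pickling anything that references the class goes through save_global, which
# reads __module__ off the class and then tries to import it:
try:
    pickle.dumps(EngineProxy)
except pickle.PicklingError as err:
    # on affected setups: Can't pickle <class 'EngineProxy'>: import of module <property ...> failed
    print(err)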

shaycrk commented 2 years ago

Closed via #882