alan-turing-institute / sqlsynthgen

Synthetic data for SQL databases
MIT License

pyyaml/cython error when installing with poetry install #120

Closed: Iain-S closed this issue 10 months ago

Iain-S commented 1 year ago

Since the recent release of Cython v3, poetry install fails with an error such as:

• Installing python-dotenv (1.0.0)
• Installing pyyaml (5.4.1): Failed
...
ChefBuildError
...
raise AttributeError(attr)
AttributeError: cython_sources
...
Note: This error originates from the build backend, and is likely not a problem with poetry but with pyyaml (5.4.1) not supporting PEP 517 builds. You can verify this by running 'pip wheel --use-pep517 "pyyaml (==5.4.1)"'.

We believe that this is because pyyaml 5.x.x cannot be built with the latest release of cython. Although poetry locks the versions of dependencies, it seems that it does not lock the versions of build dependencies.

The main pyyaml issue is here, where the maintainer seems unlikely to make a 5.4.2 release. We could update to pyyaml 6.x.x, but that would require us to upgrade other dependencies and, ultimately, to use SQLAlchemy v2, which is currently incompatible with SmartNoise SQL. See #104 for more.
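Because build dependencies are resolved separately from the locked dependencies, one way to pin them for a plain pip install (this does not cover poetry install) is pip's PIP_CONSTRAINT environment variable. pip installs build dependencies in a subprocess that inherits the parent's environment, so the constraint also applies inside build isolation. The sketch below assumes that behaviour (it held for the pip releases current when this issue was opened); the helper names write_constraints, constrained_env and install_with_constraints are hypothetical:

```python
import os
import subprocess
import sys
import tempfile
from typing import Dict, Iterable, Optional


def write_constraints(lines: Iterable[str], directory: Optional[str] = None) -> str:
    """Write pip constraint lines (e.g. "cython<3") to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt", dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return path


def constrained_env(constraint_path: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and point PIP_CONSTRAINT at the constraint file.

    Assumption: pip's isolated build subprocess inherits this environment,
    so the constraint also limits the Cython version used to build pyyaml.
    """
    env = dict(os.environ if base is None else base)
    env["PIP_CONSTRAINT"] = constraint_path
    return env


def install_with_constraints(packages: Iterable[str], constraints: Iterable[str]) -> int:
    """Run `python -m pip install <packages>` with the given constraints; return pip's exit code."""
    path = write_constraints(constraints)
    try:
        cmd = [sys.executable, "-m", "pip", "install", *packages]
        return subprocess.call(cmd, env=constrained_env(path))
    finally:
        # Clean up the temporary constraint file whether or not pip succeeded.
        os.remove(path)
```

Calling install_with_constraints(["pyyaml==5.4.1"], ["cython<3"]) should then attempt the same build that fails above, but against a pre-3 Cython. In a Dockerfile, setting PIP_CONSTRAINT with an ENV line pointing at a constraint file would be the equivalent.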

Iain-S commented 1 year ago

A potential workaround is to install cython<3 manually and use pip's --no-build-isolation option, so that pyyaml is built with that Cython version rather than the latest one:

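cd sqlsynthgen
# Quote the specifier so the shell does not treat "<3" as an input redirect
poetry run pip install "cython<3"
# wheel must already be present because build isolation is turned off below
poetry run pip install wheel
# Build pyyaml against the Cython already installed in the virtualenv
poetry run pip install --no-build-isolation pyyaml==5.4.1
poetry install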

Full credit to this comment for coming up with the workaround.
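With --no-build-isolation, pip builds pyyaml with whatever Cython is already in the environment, so it may help to confirm that the first step took effect before building. A small standard-library check (the function names are hypothetical) could look like this:

```python
import re
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading integer of a version string, e.g. "0.29.36" -> 0."""
    match = re.match(r"\d+", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group())


def cython_ready_for_pyyaml5(dist_name: str = "Cython") -> Optional[bool]:
    """Report whether the installed Cython is older than 3.

    Returns True for Cython < 3, False for Cython >= 3, and None when no
    Cython distribution is installed at all.
    """
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return major_version(installed) < 3


if __name__ == "__main__":
    status = cython_ready_for_pyyaml5()
    if status is None:
        print('Cython is not installed; install "cython<3" first.')
    elif status:
        print("Cython < 3 found, as the workaround requires.")
    else:
        print("Cython >= 3 found; building pyyaml 5.4.1 is expected to fail.")
```

Running the script through poetry run python makes it inspect the project's virtualenv rather than the system interpreter.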

Iain-S commented 11 months ago

Closing this, as we think it has been fixed by the upgrade to SQLAlchemy v2.

cptanalatriste commented 10 months ago

I've seen the same error when running pip install sqlsynthgen from a Dockerfile. Luckily, the workaround also fixed it in this case.

cptanalatriste commented 10 months ago

I'm not sure whether the workaround is the cause, but after a successful install, running sqlsynthgen make-tables produces the following error: libopendp.so: cannot open shared object file: No such file or directory.

Here's the stack trace. It seems related to opendp (and, by extension, smartnoise-sql).

Traceback (most recent call last):
  File "/opt/conda/bin/sqlsynthgen", line 5, in <module>
    from sqlsynthgen.main import app
  File "/opt/conda/lib/python3.10/site-packages/sqlsynthgen/main.py", line 17, in <module>
    from sqlsynthgen.make import make_src_stats, make_table_generators, make_tables_file
  File "/opt/conda/lib/python3.10/site-packages/sqlsynthgen/make.py", line 13, in <module>
    import snsql
  File "/opt/conda/lib/python3.10/site-packages/snsql/__init__.py", line 1, in <module>
    from .connect import from_connection, from_df
  File "/opt/conda/lib/python3.10/site-packages/snsql/connect.py", line 1, in <module>
    from .sql.private_reader import PrivateReader
  File "/opt/conda/lib/python3.10/site-packages/snsql/sql/__init__.py", line 1, in <module>
    from .private_reader import PrivateReader
  File "/opt/conda/lib/python3.10/site-packages/snsql/sql/private_reader.py", line 6, in <module>
    from snsql.sql.odometer import OdometerHeterogeneous
  File "/opt/conda/lib/python3.10/site-packages/snsql/sql/odometer.py", line 2, in <module>
    from snsql.sql.privacy import Privacy
  File "/opt/conda/lib/python3.10/site-packages/snsql/sql/privacy.py", line 2, in <module>
    from ._mechanisms import *
  File "/opt/conda/lib/python3.10/site-packages/snsql/sql/_mechanisms/__init__.py", line 1, in <module>
    from .laplace import Laplace
  File "/opt/conda/lib/python3.10/site-packages/snsql/sql/_mechanisms/laplace.py", line 3, in <module>
    from opendp.transformations import make_bounded_sum, make_clamp
  File "/opt/conda/lib/python3.10/site-packages/opendp/__init__.py", line 1, in <module>
    from opendp.mod import Transformation, Measurement, OpenDPException, UnknownTypeException
  File "/opt/conda/lib/python3.10/site-packages/opendp/mod.py", line 4, in <module>
    from opendp._lib import AnyMeasurement, AnyTransformation, AnyDomain, AnyMetric, AnyMeasure, AnyFunction
  File "/opt/conda/lib/python3.10/site-packages/opendp/_lib.py", line 33, in <module>
    lib = ctypes.cdll.LoadLibrary(os.path.join(lib_dir, lib_name))
  File "/opt/conda/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "/opt/conda/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)

I'm not sure whether messing with PyYAML/Cython somehow broke OpenDP.
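The traceback ends in opendp/_lib.py, where ctypes tries to load the native library and the loader reports that libopendp.so is missing. One quick way to see whether the installed opendp package contains such a file at all, without importing it (which would fail the same way), is to search the package directory. This sketch assumes the library ships somewhere inside the opendp package directory and matches the pattern libopendp*; both are assumptions rather than facts from opendp's documentation, and find_native_libs is a hypothetical helper:

```python
import importlib.util
from pathlib import Path
from typing import List, Optional


def find_native_libs(package: str, pattern: str = "libopendp*") -> Optional[List[Path]]:
    """Return files matching `pattern` under an installed package, without importing it.

    Returns None if the package cannot be found or is not a package.
    """
    # For a top-level name, find_spec locates the package without executing it.
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    found: List[Path] = []
    for location in spec.submodule_search_locations:
        found.extend(sorted(Path(location).rglob(pattern)))
    return found


if __name__ == "__main__":
    libs = find_native_libs("opendp")
    if libs is None:
        print("opendp is not installed")
    elif not libs:
        print("opendp is installed, but no libopendp* file was found in it")
    else:
        for lib in libs:
            print(lib)
```

An empty result would point at a broken or incomplete opendp install rather than at a problem in sqlsynthgen itself.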

cptanalatriste commented 10 months ago

I think the workaround is the problem: installing opendp on its own causes no problem, and in that environment pyyaml is at 6.0:

jovyan@8bcfc2dffc6c:/workspaces/sqlsynthgen-health-demo$ python
Python 3.10.11 | packaged by conda-forge | (main, May 10 2023, 18:47:07) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import opendp
>>> exit()
jovyan@8bcfc2dffc6c:/workspaces/sqlsynthgen-health-demo$ pip show opendp
Name: opendp
Version: 0.8.0
Summary: Python bindings for the OpenDP Library
Home-page: https://opendp.org
Author: The OpenDP Project
Author-email: info@opendp.org
License: UNKNOWN
Location: /opt/conda/lib/python3.10/site-packages
Requires: 
Required-by: 
jovyan@8bcfc2dffc6c:/workspaces/sqlsynthgen-health-demo$ pip show pyyaml
Name: PyYAML
Version: 6.0
Summary: YAML parser and emitter for Python
Home-page: https://pyyaml.org/
Author: Kirill Simonov
Author-email: xi@resolvent.net
License: MIT
Location: /opt/conda/lib/python3.10/site-packages
Requires: 
Required-by: bokeh, dask, distributed, jupyter-events
jovyan@8bcfc2dffc6c:/workspaces/sqlsynthgen-health-demo$ 
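
pip show reports the requirements each distribution declares; here it lists nothing under Requires: for opendp 0.8.0, while PyYAML 6.0 is required by bokeh, dask, distributed and jupyter-events. The same information can be read programmatically with importlib.metadata, which can be handy inside a Docker build. The helper names below are hypothetical, and the name parsing is a simplified sketch rather than a full PEP 508 parser:

```python
import re
from importlib import metadata
from typing import List, Optional, Tuple


def normalise(name: str) -> str:
    """Normalise a distribution name as in PEP 503 (e.g. "PyYAML" -> "pyyaml")."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(requires: List[str]) -> List[str]:
    """Extract normalised names from requirement strings like "PyYAML (>=5.1)"."""
    names = []
    for requirement in requires:
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
        if match:
            names.append(normalise(match.group()))
    return names


def describe(dist_name: str) -> Optional[Tuple[str, List[str]]]:
    """Return (version, declared requirement names), or None if not installed.

    Note: requirements that only apply to optional extras are included too.
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return dist.version, requirement_names(dist.requires or [])


if __name__ == "__main__":
    for name in ("opendp", "PyYAML"):
        info = describe(name)
        if info is None:
            print(f"{name}: not installed")
        else:
            version, requires = info
            print(f"{name} {version} declares: {', '.join(requires) or 'nothing'}")
```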

mhauru commented 10 months ago

Sorry, I didn't fully follow. How can I reproduce the problem that comes up when you run pip install sqlsynthgen?

cptanalatriste commented 10 months ago

@mhauru, this is what I did in a Linux Docker container:

  1. Install SSG using pip install sqlsynthgen.
  2. This fails because of the Cython issue.
  3. Use @Iain-S's workaround to get pip install sqlsynthgen working.
  4. Now, running sqlsynthgen make-tables gives an opendp error. I suspect the workaround is the cause.

mhauru commented 10 months ago

I can reproduce this. I think the issue is simply that we haven't made a new PyPI release in a long while. Whoops... We should definitely do that.

mhauru commented 10 months ago

Doing that now; see #164.

mhauru commented 10 months ago

Now fixed by the release of 0.4.0. Thanks, Carlos, for spotting this; it was a significant oversight.
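Since the fix shipped in 0.4.0, installing with a lower bound such as pip install "sqlsynthgen>=0.4.0" should avoid picking up the older release. To confirm which version an existing environment has, a standard-library check like the one below may be enough. It is a sketch with hypothetical helper names that compares only the numeric release segment and ignores pre-release or local suffixes (so 0.4.0rc1 would count as 0.4.0):

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# The release this thread reports as containing the fix.
MINIMUM = (0, 4, 0)


def release_tuple(version: str) -> Tuple[int, ...]:
    """Parse the numeric release part of a version, e.g. "0.4.0" -> (0, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def at_least(version: str, minimum: Tuple[int, ...] = MINIMUM) -> bool:
    """True if `version` is at or above `minimum`, comparing release numbers only."""
    release = release_tuple(version)
    # Pad with zeros so that "0.4" compares equal to "0.4.0".
    width = max(len(release), len(minimum))
    release += (0,) * (width - len(release))
    minimum += (0,) * (width - len(minimum))
    return release >= minimum


def installed_sqlsynthgen_ok() -> Optional[bool]:
    """Check the installed sqlsynthgen; None if it is not installed."""
    try:
        return at_least(metadata.version("sqlsynthgen"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_sqlsynthgen_ok())
```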