dropbox / dbx_build_tools

Dropbox's Bazel rules and tools

Hermeticity bug: Package setup can find host libraries (leading to import failures) #31

Open jake-arkinstall opened 3 years ago

jake-arkinstall commented 3 years ago

When trying to install numpy, I found that it failed to import because it depended on libraries it could not access, CBLAS and LAPACK, resulting in undefined symbol errors (sometimes cblas_dot, sometimes saxpy_, sometimes _gfortran_concat_string, depending on what is installed on the host and what you have tried to provide via Bazel dependencies). The numpy wheel comes bundled with the necessary libraries but bypasses their installation if it finds implementations on the host system, which lets users substitute alternative implementations suited to their needs.

A minimal reproduction in a docker image showed no problem - numpy worked fine - so the same code works or breaks depending on the host, which points to a hermeticity issue. To verify, I installed openblas in the docker image before the numpy package was installed, and importing numpy then failed with undefined symbol errors.

I've attached a minimal working example (MWE). It is a small project that imports numpy, prints hello world, and outputs a simple mean to demonstrate numpy doing something. It runs within a docker environment to isolate the build from the host machine.
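For readers without the attachment, here is a rough sketch of what such an example script might look like. It is not the contents of the attached example.py: the helper name try_import and the input list are assumptions made for illustration, and the import is wrapped so that a failure is reported as a message instead of a traceback.

```python
import importlib


def try_import(name):
    """Return (module, None) on success, or (None, error message) on ImportError."""
    try:
        return importlib.import_module(name), None
    except ImportError as exc:
        # An "undefined symbol" failure from a compiled extension surfaces here.
        return None, str(exc)


def main():
    numpy, err = try_import("numpy")
    if numpy is None:
        print(f"numpy import failed: {err}")
        return 1
    print("Hello, world!")
    # Hypothetical input; any small list is enough to show numpy doing something.
    print(numpy.mean([1, 2, 3, 4, 5]))
    return 0


if __name__ == "__main__":
    main()
```

Catching ImportError only changes how the failure is displayed. The underlying undefined symbol error is the same one shown in the traceback further down.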

There are two dockerfiles: working.Dockerfile and broken.Dockerfile. The only difference between the two is that broken.Dockerfile installs libopenblas-dev before running bazel build.

To reproduce, download dbx_build_tools_bug.tar.gz and:

tar -xzvf dbx_build_tools_bug.tar.gz && cd dbx_build_tools_bug
docker build --network=host -t dbx_docker_broken -f broken.Dockerfile .
docker build --network=host -t dbx_docker_working -f working.Dockerfile .
docker run -it dbx_docker_working
docker run -it dbx_docker_broken

Running the working docker image outputs:

Starting local Bazel server and connecting to it...
INFO: Analyzed target //usage:example (39 packages loaded, 3812 targets configured).
INFO: Found 1 target...
Target //usage:example up-to-date:
  bazel-bin/usage/example
INFO: Elapsed time: 4.154s, Critical Path: 0.16s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
Hello, world!
3.0

And running the 'broken' docker image outputs:

Starting local Bazel server and connecting to it...
INFO: Analyzed target //usage:example (39 packages loaded, 3812 targets configured).
INFO: Found 1 target...
Target //usage:example up-to-date:
  bazel-bin/usage/example
INFO: Elapsed time: 4.288s, Critical Path: 0.28s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/b570b5ccd0454dc9af9f65ab1833764d/execroot/__main__/bazel-out/k8-fastbuild/bin/usage/example.runfiles/__main__/usage/example-wrapper.py", line 42, in <module>
    exec(code, module.__dict__)
  File "/root/.cache/bazel/_bazel_root/b570b5ccd0454dc9af9f65ab1833764d/execroot/__main__/bazel-out/k8-fastbuild/bin/usage/example.runfiles/__main__/usage/example.py", line 1, in <module>
    import numpy
  File "/root/.cache/bazel/_bazel_root/b570b5ccd0454dc9af9f65ab1833764d/execroot/__main__/bazel-out/k8-fastbuild/bin/usage/example.runfiles/__main__/pip/numpy/numpy-cpython-38/lib/numpy/__init__.py", line 148, in <module>
    from . import lib
  File "/root/.cache/bazel/_bazel_root/b570b5ccd0454dc9af9f65ab1833764d/execroot/__main__/bazel-out/k8-fastbuild/bin/usage/example.runfiles/__main__/pip/numpy/numpy-cpython-38/lib/numpy/lib/__init__.py", line 25, in <module>
    from .index_tricks import *
  File "/root/.cache/bazel/_bazel_root/b570b5ccd0454dc9af9f65ab1833764d/execroot/__main__/bazel-out/k8-fastbuild/bin/usage/example.runfiles/__main__/pip/numpy/numpy-cpython-38/lib/numpy/lib/index_tricks.py", line 12, in <module>
    import numpy.matrixlib as matrixlib
  File "/root/.cache/bazel/_bazel_root/b570b5ccd0454dc9af9f65ab1833764d/execroot/__main__/bazel-out/k8-fastbuild/bin/usage/example.runfiles/__main__/pip/numpy/numpy-cpython-38/lib/numpy/matrixlib/__init__.py", line 4, in <module>
    from .defmatrix import *
  File "/root/.cache/bazel/_bazel_root/b570b5ccd0454dc9af9f65ab1833764d/execroot/__main__/bazel-out/k8-fastbuild/bin/usage/example.runfiles/__main__/pip/numpy/numpy-cpython-38/lib/numpy/matrixlib/defmatrix.py", line 11, in <module>
    from numpy.linalg import matrix_power
  File "/root/.cache/bazel/_bazel_root/b570b5ccd0454dc9af9f65ab1833764d/execroot/__main__/bazel-out/k8-fastbuild/bin/usage/example.runfiles/__main__/pip/numpy/numpy-cpython-38/lib/numpy/linalg/__init__.py", line 73, in <module>
    from .linalg import *
  File "/root/.cache/bazel/_bazel_root/b570b5ccd0454dc9af9f65ab1833764d/execroot/__main__/bazel-out/k8-fastbuild/bin/usage/example.runfiles/__main__/pip/numpy/numpy-cpython-38/lib/numpy/linalg/linalg.py", line 33, in <module>
    from numpy.linalg import lapack_lite, _umath_linalg
ImportError: /root/.cache/bazel/_bazel_root/b570b5ccd0454dc9af9f65ab1833764d/execroot/__main__/bazel-out/k8-fastbuild/bin/usage/example.runfiles/__main__/pip/numpy/numpy-cpython-38/lib/numpy/linalg/lapack_lite.cpython-38-x86_64-linux-gnu.so: undefined symbol: _gfortran_concat_string

benjaminp commented 3 years ago

It is assumed that the system has libgfortran available.

jake-arkinstall commented 3 years ago

That isn't the case. You can try it on the docker image by running apt install gfortran and/or apt install libgfortran5 before bazel build. It still can't link with libgfortran because gfortran isn't in the path provided to the runtime. I've also tried providing libgfortran within Bazel, but to no avail - not that this would solve the issue, because the developer would then have to add dependencies according to whether X, Y or Z is available on the target system.

The underlying issue is that the runtime environment is hermetic but the setup environment is leaky - as far as I can tell, vpip is given access to system libraries, so any setup process that searches for available resources in the path that it has access to is prone to creating dependencies that aren't available at runtime.
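One way to check whether a given environment exposes such libraries is to ask the platform's library lookup directly. The sketch below is only a diagnostic aid under stated assumptions: the candidate names are illustrative, and ctypes.util.find_library reports what the host's linker tooling can see, which is not necessarily the same search that numpy's setup performs.

```python
import ctypes.util

# Illustrative candidates (an assumption): libraries a numpy build might pick up.
CANDIDATES = ("openblas", "blas", "lapack", "gfortran")


def visible_host_libraries(names=CANDIDATES):
    """Map each library name to the file the platform lookup finds, or None."""
    return {name: ctypes.util.find_library(name) for name in names}


def leaked(names=CANDIDATES):
    """Return the sorted names of the libraries that are visible."""
    found = visible_host_libraries(names)
    return sorted(name for name, path in found.items() if path)


if __name__ == "__main__":
    for name, path in visible_host_libraries().items():
        print(f"{name}: {path or 'not found'}")
```

Running this inside both the working and the broken image should make the difference visible: any name that resolves in the setup environment but not at runtime is a candidate for the kind of leak described above.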

benjaminp commented 3 years ago

> The underlying issue is that the runtime environment is hermetic but the setup environment is leaky - as far as I can tell, vpip is given access to system libraries, so any setup process that searches for available resources in the path that it has access to is prone to creating dependencies that aren't available at runtime.

Yes, that's a well-known Bazel sandboxing hole: https://github.com/bazelbuild/bazel/issues/7313

jake-arkinstall commented 3 years ago

This being the case, I'm wondering two things:

  1. In NumPy, at least, a configuration can be provided using a site.cfg file. Is there a chance of adding an intermediate step between download and installation that allows configuration files to be swapped in before the installation runs?

  2. If setup can't be hermetic, then it's likely that other packages are also affected - and their respective solutions may be very different. How about a wiki page where users can post approaches that they've found for setting up packages successfully?

benjaminp commented 3 years ago

We use a site.cfg to point scipy to our copy of OpenBLAS.
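As a rough illustration of what such a file can contain, the sketch below renders an [openblas] section with the keys that numpy.distutils-based builds read from site.cfg. The prefix /opt/openblas and the function name render_site_cfg are hypothetical; the actual file used at Dropbox is not shown in this thread, and newer Meson-based NumPy and SciPy builds do not use site.cfg.

```python
import configparser
import io


def render_site_cfg(prefix):
    """Render a site.cfg pointing the build at an OpenBLAS install under prefix."""
    cfg = configparser.ConfigParser()
    cfg["openblas"] = {
        "libraries": "openblas",
        "library_dirs": f"{prefix}/lib",
        "include_dirs": f"{prefix}/include",
        # Embeds the library directory so it is also searched at run time.
        "runtime_library_dirs": f"{prefix}/lib",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical prefix; replace with wherever the pinned OpenBLAS lives.
    print(render_site_cfg("/opt/openblas"))
```

Writing the returned text to a file named site.cfg next to the package's setup.py before the build runs is the usual way to apply it with numpy.distutils. How to get that file into the vpip step is the open question raised above.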

We avoid the hermeticity problem by building everything with a minimal sandbox root fs. https://github.com/bazelbuild/bazel/issues/6994#issuecomment-457438551

vicecea commented 2 years ago

> > The underlying issue is that the runtime environment is hermetic but the setup environment is leaky - as far as I can tell, vpip is given access to system libraries, so any setup process that searches for available resources in the path that it has access to is prone to creating dependencies that aren't available at runtime.
>
> Yes, that's a well-known Bazel sandboxing hole: bazelbuild/bazel#7313

The upstream issue has been resolved. Can this be closed?