bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

Bazel does not run in emulated docker environment #11379

Closed travisgroth closed 4 years ago

travisgroth commented 4 years ago

Description of the problem / feature request:

When using binfmt/qemu docker emulation for arm64, bazel can build itself, but cannot run the resulting binary.

root@c33bb8911074:/bazel# ./output/bazel
Opening zip "/proc/self/exe": lseek(): Bad file descriptor
FATAL: Failed to open '/proc/self/exe' as a zip file: (error: 9): Bad file descriptor

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Dockerfile:

FROM arm64v8/ubuntu

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update \
    && apt-get -y install \
    libtool \
    cmake \
    automake \
    autoconf \
    make \
    ninja-build \
    curl \
    unzip \
    zip \
    wget \
    virtualenv \
    build-essential \
    openjdk-8-jdk \
    python

RUN mkdir /bazel

WORKDIR /bazel

RUN wget https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-dist.zip
RUN unzip bazel*.zip

RUN env EXTRA_BAZEL_ARGS="--host_javabase=@local_jdk//:jdk" bash ./compile.sh

Commands run on the host:

apt-get install qemu binfmt-support qemu-user-static docker.io
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
docker build -t bazel -f Dockerfile .
docker run -it bazel /bazel/output/bazel

What operating system are you running Bazel on?

Ubuntu 20.04 LTS
Docker 19.03.8
Qemu 4.2

What's the output of bazel info release?

Built from 3.1.0 dist tarball

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

See repro steps. Compiled in container.

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

N/A

Have you found anything relevant by searching the web?

This looks related to https://github.com/bazelbuild/bazel/issues/7135 but that was closed with https://github.com/bazelbuild/bazel/pull/10761, and I'm not clear on why.

Any other information, logs, or outputs that you want to share?

It is worth noting that I do not observe this behavior on an older system:

Debian GNU/Linux 9.6
Docker 18.09.1
Qemu 2.8

philwo commented 4 years ago

@travisgroth Just to understand this correctly, as I have never used binfmt/qemu docker emulation for arm64: are you running these steps on an x86_64 VM?

jiridanek commented 4 years ago

Works on my machine. I had to add apt install python; otherwise I got:

ERROR: /bazel/src/BUILD:305:2: Executing genrule //src:embedded_tools_nojdk failed (Exit 127): bash failed: error executing command 
(cd /tmp/bazel_2fuARdr6/out/execroot/io_bazel && \
exec env - \
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
/bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; bazel-out/host/bin/src/create_embedded_tools "bazel-out/aarch64-opt/bin/src/embedded_tools_nojdk.zip" bazel-out/aarch64-opt/bin/src/embedded_tools_nojdk.params')
Execution platform: //:default_host_platform
/usr/bin/env: 'python': No such file or directory
Target //src:bazel_nojdk failed to build
INFO: Elapsed time: 3364.010s, Critical Path: 1697.36s
INFO: 1107 processes: 960 local, 147 worker.
FAILED: Build did NOT complete successfully

Other than that, I was able to compile bazel this way and build bazel-buildfarm without problems on my machine (there aren't any targets in //, so that completes quickly, but I was actually able to build real targets afterwards).

root@e8207c0f000f:/bazel-buildfarm# /bazel/output/bazel build //:all
Starting local Bazel server and connecting to it...
... still trying to connect to local Bazel server after 10 seconds ...
... still trying to connect to local Bazel server after 20 seconds ...
... still trying to connect to local Bazel server after 30 seconds ...
INFO: Analyzed 0 targets (1 packages loaded, 0 targets configured).
INFO: Found 0 targets...
INFO: Elapsed time: 195.118s, Critical Path: 2.09s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action

I used

`boot.binfmt.emulatedSystems = [ "aarch64-linux" ];` in /etc/nixos/configuration.nix (but `multiarch/qemu-user-static` overwrites the setting, anyways)
host os: `Linux 5.4.41, NixOS, 20.03.1917.82b5f87fcc7 (Markhor)` on x86_64
Docker version 19.03.8
qemu 4.2.0

NB. I saw the error Opening zip "/proc/self/exe": lseek(): Bad file descriptor when I was trying to run the x86_64 bazel binary on an i686 machine in qemu, https://github.com/bazelbuild/bazel/issues/1340#issuecomment-599987731

NB. That multiarch/qemu-user-static feels like magic!

       --persistent:  if yes, the interpreter is loaded when binfmt is
                      configured and remains in memory. All future uses
                      are cloned from the open file.

travisgroth commented 4 years ago

@philwo Yes - to be specific, I'm on a Linux x86_64 VM running the build inside an ARM64 container. This is using the multiarch/qemu-user-static docker image to facilitate emulation.

@jiridanek that's extremely strange. Can you share the sha of the qemu-user-static image you used?

jiridanek commented 4 years ago

@travisgroth I just downloaded it. I got excited about this possibility to get aarch64 quickly and conveniently.

$ docker images --digests
multiarch/qemu-user-static latest sha256:c4bbb826aff01aacd5c73bc29401751ef045d8f0ed05f067aab1ce1072db20f0   e623c77ba49d        4 weeks ago         130MB

edit: I can even set this up without using multiarch/qemu-user-static. The trick is that NixOS uses a different file system hierarchy than Ubuntu, so I can mount /nix/store into docker without clashes.

% docker run -v /nix/store:/nix/store -v /run/binfmt/aarch64-linux:/run/binfmt/aarch64-linux --rm -it arm64v8/ubuntu bash
root@39cd027c575e:/# uname -a
Linux 39cd027c575e 5.4.41 #1-NixOS SMP Thu May 14 05:58:30 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux

travisgroth commented 4 years ago

Yes... it's nice when it works :|

I'll go back over my setup again.

travisgroth commented 4 years ago

Okay, I was able to narrow it down to the qemu-user-static package being installed as part of the underlying OS. I think this can be closed as it is a runtime environment problem.
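For anyone comparing a working and a failing host, one way to see which emulator registration is active is to read the binfmt_misc entry the kernel exposes. The following is only a sketch: the entry name qemu-aarch64 is an assumption (setups may register a different name), and binfmt_field and binfmt_is_persistent are hypothetical helper names. The F flag is what the kernel shows for a persistent ("fix binary") registration, which appears to be what -p yes requests. The thread itself does not establish which setting differed between the host package and the multiarch/qemu-user-static image.

```shell
#!/bin/sh
# Sketch: read fields from a binfmt_misc entry file.
# Assumption: the entry is named qemu-aarch64; other setups may use another name.

# Print the value of one field ("interpreter" or "flags:") from an entry file.
binfmt_field() {
    awk -v key="$2" '$1 == key { print $2 }' "$1"
}

# Succeed if the entry carries the F (fix-binary) flag.
binfmt_is_persistent() {
    case "$(binfmt_field "$1" "flags:")" in
        *F*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example (run on the host, not inside the container):
#   entry=/proc/sys/fs/binfmt_misc/qemu-aarch64
#   binfmt_field "$entry" interpreter
#   binfmt_is_persistent "$entry" && echo "persistent (F flag set)"
```

Comparing the interpreter path and flags before and after running the multiarch/qemu-user-static command may show what the reset changed.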

Working commands:

apt-get install docker.io
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
docker build -t bazel -f Dockerfile .
$ docker run -it bazel /bazel/output/bazel
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
... still trying to connect to local Bazel server after 10 seconds ...
                                               [bazel release 3.1.0- (@non-git)]
Usage: bazel <command> <options> ...

Available commands:
  analyze-profile     Analyzes build profile data.
  aquery              Analyzes the given targets and queries the action graph.
  build               Builds the specified targets.
  canonicalize-flags  Canonicalizes a list of bazel options.
  clean               Removes output files and optionally stops the server.
  coverage            Generates code coverage report for specified test targets.
  cquery              Loads, analyzes, and queries the specified targets w/ configurations.
  dump                Dumps the internal state of the bazel server process.
  fetch               Fetches external repositories that are prerequisites to the targets.
  help                Prints help for commands, or the index.
  info                Displays runtime info about the bazel server.
  license             Prints the license of this software.
  mobile-install      Installs targets to mobile devices.
  print_action        Prints the command line args for compiling a file.
  query               Executes a dependency graph query.
  run                 Runs the specified target.
  shutdown            Stops the bazel server.
  sync                Syncs all repositories specified in the workspace file
  test                Builds and runs the specified test targets.
  version             Prints version information for bazel.

Getting more help:
  bazel help <command>
                   Prints help and options for <command>.
  bazel help startup_options
                   Options for the JVM hosting bazel.
  bazel help target-syntax
                   Explains the syntax for specifying targets.
  bazel help info-keys
                   Displays a list of keys used by the info command.

smorad commented 3 years ago

I'm running into this exact issue using a chroot with qemu-arm version 4.2.1 (Debian 1:4.2-3ubuntu6.10) on Ubuntu 20.04 using the latest release bazel-3.7.1-linux-arm64. Does anybody have any tips?

EDIT: strangely, following the docker instructions appears to have fixed qemu in my chroot (I am not using docker).

sm@pc:/mnt/buildroot/pi-crosscompile$ sudo docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
Unable to find image 'multiarch/qemu-user-static:latest' locally
latest: Pulling from multiarch/qemu-user-static
ea97eb0eb3ec: Pull complete 
c063ad2502a4: Pull complete 
ed8a5179ae11: Pull complete 
1ec39da9c97d: Pull complete 
c7bbe041eb55: Pull complete 
Digest: sha256:4644cb255715498525574007841a4abf79b663afef9464a95ea83970606f83bd
Status: Downloaded newer image for multiarch/qemu-user-static:latest
Setting /usr/bin/qemu-alpha-static as binfmt interpreter for alpha
Setting /usr/bin/qemu-arm-static as binfmt interpreter for arm
Setting /usr/bin/qemu-armeb-static as binfmt interpreter for armeb
Setting /usr/bin/qemu-sparc-static as binfmt interpreter for sparc
Setting /usr/bin/qemu-sparc32plus-static as binfmt interpreter for sparc32plus
Setting /usr/bin/qemu-sparc64-static as binfmt interpreter for sparc64
Setting /usr/bin/qemu-ppc-static as binfmt interpreter for ppc
Setting /usr/bin/qemu-ppc64-static as binfmt interpreter for ppc64
Setting /usr/bin/qemu-ppc64le-static as binfmt interpreter for ppc64le
Setting /usr/bin/qemu-m68k-static as binfmt interpreter for m68k
Setting /usr/bin/qemu-mips-static as binfmt interpreter for mips
Setting /usr/bin/qemu-mipsel-static as binfmt interpreter for mipsel
Setting /usr/bin/qemu-mipsn32-static as binfmt interpreter for mipsn32
Setting /usr/bin/qemu-mipsn32el-static as binfmt interpreter for mipsn32el
Setting /usr/bin/qemu-mips64-static as binfmt interpreter for mips64
Setting /usr/bin/qemu-mips64el-static as binfmt interpreter for mips64el
Setting /usr/bin/qemu-sh4-static as binfmt interpreter for sh4
Setting /usr/bin/qemu-sh4eb-static as binfmt interpreter for sh4eb
Setting /usr/bin/qemu-s390x-static as binfmt interpreter for s390x
Setting /usr/bin/qemu-aarch64-static as binfmt interpreter for aarch64
Setting /usr/bin/qemu-aarch64_be-static as binfmt interpreter for aarch64_be
Setting /usr/bin/qemu-hppa-static as binfmt interpreter for hppa
Setting /usr/bin/qemu-riscv32-static as binfmt interpreter for riscv32
Setting /usr/bin/qemu-riscv64-static as binfmt interpreter for riscv64
Setting /usr/bin/qemu-xtensa-static as binfmt interpreter for xtensa
Setting /usr/bin/qemu-xtensaeb-static as binfmt interpreter for xtensaeb
Setting /usr/bin/qemu-microblaze-static as binfmt interpreter for microblaze
Setting /usr/bin/qemu-microblazeel-static as binfmt interpreter for microblazeel
Setting /usr/bin/qemu-or1k-static as binfmt interpreter for or1k
$ chroot aarch64
$ ./bazel-3.7.1-linux-arm64
Extracting Bazel installation...
                                                           [bazel release 3.7.1]
Usage: bazel <command> <options> ...

Available commands:
  analyze-profile     Analyzes build profile data.
  aquery              Analyzes the given targets and queries the action graph.
  build               Builds the specified targets.
  canonicalize-flags  Canonicalizes a list of bazel options.
  clean               Removes output files and optionally stops the server.
  coverage            Generates code coverage report for specified test targets.
  cquery              Loads, analyzes, and queries the specified targets w/ configurations.
  dump                Dumps the internal state of the bazel server process.
  fetch               Fetches external repositories that are prerequisites to the targets.
  help                Prints help for commands, or the index.
  info                Displays runtime info about the bazel server.
  license             Prints the license of this software.
  mobile-install      Installs targets to mobile devices.
  print_action        Prints the command line args for compiling a file.
  query               Executes a dependency graph query.
  run                 Runs the specified target.
  shutdown            Stops the bazel server.
  sync                Syncs all repositories specified in the workspace file
  test                Builds and runs the specified test targets.
  version             Prints version information for bazel.

olekw commented 3 years ago

I'm seeing the same on a Debian sid riscv64 QEMU build of docker-image-debian-bootstrap. It sounds a bit like a partial regression of a QEMU bug from 2014. Unfortunately, I haven't had time to research it. I did check ls -l /proc/self/exe in that image and it correctly points to ls so it's not the exact same thing. @travisgroth, could you give us a little more information about how you resolved the issue you were having?

evalphobia commented 3 years ago

I'm running into the same error on an M1 MacBook when using docker buildx to create an amd64 image.

FROM golang:1.16 as builder

RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y \
      apt-transport-https \
      curl \
      g++ \
      gnupg \
      pkg-config \
      python \
      software-properties-common \
      unzip \
      zip \
      zlib1g-dev

RUN curl -LO https://github.com/bazelbuild/bazel/releases/download/4.0.0/bazel-4.0.0-linux-x86_64 && \
    mv ./bazel-4.0.0-linux-x86_64 /usr/local/bin/bazel && \
    chmod +x /usr/local/bin/bazel

# RUN sed -i 's@/proc/self/exe@/usr/bin/bazel@' /usr/local/bin/bazel
# RUN mv /usr/local/bin/bazel /usr/bin/bazel
RUN bazel version

$ docker buildx build --platform linux/amd64 -t my-bazel-image --progress --load .

WARN[0000] No output specified for docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load
#1 [internal] load build definition from Dockerfile
#1 sha256:6ba7a01ddddf53414fc03e0ce6cbb9af13d641fb8f1dc462bddcdc411f48e974
#1 transferring dockerfile: 30B 0.0s
#1 transferring dockerfile: 2.17kB 0.0s done
#1 DONE 0.4s

...

#7 [4/4] RUN bazel version
#7 sha256:9086214ac451e86e191f6ba254184c016c6f21a951216faa89a970e1d40fa7e0
#7 0.802 Opening zip "/proc/self/exe": lseek(): Bad file descriptor
#7 0.802 FATAL: Failed to open '/proc/self/exe' as a zip file: (error: 9): Bad file descriptor
#7 ERROR: executor failed running [/bin/sh -c bazel version]: exit code: 36
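A build script that wants to tell this specific failure apart from other Bazel errors could match the message in the captured log. This is a minimal sketch; is_self_exe_failure is a hypothetical helper name, and the matched text is simply the FATAL line quoted in this thread, so other Bazel versions may word it differently.

```shell
#!/bin/sh
# Succeed if the given log file contains the /proc/self/exe failure
# quoted in this thread (matched as a fixed string, not a regex).
is_self_exe_failure() {
    grep -qF "Failed to open '/proc/self/exe' as a zip file" "$1"
}

# Example:
#   bazel version > build.log 2>&1 || {
#       is_self_exe_failure build.log &&
#           echo "hint: check the qemu/binfmt setup on the host" >&2
#   }
```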

This error also happens with GCP's Docker image in my environment.

FROM gcr.io/cloud-builders/bazel

RUN bazel version

ref: https://github.com/GoogleCloudPlatform/cloud-builders/tree/master/bazel

Then I tried uncommenting the sed lines in the Dockerfile (ref: https://github.com/PINTO0309/Bazel_bin/issues/2 ). The installation process seemed to start, but then it failed with a segmentation fault (signal 11).

#10 [6/6] RUN bazel version
#10 sha256:2a2b74e6295f688d9f7b65cc031af03adc9ea1cb8b5c0b916cf482471d4233d3
#10 0.809 Extracting Bazel installation...
#10 29.07 qemu: uncaught target signal 11 (Segmentation fault) - core dumped
#10 29.08 Segmentation fault
#10 ERROR: executor failed running [/bin/sh -c bazel version]: exit code: 139
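A note on why the sed workaround can be applied to a binary at all: /proc/self/exe and /usr/bin/bazel are both 14 characters long, so the substitution does not shift any of the bytes that follow it. The sketch below makes that constraint explicit. patch_self_exe_path is a hypothetical helper, not part of Bazel, and it assumes the replacement path contains no @, & or backslash characters. As the segfault above shows, a successful patch does not guarantee the binary then runs under emulation.

```shell
#!/bin/sh
# Sketch: replace the literal "/proc/self/exe" inside a file with another
# path, refusing replacements of a different length (those would shift the
# bytes that follow and corrupt the binary).
patch_self_exe_path() {
    bin=$1
    new=$2
    old=/proc/self/exe
    if [ "${#new}" -ne "${#old}" ]; then
        echo "replacement must be exactly ${#old} characters long" >&2
        return 1
    fi
    tmp=$(mktemp) || return 1
    # LC_ALL=C makes sed treat the input as raw bytes.
    # Writing back with cat keeps the file's existing permissions.
    LC_ALL=C sed "s@$old@$new@" "$bin" > "$tmp" && cat "$tmp" > "$bin"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example, mirroring the commented-out Dockerfile lines:
#   patch_self_exe_path /usr/local/bin/bazel /usr/bin/bazel
#   mv /usr/local/bin/bazel /usr/bin/bazel
```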

My environment:

$ sw_vers
ProductName:    macOS
ProductVersion: 11.2.1
BuildVersion:   20D74

$ /usr/bin/arch
arm64

# Using Docker Desktop RC 3 (2021-04-01)
# https://docs.docker.com/docker-for-mac/apple-m1/
$ docker --version
Docker version 20.10.5, build 55c4c88
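Since the failures in this thread only show up when the container architecture differs from the host's, a script can check up front whether a build will run under emulation. This is a rough sketch: normalize_arch and is_emulated are hypothetical helper names, and only the two architectures discussed here are mapped.

```shell
#!/bin/sh
# Map `uname -m` / `arch` style names to Docker platform architecture names.
normalize_arch() {
    case "$1" in
        x86_64|amd64)  echo amd64 ;;
        aarch64|arm64) echo arm64 ;;
        *)             echo "$1" ;;
    esac
}

# Succeed if the target platform (e.g. linux/amd64) differs from the host
# architecture, meaning the container would run under emulation.
is_emulated() {
    host=$(normalize_arch "$1")
    target=$(normalize_arch "${2#linux/}")
    [ "$host" != "$target" ]
}

# Example:
#   is_emulated "$(uname -m)" linux/amd64 &&
#       echo "linux/amd64 containers will run under emulation on this host"
```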

How to create libtensorflowlite_c.so without bazel

(This is not related to this Bazel error, but it might help people in the same situation.)

I just wanted to build the TensorFlow Lite C library libtensorflowlite_c.so to use with Go. In the end, I got it working with the Dockerfile below:

FROM golang:1.16 as builder

RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y \
      apt-transport-https \
      curl \
      g++ \
      git \
      gnupg \
      pkg-config \
      python \
      software-properties-common \
      unzip \
      zip \
      zlib1g-dev

ENV TENSORFLOW_VERSION v2.2.0-rc3
ENV GO_TFLITE_VERSION 30f5e2a268d2eb053f758636609c5c379a3016b5

#== Download
WORKDIR /go/src/github.com
RUN mkdir mattn tensorflow && \
    git clone https://github.com/mattn/go-tflite.git ./mattn/go-tflite && \
    git clone --depth 1 -b ${TENSORFLOW_VERSION} https://github.com/tensorflow/tensorflow.git ./tensorflow/tensorflow && \
    cp mattn/go-tflite/Makefile.tflite tensorflow/tensorflow/tensorflow/lite/c/Makefile

#== Build
WORKDIR /go/src/github.com/tensorflow/tensorflow
RUN ./tensorflow/lite/tools/make/download_dependencies.sh && \
    make -f ./tensorflow/lite/tools/make/Makefile

WORKDIR /go/src/github.com/tensorflow/tensorflow/tensorflow/lite/c
RUN make

RUN mkdir -p /usr/local/include/tensorflow/lite/c && \
    cp libtensorflowlite_c.so /usr/local/lib/ && \
    cp *.h /usr/local/include/tensorflow/lite/c/ && \
    cp ../*.h /usr/local/include/tensorflow/lite/

FROM golang:1.16

COPY --from=builder /usr/local/lib/libtensorflowlite_c.so /usr/local/lib/libtensorflowlite_c.so
COPY --from=builder /usr/local/include/tensorflow /usr/local/include/tensorflow
ENV LD_LIBRARY_PATH $LD_LIBRARY_PATH:/usr/local/lib

# To avoid error 'could not determine kind of name for C.TfLiteInterpreterOptionsSetUseNNAPI'
ENV GO_TFLITE_VERSION 30f5e2a268d2eb053f758636609c5c379a3016b5

WORKDIR /go/src/github.com/mattn
RUN git clone https://github.com/mattn/go-tflite.git  && \
    cd go-tflite && \
    git checkout ${GO_TFLITE_VERSION}

WORKDIR /go/src/github.com/mattn/go-tflite/_example/fizzbuzz
RUN go run ./main.go

John-Brooks commented 3 years ago

I resolved this issue by re-running docker run --rm --privileged multiarch/qemu-user-static --reset -p yes.

Recently I created the docker user group and set up the permissions necessary to run docker without sudo (Post installation), and I'm wondering if that had something to do with it. I'm not sure, but the above got me up and running again.

keith commented 3 years ago

I think this issue still exists, at least when running in docker on M1 machines.

emidln-imc commented 2 years ago

This is fixed in https://github.com/bazelbuild/bazel/pull/14391

infa-ddeore commented 2 years ago

John-Brooks wrote:

I resolved this issue by re-running docker run --rm --privileged multiarch/qemu-user-static --reset -p yes.

Recently I created the docker user group and setup the permissions necessary to run docker without sudo (Post installation) and I'm wondering if that had something to do with it. I'm not sure, but the above got me up and running again.

That helped me as well. I added the ubuntu user to the docker group to run docker commands without sudo.