dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

Problem generating protobuf stubs in ONNX #356

Closed: cahlen closed this issue 8 months ago

cahlen commented 8 months ago

I'm trying to build a container in which deepstream depends on protobuf:protobuf_cpp, but the build errors out at the ONNX container stage because of the protobuf extension. Note that I'm building from the dev branch, following a comment I saw in another issue thread.

I'm unclear on how to fix this.

$ ./build.sh --name=my_container protobuf pytorch:2.1 opencv torchvision torchaudio tensorflow2 gstreamer deepstream
[...]
[...]
[...]
  running build_ext
  running cmake_build
  Using cmake args: ['/usr/local/bin/cmake', '-DPYTHON_INCLUDE_DIR=/usr/include/python3.8', '-DPYTHON_EXECUTABLE=/usr/bin/python3', '-DBUILD_ONNX_PYTHON=ON', '-DCMAKE_EXPORT_COMPILE_COMMANDS=ON', '-DONNX_NAMESPACE=onnx', '-DPY_EXT_SUFFIX=.cpython-38-aarch64-linux-gnu.so', '-DCMAKE_BUILD_TYPE=Release', '-DONNX_ML=1', '/tmp/pip-req-build-3z4v669m']
  CMake Deprecation Warning at CMakeLists.txt:2 (cmake_minimum_required):
    Compatibility with CMake < 3.5 will be removed from a future version of
    CMake.

    Update the VERSION argument <min> value or use a ...<max> suffix to tell
    CMake that the project does not need compatibility with older versions.

  -- The C compiler identification is GNU 9.4.0
  -- The CXX compiler identification is GNU 9.4.0
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Check for working C compiler: /usr/bin/cc - skipped
  -- Detecting C compile features
  -- Detecting C compile features - done
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: /usr/bin/c++ - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  CMake Warning (dev) at CMakeLists.txt:106 (find_package):
    Policy CMP0148 is not set: The FindPythonInterp and FindPythonLibs modules
    are removed.  Run "cmake --help-policy CMP0148" for policy details.  Use
    the cmake_policy command to set the policy and suppress this warning.

  This warning is for project developers.  Use -Wno-dev to suppress it.

  -- Found PythonInterp: /usr/bin/python3 (found version "3.8.10")
  CMake Warning (dev) at CMakeLists.txt:108 (find_package):
    Policy CMP0148 is not set: The FindPythonInterp and FindPythonLibs modules
    are removed.  Run "cmake --help-policy CMP0148" for policy details.  Use
    the cmake_policy command to set the policy and suppress this warning.

  This warning is for project developers.  Use -Wno-dev to suppress it.

  -- Found PythonLibs: /usr/lib/aarch64-linux-gnu/libpython3.8.so (found version "3.8.10")
  -- Found Protobuf: /usr/local/lib/libprotobuf.so (found version "3.20.3")
  -- ONNX_PROTOC_EXECUTABLE: /usr/local/bin/protoc
  -- Protobuf_VERSION: 3.20.3
  Generated: /tmp/pip-req-build-3z4v669m/.setuptools-cmake-build/onnx/onnx-ml.proto
  Generated: /tmp/pip-req-build-3z4v669m/.setuptools-cmake-build/onnx/onnx-operators-ml.proto
  Generated: /tmp/pip-req-build-3z4v669m/.setuptools-cmake-build/onnx/onnx-data.proto
  -- Could NOT find pybind11 (missing: pybind11_DIR)
  CMake Deprecation Warning at third_party/pybind11/CMakeLists.txt:8 (cmake_minimum_required):
    Compatibility with CMake < 3.5 will be removed from a future version of
    CMake.

    Update the VERSION argument <min> value or use a ...<max> suffix to tell
    CMake that the project does not need compatibility with older versions.

  -- pybind11 v2.10.4
  CMake Warning (dev) at third_party/pybind11/tools/FindPythonLibsNew.cmake:98 (find_package):
    Policy CMP0148 is not set: The FindPythonInterp and FindPythonLibs modules
    are removed.  Run "cmake --help-policy CMP0148" for policy details.  Use
    the cmake_policy command to set the policy and suppress this warning.

  Call Stack (most recent call first):
    third_party/pybind11/tools/pybind11Tools.cmake:50 (find_package)
    third_party/pybind11/tools/pybind11Common.cmake:180 (include)
    third_party/pybind11/CMakeLists.txt:208 (include)
  This warning is for project developers.  Use -Wno-dev to suppress it.

  -- Found PythonInterp: /usr/bin/python3 (found suitable version "3.8.10", minimum required is "3.6")
  -- Found PythonLibs: /usr/lib/aarch64-linux-gnu/libpython3.8.so
  -- Performing Test HAS_FLTO
  -- Performing Test HAS_FLTO - Success
  --
  -- ******** Summary ********
  --   CMake version                     : 3.28.1
  --   CMake command                     : /usr/local/lib/python3.8/dist-packages/cmake/data/bin/cmake
  --   System                            : Linux
  --   C++ compiler                      : /usr/bin/c++
  --   C++ compiler version              : 9.4.0
  --   CXX flags                         :  -Wnon-virtual-dtor
  --   Build type                        : Release
  --   Compile definitions               : __STDC_FORMAT_MACROS
  --   CMAKE_PREFIX_PATH                 :
  --   CMAKE_INSTALL_PREFIX              : /usr/local
  --   CMAKE_MODULE_PATH                 :
  --
  --   ONNX version                      : 1.16.0
  --   ONNX NAMESPACE                    : onnx
  --   ONNX_USE_LITE_PROTO               : OFF
  --   USE_PROTOBUF_SHARED_LIBS          : OFF
  --   Protobuf_USE_STATIC_LIBS          : ON
  --   ONNX_DISABLE_EXCEPTIONS           : OFF
  --   ONNX_DISABLE_STATIC_REGISTRATION  : OFF
  --   ONNX_WERROR                       : OFF
  --   ONNX_BUILD_TESTS                  : OFF
  --   ONNX_BUILD_BENCHMARKS             : OFF
  --   ONNX_BUILD_SHARED_LIBS            :
  --   BUILD_SHARED_LIBS                 :
  --
  --   Protobuf compiler                 : /usr/local/bin/protoc
  --   Protobuf includes                 : /usr/local/include
  --   Protobuf libraries                : /usr/local/lib/libprotobuf.so
  --   BUILD_ONNX_PYTHON                 : ON
  --     Python version                :
  --     Python executable             : /usr/bin/python3
  --     Python includes               : /usr/include/python3.8
  -- Configuring done (1.7s)
  -- Generating done (0.0s)
  -- Build files have been written to: /tmp/pip-req-build-3z4v669m/.setuptools-cmake-build
  [  1%] Running gen_proto.py on onnx/onnx.in.proto
  Processing /tmp/pip-req-build-3z4v669m/onnx/onnx.in.proto
  Writing /tmp/pip-req-build-3z4v669m/.setuptools-cmake-build/onnx/onnx-ml.proto
  Writing /tmp/pip-req-build-3z4v669m/.setuptools-cmake-build/onnx/onnx-ml.proto3
  generating /tmp/pip-req-build-3z4v669m/.setuptools-cmake-build/onnx/onnx_pb.py
  [  2%] Running C++ protocol buffer compiler on /tmp/pip-req-build-3z4v669m/.setuptools-cmake-build/onnx/onnx-ml.proto
  /usr/local/lib/python3.8/dist-packages/google/protobuf/internal/api_implementation.py:110: UserWarning: Selected implementation cpp is not available.
    warnings.warn(
  Traceback (most recent call last):
    File "/tmp/pip-req-build-3z4v669m/.setuptools-cmake-build/tools/protoc-gen-mypy.py", line 28, in <module>
      import google.protobuf.descriptor_pb2 as d_typed
    File "/usr/local/lib/python3.8/dist-packages/google/protobuf/descriptor_pb2.py", line 5, in <module>
      from google.protobuf.internal import builder as _builder
    File "/usr/local/lib/python3.8/dist-packages/google/protobuf/internal/builder.py", line 42, in <module>
      from google.protobuf import reflection as _reflection
    File "/usr/local/lib/python3.8/dist-packages/google/protobuf/reflection.py", line 51, in <module>
      from google.protobuf import message_factory
    File "/usr/local/lib/python3.8/dist-packages/google/protobuf/message_factory.py", line 45, in <module>
      from google.protobuf import descriptor_pool
    File "/usr/local/lib/python3.8/dist-packages/google/protobuf/descriptor_pool.py", line 63, in <module>
      from google.protobuf import descriptor
    File "/usr/local/lib/python3.8/dist-packages/google/protobuf/descriptor.py", line 51, in <module>
      from google.protobuf.pyext import _message
  ImportError: cannot import name '_message' from 'google.protobuf.pyext' (/usr/local/lib/python3.8/dist-packages/google/protobuf/pyext/__init__.py)

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "/tmp/pip-req-build-3z4v669m/.setuptools-cmake-build/tools/protoc-gen-mypy.py", line 31, in <module>
      raise RuntimeError("Failed to generate mypy stubs") from e
  RuntimeError: Failed to generate mypy stubs
  --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
  make[2]: *** [CMakeFiles/gen_onnx_proto.dir/build.make:74: onnx/onnx-ml.pb.cc] Error 1
  make[1]: *** [CMakeFiles/Makefile2:107: CMakeFiles/gen_onnx_proto.dir/all] Error 2
  make: *** [Makefile:136: all] Error 2
  Traceback (most recent call last):
    File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
      main()
    File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
      return _build_backend().build_wheel(wheel_directory, config_settings,
    File "/tmp/pip-build-env-tes3938h/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 404, in build_wheel
      return self._build_with_temp_dir(
    File "/tmp/pip-build-env-tes3938h/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 389, in _build_with_temp_dir
      self.run_setup()
    File "/tmp/pip-build-env-tes3938h/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 311, in run_setup
      exec(code, locals())
    File "<string>", line 321, in <module>
    File "/tmp/pip-build-env-tes3938h/overlay/lib/python3.8/site-packages/setuptools/__init__.py", line 103, in setup
      return distutils.core.setup(**attrs)
    File "/tmp/pip-build-env-tes3938h/overlay/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
      return run_commands(dist)
    File "/tmp/pip-build-env-tes3938h/overlay/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
      dist.run_commands()
    File "/tmp/pip-build-env-tes3938h/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/tmp/pip-build-env-tes3938h/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 963, in run_command
      super().run_command(command)
    File "/tmp/pip-build-env-tes3938h/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/tmp/pip-build-env-tes3938h/normal/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 368, in run
      self.run_command("build")
    File "/tmp/pip-build-env-tes3938h/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/tmp/pip-build-env-tes3938h/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 963, in run_command
      super().run_command(command)
    File "/tmp/pip-build-env-tes3938h/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/tmp/pip-build-env-tes3938h/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/build.py", line 131, in run
      self.run_command(cmd_name)
    File "/tmp/pip-build-env-tes3938h/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/tmp/pip-build-env-tes3938h/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 963, in run_command
      super().run_command(command)
    File "/tmp/pip-build-env-tes3938h/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "<string>", line 258, in run
    File "/tmp/pip-build-env-tes3938h/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/tmp/pip-build-env-tes3938h/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 963, in run_command
      super().run_command(command)
    File "/tmp/pip-build-env-tes3938h/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "<string>", line 237, in run
    File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
      raise CalledProcessError(retcode, cmd)
  subprocess.CalledProcessError: Command '['/usr/local/bin/cmake', '--build', '.', '--', '-j', '12']' returned non-zero exit status 2.
  error: subprocess-exited-with-error

  × Building wheel for onnx (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /usr/bin/python3 /usr/local/lib/python3.8/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmpiajrvrua
  cwd: /tmp/pip-req-build-3z4v669m
  Building wheel for onnx (pyproject.toml): finished with status 'error'
  ERROR: Failed building wheel for onnx
Failed to build onnx
ERROR: Could not build wheels for onnx, which is required to install pyproject.toml-based projects
The command '/bin/sh -c pip3 install --no-cache-dir --verbose git+https://github.com/onnx/onnx@${ONNX_VERSION}' returned a non-zero code: 1
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/nvidia/jetson-containers/jetson_containers/build.py", line 102, in <module>
    build_container(args.name, args.packages, args.base, args.build_flags, args.simulate, args.skip_tests, args.test_only, args.push, args.no_github_api)
  File "/home/nvidia/jetson-containers/jetson_containers/container.py", line 141, in build_container
    status = subprocess.run(cmd.replace(_NEWLINE_, ' '), executable='/bin/bash', shell=True, check=True)  
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'DOCKER_BUILDKIT=0 docker build --network=host --tag my_container:r35.4.1-onnx --file /home/nvidia/jetson-containers/packages/onnx/Dockerfile --build-arg BASE_IMAGE=my_container:r35.4.1-cmake --build-arg ONNX_VERSION="main" /home/nvidia/jetson-containers/packages/onnx 2>&1 | tee /home/nvidia/jetson-containers/logs/20231222_010520/build/my_container_r35.4.1-onnx.txt; exit ${PIPESTATUS[0]}' returned non-zero exit status 1.
dusty-nv commented 8 months ago

@cahlen onnx likes to build its own version of protobuf that it provides as a submodule - does it work if you omit protobuf from your build.sh command?

cahlen commented 8 months ago

Thanks for your quick response @dusty-nv. Here is what I tried after your comment:

  1. Completely cleaned my docker containers/images (docker system prune -a)
  2. Re-cloned the repo (without --depth=1) so that I could checkout dev
  3. Did the typical install over again (pip3 install -r requirements.txt) just in case.
  4. Ran $ ./build.sh --name=my_container pytorch:2.1 opencv torchvision torchaudio tensorflow2 gstreamer deepstream, this time without the protobuf package, as you suggested.

It got pretty far. Here are the layers it built successfully:

debian:~/jetson-containers$ docker images
REPOSITORY                   TAG                       IMAGE ID       CREATED          SIZE
my_container                 r35.4.1                   47451fccb1c2   14 minutes ago   17.3GB
my_container                 r35.4.1-deepstream        47451fccb1c2   14 minutes ago   17.3GB
my_container                 r35.4.1-tritonserver      1e0a80400896   17 minutes ago   16.4GB
my_container                 r35.4.1-gstreamer         20dbad91ec4a   23 minutes ago   13.9GB
my_container                 r35.4.1-tensorflow2       f8a828531b22   24 minutes ago   13.5GB
my_container                 r35.4.1-protobuf_cpp      0b357613a2b4   26 minutes ago   12GB
my_container                 r35.4.1-torchaudio        5066715ea516   34 minutes ago   11.5GB
my_container                 r35.4.1-torchvision       f02b96837586   41 minutes ago   11.4GB
my_container                 r35.4.1-opencv            4c1258031f48   43 minutes ago   11.3GB
my_container                 r35.4.1-pytorch_2.1       a2bfd988d82c   44 minutes ago   11GB
my_container                 r35.4.1-onnx              2f3938cd6b30   45 minutes ago   10GB
my_container                 r35.4.1-cmake             6b6aa3c435cf   50 minutes ago   9.96GB
my_container                 r35.4.1-numpy             0a910d095e80   51 minutes ago   9.9GB
my_container                 r35.4.1-python            a0b6c760ee6c   51 minutes ago   9.85GB
my_container                 r35.4.1-tensorrt          a0b6c760ee6c   51 minutes ago   9.85GB
my_container                 r35.4.1-build-essential   00409d96cad4   52 minutes ago   9.76GB
my_container                 r35.4.1-cuda              00409d96cad4   52 minutes ago   9.76GB
my_container                 r35.4.1-cudnn             00409d96cad4   52 minutes ago   9.76GB
nvcr.io/nvidia/l4t-jetpack   r35.4.1                   5c923ac521a3   5 months ago     9.71GB

So it looks like it built every intermediate container as well as the final one, but the ONNX test still fails. Here is the last log:

-- Testing container my_container:r35.4.1 (onnx/test.py)

docker run -t --rm --runtime=nvidia --network=host \
--volume /home/debian/jetson-containers/packages/onnx:/test \
--volume /home/debian/jetson-containers/data:/data \
--workdir /test \
my_container:r35.4.1 \
/bin/bash -c 'python3 test.py' \
2>&1 | tee /home/debian/jetson-containers/logs/20231222_114450/test/my_container_r35.4.1_test.py.txt; exit ${PIPESTATUS[0]}

testing onnx...
/usr/local/lib/python3.8/dist-packages/google/protobuf/internal/api_implementation.py:87: UserWarning: Selected implementation cpp is not available.
  warnings.warn(
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    import onnx
  File "/usr/local/lib/python3.8/dist-packages/onnx/__init__.py", line 75, in <module>
    from onnx import serialization
  File "/usr/local/lib/python3.8/dist-packages/onnx/serialization.py", line 16, in <module>
    import google.protobuf.json_format
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/json_format.py", line 30, in <module>
    from google.protobuf.internal import type_checkers
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/internal/type_checkers.py", line 28, in <module>
    from google.protobuf.internal import decoder
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/internal/decoder.py", line 64, in <module>
    from google.protobuf.internal import encoder
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/internal/encoder.py", line 48, in <module>
    from google.protobuf.internal import wire_format
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/internal/wire_format.py", line 13, in <module>
    from google.protobuf import descriptor
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/descriptor.py", line 28, in <module>
    from google.protobuf.pyext import _message
ImportError: cannot import name '_message' from 'google.protobuf.pyext' (/usr/local/lib/python3.8/dist-packages/google/protobuf/pyext/__init__.py)
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/debian/jetson-containers/jetson_containers/build.py", line 102, in <module>
    build_container(args.name, args.packages, args.base, args.build_flags, args.simulate, args.skip_tests, args.test_only, args.push, args.no_github_api)
  File "/home/debian/jetson-containers/jetson_containers/container.py", line 160, in build_container
    test_container(name, package, simulate)
  File "/home/debian/jetson-containers/jetson_containers/container.py", line 320, in test_container
    status = subprocess.run(cmd.replace(_NEWLINE_, ' '), executable='/bin/bash', shell=True, check=True)
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'docker run -t --rm --runtime=nvidia --network=host --volume /home/debian/jetson-containers/packages/onnx:/test --volume /home/debian/jetson-containers/data:/data --workdir /test my_container:r35.4.1 /bin/bash -c 'python3 test.py' 2>&1 | tee /home/debian/jetson-containers/logs/20231222_114450/test/my_container_r35.4.1_test.py.txt; exit ${PIPESTATUS[0]}' returned non-zero exit status 1.

Since the final container appears to have been built even though the test failed, I'm going to try to use it. I'll update with results.
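Both tracebacks end in the same ImportError on google.protobuf.pyext._message, preceded by the warning "Selected implementation cpp is not available." That pattern suggests the cpp backend is being requested while the installed protobuf Python package lacks the compiled extension. A minimal diagnostic sketch to run inside the container follows. The function name protobuf_backend_report is hypothetical and only for illustration; PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION is the environment variable protobuf reads to choose its backend. Whether that variable is actually set in this image is an assumption, not something the logs confirm.

```python
import importlib
import os


def protobuf_backend_report():
    """Collect basic facts about the protobuf Python install (illustrative helper)."""
    report = {
        # Backend requested through the environment, or None if unset.
        "env_implementation": os.environ.get("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"),
        "protobuf_version": None,
        "cpp_extension_importable": False,
    }

    # Version of the installed protobuf Python package, if any.
    try:
        protobuf = importlib.import_module("google.protobuf")
        report["protobuf_version"] = getattr(protobuf, "__version__", None)
    except ImportError:
        pass

    # The compiled extension that the tracebacks fail to import.
    try:
        importlib.import_module("google.protobuf.pyext._message")
        report["cpp_extension_importable"] = True
    except ImportError:
        pass

    return report


if __name__ == "__main__":
    for key, value in protobuf_backend_report().items():
        print(f"{key}: {value}")
```

If env_implementation prints cpp while cpp_extension_importable prints False, the environment is asking for a backend that the installed package does not provide, which would match the errors above.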

cahlen commented 8 months ago

It seems like an ordering issue. I switched back to master and was able to run this build successfully:

$ ./build.sh --name=my_container deepstream pytorch:2.1 opencv torchaudio torchvision tensorflow2

Instead of putting deepstream at the end, I put it up front, and with this ordering the container layers seem to build and work.
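To illustrate why the position of a package on the command line could change the layer order, here is a rough sketch. It assumes dependencies are resolved depth-first in the order packages are requested, which is consistent with the docker images listing above but is not taken from the jetson-containers source. The DEPENDS map is hypothetical and heavily simplified; only the deepstream to protobuf_cpp edge comes from this thread.

```python
from typing import Dict, List

# Hypothetical, simplified dependency map for illustration only.
# The real package metadata in jetson-containers may differ.
DEPENDS: Dict[str, List[str]] = {
    "pytorch": ["onnx"],
    "onnx": ["cmake"],
    "cmake": [],
    "tensorflow2": ["protobuf_cpp"],
    "deepstream": ["protobuf_cpp", "tritonserver"],
    "protobuf_cpp": [],
    "tritonserver": [],
}


def layer_order(requested: List[str], depends: Dict[str, List[str]]) -> List[str]:
    """Return a build order in which each package follows its dependencies."""
    order: List[str] = []
    seen = set()

    def visit(pkg: str) -> None:
        if pkg in seen:
            return  # already scheduled (also guards against cycles)
        seen.add(pkg)
        for dep in depends.get(pkg, []):
            visit(dep)
        order.append(pkg)

    for pkg in requested:
        visit(pkg)
    return order


if __name__ == "__main__":
    print(layer_order(["pytorch", "tensorflow2", "deepstream"], DEPENDS))
    print(layer_order(["deepstream", "pytorch", "tensorflow2"], DEPENDS))
```

Under these assumptions, requesting deepstream first places protobuf_cpp before onnx in the layer stack, while requesting it last places protobuf_cpp after onnx. That would be one possible explanation for the difference in behavior, not a confirmed one.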

dusty-nv commented 8 months ago

OK interesting, thanks @cahlen. It would appear the order in this case did in fact matter. The protobuf stuff is tricky to figure out sometimes. But hey if it works! Glad that you got it built 👍