TheSpaghettiDetective / obico-server

Obico is a community-built, open-source smart 3D printing platform used by makers, enthusiasts, and tinkerers around the world.
https://obico.io
GNU Affero General Public License v3.0

[BUG] Cannot compile Dockerfile.aarch64 #414

Open · werdnum opened this issue 3 years ago

werdnum commented 3 years ago

Describe the bug

Cannot build Dockerfile.aarch64.

To Reproduce

Steps to reproduce the behavior:

  1. On macOS (arm64 architecture) with Docker Desktop.
  2. Run `docker build -f Dockerfile.aarch64 .`

Screenshots

Endless scroll of this:

--2021-03-23 08:42:10--  (try:15)  http://169.44.201.108:7002/jetpacks/4.2.1/libopencv_3.3.1-2-g31ccdfe11_arm64.deb
Connecting to 169.44.201.108:7002... failed: Connection timed out.
Retrying.

Hosting environment (please complete the following information): N/A

Additional context

It looks like the Dockerfile just wants to pull this deb file from some random IP address on a random port. Aside from the fact that it doesn't work, this is also kind of terrifying - there's no SSL so it's vulnerable to MITM remote code execution, and I have no way of knowing at a glance where this place is and why I should run code downloaded from it.

It looks like this deb file should come from some dependencies container or be downloaded via the NVIDIA SDK Manager. Either way, there should be some indication of what it is, what it's for, and where to get it from a reasonably reputable source.

kennethjiang commented 3 years ago

I don't think you can build images for ARM on a Mac. @raymondh2 let me know if I'm wrong here.

werdnum commented 3 years ago

Yes, you can. I use `docker buildx build` for that purpose, and in doing so I build multi-arch images that work on both amd64 and arm64.
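For anyone else trying this, a cross-build along those lines can be sketched as follows. The builder name and image tag are placeholders, and the commands assume Docker with buildx and QEMU binfmt handlers available (the guards make the sketch a no-op elsewhere):

```shell
# Sketch: cross-build the aarch64 image from a non-arm64 host (or vice versa).
# Skips itself unless Docker buildx and the repo's Dockerfile.aarch64 are present.
if command -v docker >/dev/null 2>&1 && docker buildx version >/dev/null 2>&1 \
   && [ -f Dockerfile.aarch64 ]; then
  docker buildx create --name crossbuild --use 2>/dev/null || true
  docker buildx build \
    --platform linux/arm64 \
    -f Dockerfile.aarch64 \
    -t example/ml-api:aarch64 \
    --load .
fi
```

`--load` imports the result into the local image store; for a true multi-arch push you would instead list both platforms and use `--push`.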

raymondh2 commented 3 years ago

You now need to download the NVIDIA Jetson SDK to get that file and add it to the build path. As for running it on other ARM devices, only the Jetson Nano is tested.

raymondh2 commented 3 years ago

https://developer.nvidia.com/embedded/jetpack

werdnum commented 3 years ago

I got the deb directly from the repo (using `apt download --print-uris nvidia-libopencv` on the "bare metal"), but it now won't run because your compiled models under ml_api/bin are linked against OpenCV 3.3, while the version included with the SDK nowadays is OpenCV 4.x.
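One quick way to confirm the linkage mismatch (a sketch; the .so filename is inferred from the repo's `bin/model_$(uname -m).so` naming) is to list the sonames recorded in the prebuilt shim:

```shell
# Sketch: print the shared-library dependencies of the prebuilt darknet shim.
# A "NEEDED libopencv_highgui.so.3.3" line confirms it was linked against
# OpenCV 3.3 rather than the 4.x shipped with newer L4T/JetPack releases.
if [ -f bin/model_aarch64.so ]; then
  objdump -p bin/model_aarch64.so | grep NEEDED
fi
```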

Here's the output:

$ docker run -it --rm -e DEBUG=1,FLASK_APP=server.py,HAS_GPU=1 werdnum/spaghetti-detective-ml-api:latest bash -c "gunicorn --bind 0.0.0.0:3333 --workers 1 wsgi"
[2021-03-28 08:25:04 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2021-03-28 08:25:04 +0000] [1] [INFO] Listening at: http://0.0.0.0:3333 (1)
[2021-03-28 08:25:04 +0000] [1] [INFO] Using worker: sync
[2021-03-28 08:25:04 +0000] [8] [INFO] Booting worker with pid: 8
[2021-03-28 08:25:14 +0000] [8] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/workers/base.py", line 129, in init_process
    self.load_wsgi()
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/workers/base.py", line 138, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/app/wsgiapp.py", line 52, in load
    return self.load_wsgiapp()
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/app/wsgiapp.py", line 41, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/usr/local/lib/python3.6/dist-packages/gunicorn/util.py", line 350, in import_app
    __import__(module)
  File "/app/wsgi.py", line 1, in <module>
    import server
  File "/app/server.py", line 12, in <module>
    from lib.detection_model import load_net, detect
  File "/app/lib/detection_model.py", line 65, in <module>
    lib = CDLL(so_path, RTLD_GLOBAL)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libopencv_highgui.so.3.3: cannot open shared object file: No such file or directory
[2021-03-28 08:25:14 +0000] [8] [INFO] Worker exiting (pid: 8)
[2021-03-28 08:25:14 +0000] [1] [INFO] Shutting down: Master
[2021-03-28 08:25:14 +0000] [1] [INFO] Reason: Worker failed to boot.

There don't seem to be any instructions, here or in darknet, for recompiling those models. How did you turn your files under model/ into the .so files, and how can I recompile them against the newer version of OpenCV?

See: https://github.com/AlexeyAB/darknet/discussions/7551 for maybe a bit more detail.

FWIW here's my modified Dockerfile:

FROM nvcr.io/nvidia/l4t-base:r32.5.0
WORKDIR /app
EXPOSE 3333

ARG DEBIAN_FRONTEND=noninteractive

RUN apt update && \
    apt install -y ca-certificates

COPY ./nvidia-l4t-apt-source.list /etc/apt/sources.list.d/
COPY ./jetson-ota-public.asc /etc/apt/trusted.gpg.d/

RUN apt update && \
    apt install --no-install-recommends -y nvidia-opencv python3-opencv libopencv libopencv-dev libopencv-python opencv-licenses && \
    apt install --no-install-recommends -y libsm6 libxrender1 libfontconfig1 python3-pip python3-setuptools vim ffmpeg wget curl libgstreamer-plugins-base1.0-0 libgstreamer1.0-0 libgtk2.0-0 && \
    rm -rf /var/lib/apt/lists/*
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 10 && \
    update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 10
## Additional OpenCV dependencies usually installed by the CUDA Toolkit

ADD . /app
RUN cp bin/model_$(uname -m).so bin/model.so && \
    cp bin/model_gpu_$(uname -m).so bin/model_gpu.so

RUN pip install --upgrade pip
RUN pip install -r requirements.txt

RUN wget --quiet -O model/model.weights $(cat model/model.weights.url | tr -d '\r')
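As an aside on the last RUN line above: the `tr -d '\r'` strips a stray Windows carriage return from the one-line `.url` file, which would otherwise corrupt the URL handed to wget. A minimal illustration (file path and URL are made up):

```shell
# Demonstrate the carriage-return cleanup used in the wget line above.
printf 'https://example.com/model.weights\r\n' > /tmp/model.weights.url
url=$(tr -d '\r' < /tmp/model.weights.url)
echo "$url"   # the URL without the trailing \r
```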

raymondh2 commented 3 years ago

The .so files are compiled darknet, not the ML models themselves. I'll have to recompile them when I get a chance.

kennethjiang commented 3 years ago

@raymondh2 Is this a legitimate issue or not? If people have a reason (although I can't think of any) to build from Dockerfile.aarch64 and they can't, this is probably still a valid issue?

werdnum commented 3 years ago

I'm confused - why would there be no 'legitimate reason' to build from Dockerfile.aarch64? I want to run the server on a Jetson Nano, and in the absence of any precompiled image for that purpose I don't have any choice but to build that image myself. (Put another way: if you don't want anyone to build that file, why is it even there?)

As for recompiling the file, you could (and in fact I plan to) alter the Dockerfile to compile darknet itself - that's the point of publishing source code, after all.
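For anyone attempting the same, a Dockerfile step along those lines might look roughly like this. This is a sketch, not the project's actual build process: the `OPENCV`/`LIBSO` switches are AlexeyAB/darknet's standard Makefile flags, and copying `libdarknet.so` over `bin/model.so` assumes the prebuilt shims are plain darknet libraries, as raymondh2 indicated above.

```dockerfile
# Sketch only: compile darknet as a shared library against whatever OpenCV
# the base image provides, instead of shipping a binary linked to OpenCV 3.3.
RUN apt update && apt install -y --no-install-recommends git build-essential && \
    git clone --depth 1 https://github.com/AlexeyAB/darknet.git /tmp/darknet && \
    cd /tmp/darknet && \
    # enable OpenCV support and build libdarknet.so instead of the CLI binary
    sed -i 's/^OPENCV=0/OPENCV=1/; s/^LIBSO=0/LIBSO=1/' Makefile && \
    make && \
    cp libdarknet.so /app/bin/model.so
```

Enabling `GPU=1`/`CUDNN=1` the same way would additionally require a CUDA-capable base image.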

kennethjiang commented 3 years ago

My apologies - I was thinking about Dockerfile.base. Yes, you need to be able to build Dockerfile.aarch64. I'm reopening this issue.

shbatm commented 2 years ago

Following this discussion -- I'm also interested in trying to use this on JetPack 4.6 and was trying to update the ml_api/Dockerfile.base_aarch64 to use either `nvcr.io/nvidia/l4t-ml:r32.6.1-py3` or `nvcr.io/nvidia/l4t-base:r32.6.1` as a base image. It looks like I'm running into the same issue as @werdnum, with the precompiled modules wanting CUDA 10 / libopencv 3.3.1 instead of the newer versions.

raymondh2 commented 2 years ago

Yep. That would require us to make changes to the build process, which we currently do not have time to do.

nvtkaszpir commented 8 months ago

#877 (build on ARM) should be able to fix this - I tested the ARM build and even tested running the ARM container, but I have not tested it on a Mac M1 directly.