TheSpaghettiDetective / obico-server

Obico is a community-built, open-source smart 3D printing platform used by makers, enthusiasts, and tinkerers around the world.
https://obico.io
GNU Affero General Public License v3.0

Sample Integration of Raspberry Pi (Models 1-3) GPU Support via OpenCL #827

Open · Dall127 opened 10 months ago

Dall127 commented 10 months ago

I would like to see GPU acceleration on the Raspberry Pi (models 1-3), and I've provided a sample (but untested) Docker script that installs a custom OpenCL stack, a Darknet fork with an OpenCL backend, and an OpenCL-capable ONNX Runtime.

Basically, I mimicked the install provided for the Nvidia Jetson, but linked in different libraries that enable OpenCL support on the Raspberry Pi (again, only models 1-3; I'm specifically hoping to run this on a 3A+, but I'm sure there's a way to get things running on the Pi 4 and 5 by building everything for Vulkan). On top of that it pulls in a custom Darknet build that supports OpenCL and an OpenCL ONNX runtime, and ties it all together. I don't have the Raspberry Pi yet to test it on, but I figured there's probably someone else out there who would also like to see this for their OctoPrint server and could test it and get it into a runnable state if it isn't already.

# Raspberry Pi GPU-Accelerated Darknet Image
FROM debian:buster-slim as darknet_builder_rpi

# Set environment to non-interactive (this prevents some prompts)
ENV DEBIAN_FRONTEND=noninteractive

# Custom OpenCL installation
RUN apt update && apt upgrade -y

# Install base dependencies and Clang
RUN apt-get update && \
    apt-get install -y build-essential cmake git clang clang-format clang-tidy \
    ocl-icd-opencl-dev ocl-icd-dev opencl-headers clinfo libraspberrypi-dev ca-certificates

# Install OpenCL from GitHub
RUN mkdir -p /opencl && cd /opencl && \
    git clone https://github.com/doe300/VC4CLStdLib.git && \
    git clone https://github.com/doe300/VC4CL.git && \
    git clone https://github.com/doe300/VC4C.git

# Build and install VC4CLStdLib (the container runs as root, so sudo is
# neither needed nor available in buster-slim)
RUN cd /opencl/VC4CLStdLib && mkdir build && cd build && \
    cmake .. && make && make install && ldconfig

# Build and install VC4C
RUN cd /opencl/VC4C && mkdir build && cd build && \
    cmake .. && make && make install && ldconfig

# Build and install VC4CL
RUN cd /opencl/VC4CL && mkdir build && cd build && \
    cmake .. && make && make install && ldconfig

# Clone Darknet repo and set up for OpenCL
WORKDIR /
RUN git clone https://github.com/sowson/darknet
RUN cd darknet \
  && sed -i 's/NVIDIA=1/NVIDIA=0/' Makefile \
  && sed -i 's/AMD=0/AMD=1/' Makefile \
  && make -j 4

# Create final image
FROM debian:buster-slim as ml_api_base_rpi

# Copy darknet compiled in the first stage
COPY --from=darknet_builder_rpi /darknet /darknet

WORKDIR /
RUN apt update && apt install -y ca-certificates python3-pip wget
RUN pip3 install --upgrade pip
RUN pip3 install opencv_python_headless

# Install ONNX and ONNX Runtime with OpenCL support
RUN pip3 install onnx onnxruntime-opencl

# Install additional Python requirements
WORKDIR /app
ADD requirements.txt ./
RUN pip3 install -r requirements.txt

# Set library path for OpenCL
ENV LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/local/lib"

# Expose API port
EXPOSE 3333
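
For anyone who wants to try this: a minimal sketch, assuming the file above is saved as ml_api/Dockerfile.rpi (path and tag are hypothetical), of building and running it on a Pi 3. Two caveats from reading the script itself: the final stage only copies /darknet, so the VC4CL ICD and libOpenCL built in the first stage would presumably also need to be installed or copied into the runtime image; and VC4CL drives the VideoCore through the host's device nodes, so the container typically has to run privileged (or with /dev/mem, /dev/vchiq, etc. passed through).

# hypothetical path and tag, for illustration only
docker buildx build --platform linux/arm/v7 -f ml_api/Dockerfile.rpi -t ml_api_rpi --load .
# VC4CL needs the host's VideoCore device nodes; --privileged is the blunt way in
docker run --rm --privileged ml_api_rpi ls /darknet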
kennethjiang commented 10 months ago

Thank you! I doubt the RPi's HW accelerator can handle this, but it'll be a game-changer if it does! I'd love to see someone try this script and check whether it works.

Dall127 commented 10 months ago

Update: I got my Raspberry Pi and could potentially start testing; however, it seems there isn't a build for linux/arm/v7 of the thespaghettidetective/ml_api_base image. I suppose there are potentially other images that aren't built for that architecture as well. That will need to be the first step in getting the script working.
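
A quick way to check which platforms a published ml_api_base tag actually provides (the tag is a placeholder; substitute whatever the compose file pins):

docker buildx imagetools inspect thespaghettidetective/ml_api_base:<tag>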

kennethjiang commented 10 months ago

> Update: I got my Raspberry Pi and could potentially start testing; however, it seems there isn't a build for linux/arm/v7 of the thespaghettidetective/ml_api_base image. I suppose there are potentially other images that aren't built for that architecture as well. That will need to be the first step in getting the script working.

The Dockerfile to build the arm64 version of the ml_api base image is https://github.com/TheSpaghettiDetective/obico-server/blob/release/ml_api/Dockerfile.base_arm64 . I guess arm64 is different from linux/arm/v7? I'm actually a bit confused here, but I'm not against building one based on the arm64 version.

Dall127 commented 10 months ago

@kennethjiang yeah, arm64 is different from linux/arm/v7. ARM64 is for 64-bit processors, while linux/arm/v7 is for the 32-bit ARMv7 architecture. Even though most recent Raspberry Pis have 64-bit processors, most versions of Raspbian run a 32-bit userland for compatibility/legacy reasons and usually need to be reconfigured to run a 64-bit (ARM64) userland. Arm and Docker simplified targeting for the 64-bit revisions, while for 32-bit you still need to specify which ARM revision you're building for. For reference, the default OctoPrint image you get from the Pi imager is linux/arm/v7. If you're using Docker's buildx platform, I think you do something to this effect:

docker buildx create --use
docker buildx build --platform linux/arm64,linux/arm/v7,linux/amd64 -t your-image-name .

Edit: Here's the error

Building ml_api
Step 1/10 : FROM thespaghettidetective/ml_api_base:1.3
1.3: Pulling from thespaghettidetective/ml_api_base
ERROR: Service 'ml_api' failed to build: no matching manifest for linux/arm/v7 in the manifest list entries
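
If the armv7 variant is going to be cross-built on an x86 machine rather than on the Pi itself, buildx also needs QEMU user-mode emulation registered first; a short sketch using the binfmt helper image from Docker's multi-platform docs:

# register QEMU emulators so an x86 host can build/run arm containers
docker run --privileged --rm tonistiigi/binfmt --install all
# the active builder should now list linux/arm/v7 among its platforms
docker buildx ls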
kennethjiang commented 10 months ago

> @kennethjiang yeah, arm64 is different from linux/arm/v7. ARM64 is for 64-bit processors, while linux/arm/v7 is for the 32-bit ARMv7 architecture. Even though most recent Raspberry Pis have 64-bit processors, most versions of Raspbian run a 32-bit userland for compatibility/legacy reasons and usually need to be reconfigured to run a 64-bit (ARM64) userland. Arm and Docker simplified targeting for the 64-bit revisions, while for 32-bit you still need to specify which ARM revision you're building for. For reference, the default OctoPrint image you get from the Pi imager is linux/arm/v7. If you're using Docker's buildx platform, I think you do something to this effect:
>
> docker buildx create --use
> docker buildx build --platform linux/arm64,linux/arm/v7,linux/amd64 -t your-image-name .
>
> Edit: Here's the error
>
> Building ml_api
> Step 1/10 : FROM thespaghettidetective/ml_api_base:1.3
> 1.3: Pulling from thespaghettidetective/ml_api_base
> ERROR: Service 'ml_api' failed to build: no matching manifest for linux/arm/v7 in the manifest list entries

This is because ml_api_base has never been built for linux/arm/v7. Can you try to build it to see if it works? The command for how ml_api_base is built is at https://github.com/TheSpaghettiDetective/obico-server/blob/release/ml_api/scripts/build_base_images.sh
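
A hedged sketch of what forcing an armv7 build of the base image might look like, starting from the arm64 Dockerfile mentioned earlier (the output tag and build-context path are assumptions, and the base images referenced inside Dockerfile.base_arm64 may themselves need armv7-capable substitutes):

docker buildx build --platform linux/arm/v7 \
    -f ml_api/Dockerfile.base_arm64 \
    -t thespaghettidetective/ml_api_base:1.3-armv7 \
    --load ml_api/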

Dall127 commented 10 months ago

@kennethjiang I was able to get that portion built (after much trial and tribulation, haha), but now I'm stuck on this:

FROM thespaghettidetective/web:base-1.13

Unfortunately, I don't see any similar build scripts to modify to get it working for armv7; do you know where I would build that image as well?

kennethjiang commented 10 months ago

> @kennethjiang I was able to get that portion built (after much trial and tribulation, haha), but now I'm stuck on this:
>
> FROM thespaghettidetective/web:base-1.13
>
> Unfortunately, I don't see any similar build scripts to modify to get it working for armv7; do you know where I would build that image as well?

You mean this? https://github.com/TheSpaghettiDetective/obico-server/blob/release/scripts/build_dockerfile_web_base.sh
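
Whatever docker build command that script wraps could presumably be pointed at linux/arm/v7 the same way; a sketch with the web-base Dockerfile left as a placeholder:

docker buildx build --platform linux/arm/v7 \
    -f <web-base-Dockerfile-referenced-by-the-script> \
    -t thespaghettidetective/web:base-1.13 \
    --load .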

Dall127 commented 10 months ago

Yeah, I'm not sure why I couldn't find that. My latest issue, now that I've gotten that image to build, is one you might have encountered before. I'm getting this error while running the image on the Pi:

 Running command git clone --filter=blob:none --quiet https://github.com/TheSpaghettiDetective/daphne /tmp/pip-req-build-3pn9oxf6
  fatal: unable to access 'https://github.com/TheSpaghettiDetective/daphne/': server certificate verification failed. CAfile: none CRLfile: none
  error: subprocess-exited-with-error

  × git clone --filter=blob:none --quiet https://github.com/TheSpaghettiDetective/daphne /tmp/pip-req-build-3pn9oxf6 did not run successfully.
  │ exit code: 128
  ╰─> See above for output.

@kennethjiang, have you seen this one before? I get the impression that it's no longer a valid repo, but I'd rather not go through the build only to find out it was actually needed (it takes a while).
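
That "server certificate verification failed. CAfile: none" error usually comes down to one of two things: the image is missing (or has stale) CA certificates, or the Pi's clock is wrong, since a Pi has no RTC and a stale date makes every TLS certificate look invalid. A sketch of the two usual checks to run inside the failing build/container:

# 1. make sure the CA bundle is installed and registered
apt-get update && apt-get install -y --reinstall ca-certificates
update-ca-certificates
# 2. make sure the clock is roughly correct; fix NTP sync on the host if not
date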

Dall127 commented 10 months ago

@kennethjiang next update: I got everything building, and things are (I believe) close to running. However, it seems there is an import error at line 189 in darknet.py:

    load_net_custom = lib.load_network_custom
    load_net_custom.argtypes = [c_char_p, c_char_p, c_int, c_int]
    load_net_custom.restype = c_void_p

load_network_custom is an undefined symbol. From the naming scheme, I get the impression that this is not something that is standard in libdarknet_gpu.so?
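
One way to confirm which loader entry points the library actually exports (the path is an assumption, and nm comes from binutils, which may need installing inside the image first):

# dump the dynamic symbol table and look for the network-loading functions
nm -D /darknet/libdarknet_gpu.so | grep -i load_network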

kennethjiang commented 10 months ago

> @kennethjiang next update: I got everything building, and things are (I believe) close to running. However, it seems there is an import error at line 189 in darknet.py:
>
>     load_net_custom = lib.load_network_custom
>     load_net_custom.argtypes = [c_char_p, c_char_p, c_int, c_int]
>     load_net_custom.restype = c_void_p
>
> load_network_custom is an undefined symbol. From the naming scheme, I get the impression that this is not something that is standard in libdarknet_gpu.so?

Which symbol is undefined? Can you post the error message?

Dall127 commented 10 months ago

@kennethjiang I ended up just removing the custom call and replacing it with the standard one a few lines above, and that seemed to let it pass. However, as I've gotten closer to getting everything to compile, and after discussing its feasibility with others, I think the containerization of the ml_api might not work with the OpenCL implementation for the RPi; I keep running into CL_INVALID_CONTEXT after everything initializes. I'll make a pull request in the next couple of days with the improvements I've made to the build process, as well as the progress I've made so far, but I think I've exhausted my free time until this semester is over. If someone wants to carry the torch or ask me questions, I'd be more than happy to answer, but I'll be taking a break for the time being.

kennethjiang commented 10 months ago

Ok. Thank you @Dall127 for working on it. I guess I do want the code to be fully tested before I can merge it. And I do hope someone can take this over, or that you can come back to it when you have more time.

YaphetS1 commented 9 months ago

Hi there! @Dall127 Did you push the RPi image?

nvtkaszpir commented 6 months ago

Hm, this would require quite a few steps.

In my tests running arm64 under a QEMU Docker runner (an image compiled for the Jetson but without GPU, so CPU only), processing a 1600x1200 image takes about 47s, while on an Intel 14th-gen CPU (no GPU) it takes about 2s.

I suspect it will be hardly usable unless the image resolution is made much smaller.

ss8688 commented 5 months ago

There is an AI detector that can work on the Raspberry Pi: https://github.com/willuhmjs/forgetti

nvtkaszpir commented 5 months ago

Interesting. I'd like to see a model comparison, because Obico's models are about 10x bigger in size in bytes (not sure how that correlates to other params...).

darthbanana13 commented 4 months ago

Would a Coral TPU offer enough acceleration for an RPi to make processing the images faster?

nvtkaszpir commented 4 months ago

AFAIR Google Coral is not compatible with the current Obico ML API framework (Darknet/ONNX), and it would require model conversion... or rather redoing it from scratch in TensorFlow and then converting to TensorFlow Lite.