google-coral / libedgetpu

Source code for the userspace level runtime driver for Coral.ai devices.
Apache License 2.0

Installing PCIe driver and Edge TPU runtime inside a docker #12

Open arun-kumark opened 3 years ago

arun-kumark commented 3 years ago

Hello, I want to provide the Edge TPU runtime and PCIe driver inside Docker containers instead of installing them directly on the machine. My objective is to release these two packages as part of a Dockerized solution rather than with the mainstream distribution.

Would that be a correct approach? Thanks for the suggestions.

Kind Regards Arun

Namburger commented 3 years ago

Hi @arun-kumark yes it is possible. Here is an example docker image: https://gist.github.com/Namburger/5c18a894951cb4bf26d6033b029e4144

arun-kumark commented 3 years ago

Thanks a lot @Namburger. This helped a lot.

Kind Regards Arun

arun-kumark commented 3 years ago

Hi Namburger,

Do you have any ideas or suggestions for multiple apps using Coral?

Kind regards Arun

Namburger commented 3 years ago

@arun-kumark I would create a server app that serves the rest of the apps (via either a socket or REST, that is up to you). This way, a single Docker container has TPU access while the rest depend on it. Maybe something like this: https://github.com/namburger/restor

Giving the Docker container the device path should be good enough; there is no need for full privileges, that was only for demonstration!
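
The single-server idea above can be sketched roughly as follows. This is only an illustration using the Python standard library: `run_inference`, `make_server`, `infer_remote`, the `/infer` path, and port 8080 are all hypothetical names chosen for the sketch, and the placeholder function would need to be replaced by a real Edge TPU interpreter call.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Only one request at a time may touch the accelerator.
_tpu_lock = threading.Lock()


def run_inference(payload: bytes) -> dict:
    """Hypothetical placeholder: swap in the real Edge TPU interpreter call."""
    return {"num_bytes": len(payload)}


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)
        with _tpu_lock:  # serialize access to the single device
            result = run_inference(payload)
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def make_server(host: str = "0.0.0.0", port: int = 8080) -> ThreadingHTTPServer:
    """Build the server that runs in the one container owning the TPU."""
    return ThreadingHTTPServer((host, port), InferenceHandler)


def infer_remote(host: str, port: int, payload: bytes) -> dict:
    """Client side: what the other containers would call."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("POST", "/infer", body=payload)
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

The container with TPU access would call `make_server().serve_forever()`, and every other container would send its image bytes through `infer_remote`, so only one process ever opens the device.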

arun-kumark commented 3 years ago

Thank you Num for sharing the example, and also for the very nice article on Medium.

To give you more information on the setup I am testing: it is a small Intel device installed on the shop floor, fitted with a Coral card and running Windows 10. The device has to work mostly offline. Inference has to run on the device itself, and the HMI is attached to the device.

Thanks a lot again !

Namburger commented 3 years ago

@arun-kumark are all of the containers being run on the same PC? The problem with giving all of the containers access to the TPU is that the TPU may fail if two or more containers try to use it at the same time. Imagine container A doing inference on an image: that image needs to be transferred to the TPU. If container B is sending a different image to the TPU for inference at the same moment, the data on the TPU could get all messed up. So if you want all of the containers to have access to it, there must be some form of communication between the containers so that they know whether the TPU is free or not.
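
One simple form of that coordination is an advisory file lock on a path that every container can see. The sketch below assumes Linux containers that all bind-mount the same host directory from a local filesystem; the helper name `tpu_lock` and the path `/shared/edgetpu.lock` are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def tpu_lock(lock_path: str = "/shared/edgetpu.lock"):
    """Hold an exclusive advisory lock while the caller uses the TPU.

    lock_path is a hypothetical file on a volume shared by all containers.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o666)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no one else holds it
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Each container would wrap its inference call in `with tpu_lock(): ...`. The lock is advisory, so it only helps if every container that touches the device uses it.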

arun-kumark commented 3 years ago

Thank you Num for the explanation. I ran into this problem while running the Docker containers in an Ubuntu setup. Could you help me with an issue related to running inference on Windows? I have filed it on PyCoral: DLL_error

Thank you Kind Regards Arun

arun-kumark commented 3 years ago

Hi Namburger, I came across the following scenario. Could you tell me what the possible impact would be?

Case: the PCIe driver is already installed on the machine, and a Docker container with the PCIe driver inside it is also running. How will the device behave in that case? My concern is that someone can use the device through Docker even if the driver is not installed on the machine.

Do you see any problem in this case?

If yes, can I restrict the user from using the PCIe driver inside the Docker container?

Kind Regards Arun

arun-kumark commented 3 years ago

Dear Namburger, I am using Docker on a remote system and, unfortunately, I cannot install anything directly on the system itself. I can only use Docker to test the Google Coral device. The device is connected to the Mini PCIe slot using an adapter. After running my Docker container, I get the following logs:

Traceback (most recent call last):
  File "/home/tflite/python/examples/classification/classify_image.py", line 122, in <module>
    main()
  File "/home/tflite/python/examples/classification/classify_image.py", line 99, in main
    interpreter = make_interpreter(args.model)
  File "/home/tflite/python/examples/classification/classify_image.py", line 73, in make_interpreter
    {'device': device[0]} if device else {})
  File "/usr/local/lib/python3.6/dist-packages/tflite_runtime/interpreter.py", line 164, in load_delegate
    library, str(e)))
ValueError: Failed to load delegate from libedgetpu.so.1

These logs are expected when the device is not connected to the system. Unfortunately, I don't have SSH or TTY access to the device due to some limitations; my only option is to test the device using Docker.

My Dockerfile is below:

# Base image Ubuntu
FROM ubuntu:18.04

# Time server
ENV TZ=America/Los_Angeles
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

WORKDIR /home
ENV HOME /home
RUN cd ~
RUN apt-get update
RUN apt-get install -y git vim python3-pip python-dev pkg-config wget usbutils curl ca-certificates openssh-server
RUN echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" \
| tee /etc/apt/sources.list.d/coral-edgetpu.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
RUN apt-get update
RUN apt-get install -y libedgetpu1-std 
RUN apt-get install -y python3-edgetpu 
RUN apt-get install -y python3-opencv
RUN wget https://dl.google.com/coral/python/tflite_runtime-2.1.0-cp36-cp36m-linux_x86_64.whl
RUN pip3 install tflite_runtime-2.1.0-cp36-cp36m-linux_x86_64.whl
RUN apt install -y python3-pycoral
RUN git clone https://github.com/google-coral/tflite.git
RUN bash /home/tflite/python/examples/classification/install_requirements.sh

CMD ["python3",  "/home/tflite/python/examples/classification/classify_image.py",   "--model",  "/home/tflite/python/examples/classification/models/mobilenet_v2_1.0_224_inat_bird_quant.tflite",   "--labels", "/home/tflite/python/examples/classification/models/inat_bird_labels.txt",   "--input", "/home/tflite/python/examples/classification/images/parrot.jpg"]

Could you help me figure out how to test the device? I can certainly provide more information on the system if you need it.

Thanks a lot for your great help always !!

Kind Regards Arun

Namburger commented 3 years ago

@arun-kumark what's going on here is that your version of tflite_runtime (2.1.0-cp36-cp36m) is way out of date. What you probably want is this package:

RUN wget https://github.com/google-coral/pycoral/releases/download/v1.0.1/tflite_runtime-2.5.0-cp36-cp36m-linux_x86_64.whl
RUN pip3 install tflite_runtime-2.5.0-cp36-cp36m-linux_x86_64.whl

Namburger commented 3 years ago

FYI, it can also be found here: https://github.com/google-coral/pycoral/releases

arun-kumark commented 3 years ago

@Namburger I tried updating my Dockerfile with the suggested release, but the error persists.

Nothing is installed on the machine itself, and we only want to test the attached device for inference.

The logs are here:

Traceback (most recent call last):
  File "/home/tflite/python/examples/classification/classify_image.py", line 122, in <module>
    main()
  File "/home/tflite/python/examples/classification/classify_image.py", line 99, in main
    interpreter = make_interpreter(args.model)
  File "/home/tflite/python/examples/classification/classify_image.py", line 73, in make_interpreter
    {'device': device[0]} if device else {})
  File "/usr/local/lib/python3.6/dist-packages/tflite_runtime/interpreter.py", line 164, in load_delegate
    library, str(e)))
ValueError: Failed to load delegate from libedgetpu.so.1

arun-kumark commented 3 years ago

Hi @Namburger

It looks to me that the lines below:

```
# Install Coral Driver
RUN apt install gasket-dkms libedgetpu1-std -y && apt-get clean
RUN touch /etc/udev/rules.d/65-apex.rules
RUN sh -c "echo 'SUBSYSTEM==\"apex\", MODE=\"0660\", GROUP=\"apex\"' >> /etc/udev/rules.d/65-apex.rules"
RUN groupadd apex
```

which I am trying to place in my new Dockerfile, won't work, because they must be installed on the device itself?

Since I don't have access to install anything on the device directly, only inside Docker, do you think this could be why my Google Coral is not working as intended?

Thank u

Kind Regards
Arun
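
On the question above: containers share the host's kernel, so a kernel module such as the one `gasket-dkms` builds has to be loaded on the host; installing the package inside an image does not load it. A small check that can run inside the container is sketched below. It assumes the PCIe driver exposes the Edge TPU as a node like `/dev/apex_0` and that the node is passed into the container (for example with `docker run --device /dev/apex_0`); the function names are hypothetical.

```python
import glob
import os


def find_apex_devices(dev_dir: str = "/dev") -> list:
    """Return the apex_* device nodes visible under dev_dir."""
    return sorted(glob.glob(os.path.join(dev_dir, "apex_*")))


def diagnose(dev_dir: str = "/dev") -> str:
    """Summarize whether the container can see and open an Edge TPU node."""
    nodes = find_apex_devices(dev_dir)
    if not nodes:
        # Either the host driver is not loaded or the node was not
        # passed into the container.
        return "missing: no apex_* node under " + dev_dir
    unusable = [n for n in nodes if not os.access(n, os.R_OK | os.W_OK)]
    if unusable:
        return "denied: no read/write access to " + ", ".join(unusable)
    return "ok: " + ", ".join(nodes)
```

If `diagnose()` reports `missing`, the "Failed to load delegate" error is more likely a host driver or device passthrough problem than a problem with the packages inside the image.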