Closed jjimin closed 4 years ago
@jjimin
A few normal diagnostics for `ValueError: Failed to load delegate from libedgetpu.so.1`:

1) Could you check whether libedgetpu is properly installed with `$ ll /usr/lib/{GNU-TYPE}/libedge*`?
For reference:

```
$ ll /usr/lib/x86_64-linux-gnu/libedgetpu*
lrwxrwxrwx 1 root root   43 Oct 17 15:03 /usr/lib/x86_64-linux-gnu/libedgetpu.so.1 -> /usr/lib/x86_64-linux-gnu/libedgetpu.so.1.0
-rwxr-xr-x 1 root root 930K Oct 17 15:03 /usr/lib/x86_64-linux-gnu/libedgetpu.so.1.0
```
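If shell access is awkward, the same check can be reproduced from Python; a minimal sketch (the `ctypes` probe performs roughly the same `dlopen` that the delegate loader relies on):

```python
import ctypes

def can_dlopen(libname):
    """Return True if the shared library can be dlopen()'d, which is
    roughly what loading the Edge TPU delegate needs to succeed."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False

# Should print True on a correctly installed system
print(can_dlopen('libedgetpu.so.1'))
```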
2) We normally get this error if the linux user isn't in the plugdev group, which is an easy fix: either run with sudo or add the user to the plugdev group. However, since you're in a docker container you should already be running as root, so this case can probably be eliminated for now.
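A rough way to self-check point 2 from Python, using only the standard library (the root shortcut mirrors the sudo workaround; treat this as a heuristic, not the exact check the driver performs):

```python
import grp
import os

def may_access_usb(user):
    """Heuristic for point 2: root always passes; otherwise the user
    must be a member of the plugdev group."""
    if os.geteuid() == 0:
        return True
    try:
        plugdev = grp.getgrnam('plugdev')
    except KeyError:  # the group doesn't exist on this system
        return False
    return user in plugdev.gr_mem

print(may_access_usb(os.environ.get('USER', 'root')))
```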
3) I know this is silly, but is your accelerator plugged in?
4) Lastly, have you been able to run this demo on your host machine without using a docker container? I know that we've been seeing some hiccups on Ubuntu 16.04 with the tensorflow api (see here, though that is a different error message). Can you also try the edgetpu api for doing inference?
@jjimin UPDATE: I ran into your exact issue and was able to fix it with a slight modification to the Dockerfile (for a dependency issue). It appears that the docker container just didn't have access to usb devices. Here is some information:
Ubuntu 18.04.3 LTS
Docker CE 19.03.4 with same exact image
Dockerfile:
```dockerfile
FROM tensorflow/tensorflow:nightly-devel-gpu-py3
WORKDIR /home
ENV HOME /home
VOLUME /data
EXPOSE 8888
RUN cd ~
RUN add-apt-repository -y ppa:ubuntu-toolchain-r/test
RUN apt-get update
RUN apt-get install -y git nano python-pip python-dev pkg-config wget usbutils gcc-4.9
RUN apt-get upgrade -y libstdc++6
RUN echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" \
    | tee /etc/apt/sources.list.d/coral-edgetpu.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
RUN apt-get update
RUN apt-get install -y libedgetpu1-std
RUN wget https://dl.google.com/coral/python/tflite_runtime-1.14.0-cp35-cp35m-linux_x86_64.whl
RUN pip3 install tflite_runtime-1.14.0-cp35-cp35m-linux_x86_64.whl
RUN mkdir coral && cd coral
RUN git clone https://github.com/google-coral/tflite.git
```
- **Build:**
$ docker build -t "coral-usb:0.1" .
- **Run image (the `--privileged` flag was probably what you were missing):**
$ docker run -it --privileged -v /dev/bus/usb:/dev/bus/usb coral-usb:0.1 /bin/bash
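A small sanity check you could run inside the container to confirm the `-v /dev/bus/usb:/dev/bus/usb` bind mount actually took effect (a sketch; the path is simply the one from the `docker run` line above):

```python
import os

def usb_bus_mounted(path='/dev/bus/usb'):
    """True if the USB bus directory exists and contains at least one
    bus entry, i.e. the bind mount made it into the container."""
    return os.path.isdir(path) and bool(os.listdir(path))

print(usb_bus_mounted())
```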
- **Example demo run:**
```
root@c495f381807a:~/tflite# cd ~/tflite/python/examples/classification/
root@c495f381807a:~/tflite/python/examples/classification# ls
README.md  classify.py  classify_image.py  install_requirements.sh
root@c495f381807a:~/tflite/python/examples/classification# bash install_requirements.sh
Requirement already satisfied: numpy in /usr/local/lib/python3.5/dist-packages (1.15.4)
Requirement already satisfied: Pillow in /usr/local/lib/python3.5/dist-packages (5.3.0)
You are using pip version 18.1, however version 19.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
[curl download progress trimmed]
root@c495f381807a:~/tflite/python/examples/classification# python3 classify_image.py \
  --model models/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite \
  --labels models/inat_bird_labels.txt \
  --input images/parrot.jpg
INFO: Initialized TensorFlow Lite runtime.
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
12.6ms
4.0ms
4.1ms
4.0ms
4.0ms
-------RESULTS--------
Ara macao (Scarlet Macaw): 0.76172
```
Hope this helps!
P.S. Unrelated, but out of curiosity, any reason for using the gpu tensorflow image?
@Namburger how did you verify that the container didn't have access to usb devices? I'm currently testing it in a container (on an rpi4) in which I can see the device being recognized (I see the Coral USB with `lsusb`), but I get the same error: `ValueError: Failed to load delegate from libedgetpu.so.1`.
I'll make an independent issue if I can't figure it out in the next couple of days.
@jjimin
As I mentioned, I ran into the same exact issue and was able to fix it with the `--privileged` flag, which gave me the idea that maybe docker did not have access to usb devices. I don't think you should make an independent issue, since that would make it harder to reference. You see, running our edgetpu library in virtualized containers is actually not officially supported; I was giving some pointers since it's working for me. Have you tried running it under this command?
$ docker run -it --privileged -v /dev/bus/usb:/dev/bus/usb coral-usb:0.1 /bin/bash
Hi @Namburger , thanks for your quick reply.
I'll be debugging it later this evening (CET), so I'll be able to provide more info. I'm using docker compose to build the container, and I'm mounting the volume /dev/bus/usb as well as running it with `--privileged`.

Thanks again for your quick response :) I'll post an update and/or the fix if I find it later today.
@Namburger thanks for your input. I could run on the TPU inside a container by using the API instead of tflite_runtime, and it works without a problem, as you stated in https://github.com/google-coral/tflite/issues/2#issuecomment-545513883.
@jjrugui hi, also make sure you are using an updated version of the tflite_runtime library; this will most likely solve your tflite runtime API issue. The new package should now be this:
https://dl.google.com/coral/python/tflite_runtime-2.1.0-cp35-cp35m-linux_x86_64.whl
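For reference, this is the tflite_runtime call that raises the `ValueError` in question, wrapped here so it degrades gracefully on machines without the library or the device (the model path is a hypothetical placeholder):

```python
delegate = None
try:
    # load_delegate() dlopens libedgetpu.so.1; this is the call that
    # raises "ValueError: Failed to load delegate from libedgetpu.so.1"
    # when the library or the USB accelerator is not visible.
    from tflite_runtime.interpreter import Interpreter, load_delegate
    delegate = load_delegate('libedgetpu.so.1')
except (ImportError, OSError, ValueError) as err:
    print('Edge TPU delegate unavailable:', err)

if delegate is not None:
    interpreter = Interpreter(
        model_path='model_edgetpu.tflite',  # placeholder path
        experimental_delegates=[delegate])
```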
I'm way too late to the party, but here is what I have discovered regarding this `ValueError` issue in a Docker environment (note that I've used a Raspberry Pi 4 8GB in my experiments):

- You need to add the `--privileged` flag when running the container.
  - This is surely not the best practice, as you should prefer more fine-grained capability control over granting blanket root privilege to the container, but I haven't tested which capability is necessary to fix the error...
  - You need to run the inference script as root inside the container.
- The first inference attempt after the system boot always fails. Add the `--restart always` flag to automatically re-run the script (assuming you've set the `CMD`/`ENTRYPOINT` directive properly), and all is well.
  - This issue only affects Docker environments; outside of the container, the same script works fine right after the boot.

Environment:

```
libedgetpu1-max=16.0
python3-pycoral=2.0.0
python3-tflite-runtime=2.5.0.post1
```

Hopefully this may help someone in the future! 😄
> I'm way too late to the party, but here is what I have discovered regarding this `ValueError` issue in a Docker environment (note that I've used a Raspberry Pi 4 8GB in my experiments):
>
> - You need to add the `--privileged` flag when running the container.
>   - This is surely not the best practice, as you should prefer more fine-grained capability control over granting blanket root privilege to the container, but I haven't tested which capability is necessary to fix the error...
>   - You need to run the inference script as root inside the container.
> - The first inference attempt after the system boot always fails. Add the `--restart always` flag to automatically re-run the script (assuming you've set the `CMD`/`ENTRYPOINT` directive properly), and all is well.
>   - This issue only affects Docker environments; outside of the container, the same script works fine right after the boot.
>
> Environment: libedgetpu1-max=16.0 python3-pycoral=2.0.0 python3-tflite-runtime=2.5.0.post1
>
> Hopefully this may help someone in the future! 😄
Can you explain (better with an example) how to use the `--restart always` flag? I had the problem that the Coral USB fails to load the delegate from libedgetpu.so.1 after the Raspberry Pi reboots, but after `docker compose down` and `docker compose up` again, it works.
@hvn2 Thanks for reaching out! Sorry that I forgot to mention that `--restart always` is a Docker CLI flag (doc): `docker run --restart always mycontainer mycmd`. Other useful options are also mentioned in the linked page.

If you're using Docker Compose, you can set the equivalent option `restart: always` (doc) on the service that runs your Python script. For example:
```yaml
services:
  ai:
    # ...
    # Restart the docker on failure, and after the system boot
    restart: always
    # Mount Coral TPU
    devices:
      - "/dev/bus/usb:/dev/bus/usb"
    # Needs privileged access to the host OS
    privileged: true
```
then run `docker compose up` to start the service. The first time, it'll throw an error, but Docker will immediately restart the container and re-attempt to initialize the TPU, and this time you won't see the `ValueError`.
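The same effect can also be approximated inside the script itself with a retry loop; this is only a sketch of the idea (the `loader` callable and the timings are illustrative, not part of any Coral API):

```python
import time

def load_with_retry(loader, attempts=3, delay=2.0):
    """Call `loader` until it stops raising ValueError, mirroring what
    Docker's restart policy achieves by re-running the whole container."""
    last_err = None
    for _ in range(attempts):
        try:
            return loader()
        except ValueError as err:   # "Failed to load delegate..."
            last_err = err
            time.sleep(delay)
    raise last_err
```

In practice, letting Docker handle the restart keeps the inference script itself simpler.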
We've recently open-sourced a project that utilizes Coral TPU & Raspberry Pi & Docker, which you might be interested in as a reference implementation:
- `compose.yml`: https://github.com/ai-pest/ai_pumpkin/blob/main/docker-compose.yml#L12
- `Dockerfile`: https://github.com/ai-pest/ai_pumpkin/blob/main/ai/build/Dockerfile

(Comments are written in Japanese; you can use machine translation if necessary!)
Cheers!
System information

- Docker image: tensorflow/tensorflow:nightly-devel-gpu-py3

I am trying to get started with my USB Accelerator using the `classify_image.py` source code in a Docker container. My Dockerfile for this project is like this:

And I made the container with this command:
```
$ docker run -it -v /dev/bus/usb:/dev/bus/usb --gpus all coral-usb:0.1 /bin/bash
```
In the container, I followed the manual in 'Get started with the USB Accelerator'.
And after running the code above, I got some errors like this:
How could I solve this problem?