google-coral / edgetpu

Coral issue tracker (and legacy Edge TPU API source)
https://coral.ai
Apache License 2.0

Mini PCIe Accelerator: Assign TPUs to different docker containers #454

Open ykhorzon opened 3 years ago

ykhorzon commented 3 years ago

Description

Is there any way to assign (mount) apex_0 to container_0 and apex_1 to container_1? I tried docker run with the --device and --cap-add flags, but nothing works.

#container_0
docker run --rm -it --cap-add=ALL --device=/dev/apex_0:/dev/apex_0  -p 8000:8000 pose-estimation:dev bash
# then run pose-estimation source code

This gives a runtime error. Something seems to be wrong with the driver or the device mounting.

  File "/opt/ml/code/./service.py", line 14, in <module>
    interpreter = create_interpreter()
  File "/opt/ml/code/./posenet_inference_dist/engine.py", line 42, in create_interpreter
    interpreter = PoseEngine("models/posenet_resnet_50_416_288_16_quant_edgetpu_decoder.tflite", "pci:{}".format(EDGETPU_INDEX))
  File "/opt/ml/code/./pose_engine.py", line 82, in __init__
    edgetpu_delegate = load_delegate(EDGETPU_SHARED_LIB, {"device": device})
  File "/usr/local/lib/python3.8/dist-packages/tflite_runtime/interpreter.py", line 162, in load_delegate
    raise ValueError('Failed to load delegate from {}\n{}'.format(
ValueError: Failed to load delegate from libedgetpu.so.1
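
The ValueError above does not say why the delegate failed to load. As a first diagnostic step, here is a minimal sketch (not part of the original code) that reports whether a device node exists in the container, whether it is a character device, and whether the current process has read/write permission on it. The helper name check_device_node is hypothetical. It only inspects the node's type and permission bits, so a passing check does not prove that the driver can actually be opened.

```python
import os
import stat


def check_device_node(path):
    """Report basic facts about a device node such as /dev/apex_0.

    Hypothetical helper for debugging only; it does not talk to the
    Edge TPU driver or to libedgetpu.
    """
    report = {
        "path": path,
        "exists": False,
        "is_char_device": False,
        "readable": False,
        "writable": False,
    }
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Node is missing, or the process is not allowed to stat it.
        return report
    report["exists"] = True
    report["is_char_device"] = stat.S_ISCHR(mode)
    report["readable"] = os.access(path, os.R_OK)
    report["writable"] = os.access(path, os.W_OK)
    return report


if __name__ == "__main__":
    print(check_device_node("/dev/apex_0"))
```

Running this inside the container just before load_delegate is called would at least rule out a missing or unreadable node.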

But I checked /dev/apex_0 inside the container and it looks fine.

root@07c4094bcae8:~# ls -al /dev/ | grep apex
crw-rw---- 1 root 1001 120, 0 Aug 29 03:47 apex_0

With --privileged, all apex_* devices are mounted in the container:

docker run --rm -it --privileged pose-estimation:dev bash

When I run a second container with the same command, it fails with an error saying the Coral TPU device is busy.
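
One possible reading of the "device is busy" error is that both privileged containers try to open the same TPU, although the issue does not confirm this. A minimal sketch of a workaround under that assumption: start each container with a different value of an environment variable (for example docker run -e EDGETPU_INDEX=1 ...) and build the device string from it. The traceback shows a Python name EDGETPU_INDEX and the "pci:{}" format; reading it from the environment, and the helper name edgetpu_device_string, are assumptions made for illustration.

```python
import os


def edgetpu_device_string(environ=None, var="EDGETPU_INDEX", default=0):
    """Build a device string such as "pci:1" from an environment variable.

    Assumption: each container is started with its own index value.
    The variable name and this helper are hypothetical.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(var, str(default))
    index = int(raw)  # raises ValueError if the value is not an integer
    if index < 0:
        raise ValueError("{} must be non-negative, got {}".format(var, index))
    return "pci:{}".format(index)


if __name__ == "__main__":
    # The result would be passed where the original code passes
    # "pci:{}".format(EDGETPU_INDEX).
    print(edgetpu_device_string())
```

Whether this avoids the busy error depends on how the runtime maps indices to devices, which is not established in this thread.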

Issue Type: Support, Others
Operating System: Ubuntu
Coral Device: Mini PCIe
Other Devices: No response
Programming Language: Python 3.8
Relevant Log Output: No response

ykhorzon commented 3 years ago

More details about my environment:

I have already confirmed that my TPUs work both on the host and in a container. I have 4 PCIe Coral TPUs on my x86 Ubuntu system. They work properly on the host and also work perfectly in a single container.

docker run --rm -it --privileged  pose-estimation:dev bash
root@aed003d7e701:~# ls -al /dev/ | grep apex
crw-rw----  1 root    1001 120,     0 Aug 29 03:39 apex_0
crw-rw----  1 root    1001 120,     1 Aug 29 03:39 apex_1
crw-rw----  1 root    1001 120,     2 Aug 29 03:39 apex_2
crw-rw----  1 root    1001 120,     3 Aug 29 03:39 apex_3
# then run pose-estimation source code successfully...
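
The same listing can be produced from Python, which is handy for logging which TPUs a given container can see at startup. This is a minimal sketch; the helper name list_apex_indices is hypothetical, and it relies only on the apex_N naming shown in the ls output above.

```python
import os
import re

# Matches node names of the form apex_0, apex_1, ...
_APEX_RE = re.compile(r"^apex_(\d+)$")


def list_apex_indices(dev_dir="/dev"):
    """Return the sorted indices N of all apex_N entries in dev_dir."""
    try:
        names = os.listdir(dev_dir)
    except OSError:
        # Directory is missing or not readable.
        return []
    matches = (_APEX_RE.match(name) for name in names)
    return sorted(int(m.group(1)) for m in matches if m)


if __name__ == "__main__":
    print(list_apex_indices())
```

In the privileged container shown above, this would be expected to report four entries; with --device=/dev/apex_0 only, a single one.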