Closed Gsarg18 closed 3 years ago
@Gsarg18 Can you post the VA Serving log? You can increase the log level with an environment variable (LOG_LEVEL=DEBUG). The Open Visual Cloud docker files use different driver versions than the default VA Serving container, so if standalone VA Serving works and the Open Visual Cloud base image does not, I suspect a difference in dependencies.
I have attached the VA Serving log: gpu_pipeline.log
@Gsarg18 To confirm: did you run the same VA Serving image (using the XeonE3 base image from the Open Visual Cloud docker files) outside Open Visual Cloud, and it is working?
Or did you build a VA Serving image from the VA Serving GitHub repository?
If it is the second, then I strongly suspect the dependencies in the base image.
Can you provide the build command and the output you got when creating the VA Serving image?
Please post the output of the following command. No output means that the GPU cannot be detected.
$ docker run -it --device /dev/dri --entrypoint /bin/bash openvisualcloud/xeone3-ubuntu1804-analytics-gst:20.10 -c "clinfo -l"
This is the output of the above command:
Platform #0: Intel(R) OpenCL HD Graphics
 `-- Device #0: Intel(R) Gen9 HD Graphics NEO
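For anyone who wants to script this check, here is a minimal sketch that counts the OpenCL devices in `clinfo -l` output. The helper name `count_cl_devices` is hypothetical (it is not part of VA Serving or Open Visual Cloud), and it assumes the `Device #N` line format shown in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the OpenCL devices listed in `clinfo -l`
# output read from stdin. Assumes each device appears on a line
# containing "Device #", as in the output posted above.
count_cl_devices() {
  # grep -c prints the number of matching lines; it exits non-zero when
  # there are no matches, so "|| true" keeps the function from failing.
  grep -c 'Device #' || true
}

# Example usage (commented out because it needs Docker and a GPU host):
# docker run --rm --device /dev/dri --entrypoint /bin/bash \
#   openvisualcloud/xeone3-ubuntu1804-analytics-gst:20.10 -c "clinfo -l" \
#   | count_cl_devices
```

A result of 0 corresponds to the "no output" case, meaning the GPU is not visible inside the container.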
Thanks for the quick response.
Sorry for the late response, @nnshah1. I ran VA Serving on the GPU after replacing the Open Visual Cloud Xeon base image with the XeonE3 image (./docker/build.sh --base openvisualcloud/xeone3-ubuntu1804-analytics-gst), and it is working. I then tried to run the same XeonE3 image in the Smart City sample with the latest VA Serving version, and it is not working.
@whbruce Logs of the Smart City sample with GPU and VA Serving v0.3.1.1-alpha
I believe I understand what might be happening:
Docker swarm does not support the 'device' option or privileged mode. To enable this in swarm, you have to deploy a special container image with docker runtime client support that can launch a container with privileges. This is how the vcac-a deployment scripts are set up: within the analytics folder you can find run-container.sh in the vcac-a subfolder.
This would explain why the same image works as expected when run with the Video Analytics Serving run scripts, as those also use docker run directly.
TL;DR: you will need to create and run a container launcher within swarm to access the iGPU hardware.
@nnshah1, we are using a Kubernetes deployment, not Docker swarm. How do we make these changes in Kubernetes?
@Gsarg18, for Kubernetes, I believe you can designate a pod as "privileged". You should be able to deploy the analytics container as a privileged pod. https://kubernetes.io/docs/concepts/workloads/pods/#privileged-mode-for-containers
@xwu2git, in order to run the analytics container on VCAC-A (with access to the GPU) within Kubernetes, do we use privileged pods or the same technique as in Docker swarm (i.e. a container that launches another container)?
For GPUs, you can either use a privileged pod or install the GPU device plugins.
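As a rough sketch of the privileged option only (not the device plugin route), the snippet below builds a strategic merge patch that sets `securityContext.privileged: true` on one container of a Deployment. The helper name `privileged_patch` and the variables `DEPLOYMENT` and `CONTAINER` are hypothetical placeholders; the actual names in the Smart City sample manifests are not given in this thread, so substitute your own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a strategic merge patch (JSON) that marks
# the named container in a Deployment's pod template as privileged.
# The container name passed in is an assumption; use the name from
# your own manifest.
privileged_patch() {
  local container="$1"
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"%s","securityContext":{"privileged":true}}]}}}}\n' "$container"
}

# Example usage (commented out because it needs a cluster and kubectl):
# kubectl patch deployment "$DEPLOYMENT" \
#   --patch "$(privileged_patch "$CONTAINER")"
```

Editing the manifest directly to add the same `securityContext` block achieves the same result. The patch form is only convenient for trying it on a running deployment.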
@nnshah1 and @xwu2git Thanks for the suggestion about making the analytics pod privileged. We will try it and let you know. One clarification: VCAC and GPU are two different issues, and here we are only concerned with running Smart City on the GPU. VCAC is on a different thread: https://github.com/OpenVisualCloud/Smart-City-Sample/issues/741
Thanks
Thank you @nnshah1 @xwu2git @whbruce. We made the changes you suggested, and the Smart City sample is now working on the GPU with a Kubernetes deployment.
Thanks for the update! This is great news! Can you briefly describe the change in setup so we can capture it for anyone else running into the same issue?
I have tried using a GPU (Intel® UHD Graphics 630 (CFL GT2)) and processor (Intel® Core™ i7-8700 CPU @ 3.20GHz × 12) with video-analytics-serving (changing the docker image from Xeon to XeonE3), and it is working. But when I tried to run the Smart City sample on the GPU, it did not work. I also raised the issue here: https://github.com/OpenVisualCloud/Smart-City-Sample/issues/736 I also tried changing the VA Serving version from 0.3.0-alpha to 0.3.1.1-alpha in the Smart City sample, and it is still not working.