dlstreamer / pipeline-server

Home of Intel(R) Deep Learning Streamer Pipeline Server (formerly Video Analytics Serving)
BSD 3-Clause "New" or "Revised" License

Intel GPUs is not working in openvisual cloud #56

Closed Gsarg18 closed 3 years ago

Gsarg18 commented 3 years ago

I have tried using the GPU (Intel® UHD Graphics 630 (CFL GT2)) and processor (Intel® Core™ i7-8700 CPU @ 3.20GHz × 12) with video-analytics-serving (changing the Docker base image from Xeon to XeonE3), and it is working. But when I tried to run the Smart City sample on the GPU, it did not work. I also raised the issue here: https://github.com/OpenVisualCloud/Smart-City-Sample/issues/736 I also tried changing the VA Serving version from 0.3.0-alpha to 0.3.1.1-alpha in the Smart City sample, and it still does not work.

nnshah1 commented 3 years ago

@Gsarg18 Can you post the VA Serving log? You can increase the log level with an environment variable (LOG_LEVEL=DEBUG). The Open Visual Cloud docker files have different versions of the drivers than the default VA Serving container, so if VA Serving standalone is working and the Open Visual Cloud base image is not, I suspect a difference in dependencies.
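
As a rough illustration of the suggestion above, the log level could be raised by passing the environment variable to the container at launch. This is only a sketch: the image name `video-analytics-serving-gstreamer` is an assumed placeholder, and the command is printed rather than executed so it can be reviewed first.

```shell
# Placeholder image name (an assumption); substitute the image you built.
IMAGE="${IMAGE:-video-analytics-serving-gstreamer}"

# Compose a docker command that sets LOG_LEVEL=DEBUG and exposes the GPU.
debug_run_cmd() {
  echo "docker run --rm --device /dev/dri -e LOG_LEVEL=DEBUG $1"
}

# Print the command for review; run it with: eval "$(debug_run_cmd "$IMAGE")"
debug_run_cmd "$IMAGE"
```

Redirecting the container's output to a file (for example with `2>&1 | tee debug.log`) makes the log easy to attach to an issue.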

Gsarg18 commented 3 years ago

I have attached the VA Serving log: gpu_pipeline.log

nnshah1 commented 3 years ago

@Gsarg18 To confirm: did you run the same VA Serving image (using the XeonE3 base image from the Open Visual Cloud docker files) outside Open Visual Cloud, and it worked?

Or did you build a VA Serving image from the VA Serving GitHub repository?

If it is the second, then I strongly suspect the dependencies in the base image.

Can you provide the build command / output you used to create the VA Serving image?

whbruce commented 3 years ago

Please give the output of the following. No output means that GPU cannot be detected.

$ docker run -it --device /dev/dri --entrypoint /bin/bash openvisualcloud/xeone3-ubuntu1804-analytics-gst:20.10 -c "clinfo -l"

Gsarg18 commented 3 years ago

This is the output of above command:

Platform #0: Intel(R) OpenCL HD Graphics
 `-- Device #0: Intel(R) Gen9 HD Graphics NEO
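
The "no output means no GPU" check can be scripted. The sketch below is one possible way to do it, using the output quoted in this thread as sample input; the helper name `has_opencl_device` is hypothetical.

```shell
# Returns 0 when the given `clinfo -l` output lists at least one device.
has_opencl_device() {
  printf '%s\n' "$1" | grep -q 'Device #'
}

# Sample output copied from this thread. In practice, capture it with:
#   out=$(docker run --rm --device /dev/dri --entrypoint /bin/bash \
#         openvisualcloud/xeone3-ubuntu1804-analytics-gst:20.10 -c "clinfo -l")
out='Platform #0: Intel(R) OpenCL HD Graphics
 `-- Device #0: Intel(R) Gen9 HD Graphics NEO'

if has_opencl_device "$out"; then
  echo "GPU visible to container"
else
  echo "no OpenCL device detected"
fi
```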

whbruce commented 3 years ago

Thanks for the quick response.

  1. The clinfo output shows that the container can access the GPU. This is good news!
  2. Your docker log does not show any errors. Can you clarify what you mean by "not working"?
  3. Note that GPU inference takes ~30s to respond to the first request.
  4. Please answer @nnshah1's question: how did you build the VA Serving container?
  5. Please update to the latest VA Serving version, v0.4.1.

Gsarg18 commented 3 years ago

Sorry for the late response, @nnshah1. I ran VA Serving on the GPU after replacing the Open Visual Cloud Xeon base image with the XeonE3 image (./docker/build.sh --base openvisualcloud/xeone3-ubuntu1804-analytics-gst), and it is working. Then I tried to run the same XeonE3 image in the Smart City sample with the latest VA Serving version, and it is not working.

@whbruce Here are the logs of the Smart City sample with GPU and VA Serving v0.3.1.1-alpha: GPU_error

nnshah1 commented 3 years ago

I believe I understand what might be happening:

Docker swarm does not support the 'device' option or 'privileged' mode. To enable this in swarm you have to use a special container image with Docker client support that can launch a container with privileges. This is how the vcac-a deployment scripts are set up. Within the analytics folder you can find run-container.sh in the vcac-a subfolder.

This would explain why the same image works as expected when run using the Video Analytics Serving run scripts, as those also use docker run directly.

TL;DR: you will need to create and run a container launcher within swarm to access the iGPU hardware.
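
The launcher pattern described above might look roughly like the following. This is a sketch under stated assumptions, not the contents of run-container.sh: both image names are placeholders, the service name is hypothetical, and the command is printed instead of executed.

```shell
# Both image names are placeholders (assumptions), not the sample's real images.
LAUNCHER_IMAGE="${LAUNCHER_IMAGE:-docker}"   # any image that ships the docker CLI
ANALYTICS_IMAGE="${ANALYTICS_IMAGE:-example/analytics-gpu:latest}"

# Compose a swarm service whose only job is to call `docker run` on its node.
# The host's Docker socket is bind-mounted so the inner `docker run` talks to
# the node's daemon, which does accept --device.
launcher_cmd() {
  echo "docker service create --name analytics-launcher" \
       "--mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock" \
       "$1 docker run --rm --device /dev/dri $2"
}

# Print the command for review before running it on a swarm manager.
launcher_cmd "$LAUNCHER_IMAGE" "$ANALYTICS_IMAGE"
```

Mounting the Docker socket gives the launcher full control of the node's Docker daemon, so this pattern should only be used with trusted images.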

Gsarg18 commented 3 years ago

@nnshah1, we are using a Kubernetes deployment, not docker swarm. How can we make these changes in Kubernetes?

nnshah1 commented 3 years ago

@Gsarg18, for Kubernetes, I believe you can designate a pod as "privileged". You should be able to deploy the analytics container as a privileged pod. https://kubernetes.io/docs/concepts/workloads/pods/#privileged-mode-for-containers
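
A minimal sketch of what a privileged pod could look like is shown below. The pod name, container name, and image are placeholders (assumptions); in the Smart City sample the equivalent setting would go into the analytics deployment's container spec.

```shell
# Write a minimal privileged Pod manifest; names and image are placeholders.
MANIFEST="${TMPDIR:-/tmp}/analytics-privileged.yaml"
cat > "$MANIFEST" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: analytics-gpu                      # hypothetical name
spec:
  containers:
  - name: analytics
    image: example/analytics-gpu:latest    # placeholder image
    securityContext:
      privileged: true                     # grants access to host devices such as /dev/dri
EOF

echo "wrote $MANIFEST"
# Apply it with: kubectl apply -f "$MANIFEST"
```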

@xwu2git, In order to run the analytics container on VCAC-A (with access to GPU) within Kubernetes do we use privileged pods or do we use the same technique as in docker swarm (i.e. a container that launches another container?)

xwu2git commented 3 years ago

For GPUs, you can either use a privileged pod or install the GPU device plugin.
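
For the device plugin route, the pod requests the GPU as a resource instead of running privileged. The sketch below assumes the Intel GPU device plugin for Kubernetes is already installed on the cluster and advertises the resource name `gpu.intel.com/i915`; the pod name and image are placeholders. Check `kubectl describe node` to confirm the resource name on your cluster.

```shell
# Write a Pod manifest that requests one Intel GPU through the device plugin.
MANIFEST="${TMPDIR:-/tmp}/analytics-gpu-plugin.yaml"
cat > "$MANIFEST" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: analytics-gpu-plugin               # hypothetical name
spec:
  containers:
  - name: analytics
    image: example/analytics-gpu:latest    # placeholder image
    resources:
      limits:
        gpu.intel.com/i915: 1              # assumed resource name from the Intel GPU plugin
EOF

echo "wrote $MANIFEST"
# Apply it with: kubectl apply -f "$MANIFEST"
```

Compared with a privileged pod, this exposes only the GPU device to the container, at the cost of installing and maintaining the plugin.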

Gsarg18 commented 3 years ago

@nnshah1 and @xwu2git Thanks for the suggestion about making the analytics pod privileged; we will try it and let you know. One clarification: VCAC and GPU are two different issues. Here we are concerned only with running the Smart City sample on the GPU. VCAC is tracked in a different thread: https://github.com/OpenVisualCloud/Smart-City-Sample/issues/741

Thanks

Gsarg18 commented 3 years ago

Thank you @nnshah1 @xwu2git @whbruce. We made the changes you suggested, and now the Smart City sample is working on the GPU with a Kubernetes deployment.

nnshah1 commented 3 years ago

Thanks for the update! This is great news! Can you briefly describe the change in setup, so we can capture it for anyone else running into the same issue?