divdaisymuffin closed this issue 2 years ago.
@divdaisymuffin can you share details on the hardware you are using?
Is it correct that with one pod you get 20FPS, but as you scale to multiple pods, it drops to 10 / 12 fps?
@nnshah1 Please find the hardware specs for the machines we are using.
We are observing it with a single pod as well.
@divdaisymuffin Would you be able to test the output of
gst-inspect-1.0 | grep vaapi
on your targets using https://hub.docker.com/r/openvino/ubuntu20_data_runtime?
That will help us determine whether hardware-accelerated decode/encode is an option.
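The same check can be scripted when you have several targets to compare. A minimal Python sketch, assuming the plain `gst-inspect-1.0` listing prints one `plugin:  element: description` line per element (the layout the grep above relies on); `REQUIRED_VAAPI`, `vaapi_elements`, and `missing_vaapi` are hypothetical names, not part of VA Serving or Smart-City-Sample.

```python
import shutil
import subprocess
from typing import List, Set

# Hypothetical list: VA-API elements the GPU pipelines in this thread depend on.
REQUIRED_VAAPI = ("vaapih264dec", "vaapipostproc")


def vaapi_elements(inspect_output: str) -> Set[str]:
    """Collect element names registered by the 'vaapi' plugin.

    Assumes each element line looks like '<plugin>:  <element>: <description>'.
    """
    found = set()
    for line in inspect_output.splitlines():
        parts = [part.strip() for part in line.split(":", 2)]
        if len(parts) == 3 and parts[0] == "vaapi":
            found.add(parts[1])
    return found


def missing_vaapi(inspect_output: str) -> List[str]:
    """Return the required VA-API elements that are not registered."""
    available = vaapi_elements(inspect_output)
    return [name for name in REQUIRED_VAAPI if name not in available]


if __name__ == "__main__":
    if shutil.which("gst-inspect-1.0") is None:
        print("gst-inspect-1.0 is not on PATH in this container")
    else:
        listing = subprocess.run(
            ["gst-inspect-1.0"], capture_output=True, text=True
        ).stdout
        print("missing VA-API elements:", missing_vaapi(listing) or "none")
```

An empty result means the VA-API plugin loaded with the decoder and post-processor; a non-empty one points at the driver or image rather than the pipeline.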
@nnshah1 This is what we are getting:
@nnshah1 I want to share one observation: you were right that the processed FPS is affected by CPU usage. When I run 4 pods on an i7 CPU with the yolov3 model, I get processed FPS values of 1, 2, or 5 FPS, while each camera streams at 25 FPS. As soon as I removed 2 pods, the processed FPS increased to 12. CPU usage with 4 pods was 89.83%. Similarly, when I run a heavy model in a single pod and CPU usage reaches 80%, I observe 6 FPS processed against 25 FPS received. I will share a complete table soon.
@nnshah1 As you suggested, I have tried running on the GPU. Please find the files:
Dockerfile
```
# smtc_analytics_common_xeon_gst

FROM centos:7 as build

ARG VA_SERVING_REPO=https://raw.githubusercontent.com/intel/video-analytics-serving
ARG VA_SERVING_TAG="v0.3.0-alpha"

RUN mkdir -p /home/vaserving/common/utils && \
    touch /home/vaserving/__init__.py /home/vaserving/common/__init__.py /home/vaserving/common/utils/__init__.py && \
    for x in common/utils/logging.py common/settings.py arguments.py ffmpeg_pipeline.py gstreamer_pipeline.py model_manager.py pipeline.py pipeline_manager.py schema.py vaserving.py; do \
        curl -sSf -o /home/vaserving/$x -L ${VA_SERVING_REPO}/${VA_SERVING_TAG}/vaserving/$x; \
    done

COPY *.py /home/

FROM openvisualcloud/xeone3-ubuntu1804-analytics-gst:20.10

RUN apt-get update -qq && apt-get install -qq python3-gst-1.0 python3-jsonschema python3-psutil && rm -rf /var/lib/apt/lists/*

COPY --from=build /home/ /home/
ENV FRAMEWORK=gstreamer
ENV PYTHONIOENCODING=UTF-8

ARG USER=docker
ARG GROUP=docker
ARG UID
ARG GID

RUN [ ${GID} -gt 0 ] && groupadd -f -g ${GID} ${GROUP}; \
    [ ${UID} -gt 0 ] && useradd -d /home -M -g ${GID} -K UID_MAX=${UID} -K UID_MIN=${UID} ${USER}; \
    chown -R ${UID}:${GID} /home
```
Pipeline.json
{ "name": "object_detection", "version": 2, "type": "GStreamer", "template":"rtspsrc udp-buffer-size=212992 name=source ! queue ! rtph264depay ! h264parse ! video/x-h264 ! tee name=t ! queue ! decodebin ! videoconvert name=\"videoconvert\" ! video/x-raw(memory:VASurface) ! vaapipostproc brightness=0.0001 ! queue leaky=upstream ! gvadetect device=GPU pre-process-backend=vaapi model=\"{models[face_detection_adas][1][network]}\" model-proc=\"{models[face_detection_adas][1][proc]}\" name=\"detection\" threshold=0.10 ! gvaclassify model=\"{models[age-gender-recognition-retail-0013][1][network]}\" model-proc=\"{models[age-gender-recognition-retail-0013][1][proc]}\" name=\"recognition\" model-instance-id=recognition ! gvametaconvert name=\"metaconvert\" ! queue ! gvapython name=\"QueueCounting\" module=\"custom_transforms/final_count.py\" class=\"QueueCounting\" ! gvametapublish name=\"destination\" ! appsink name=appsink t. ! splitmuxsink max-size-time=60000000000 name=\"splitmuxsink\"", "description": "Object Detection Pipeline", "parameters": { "type" : "object", "properties" : { "inference-interval": { "element":"detection", "type": "integer", "minimum": 0, "maximum": 4294967295 }, "cpu-throughput-streams": { "element":"detection", "type": "string" }, "n-threads": { "element":"videoconvert", "type": "integer" }, "nireq": { "element":"detection", "type": "integer", "minimum": 1, "maximum": 64 }, "device": { "element": "detection", "default": "GPU", "type": "string" }, "recording_prefix": { "type":"string", "default":"recording" } } } }
Yaml
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: traffic-office1-analytics-traffic
  labels:
    app: traffic-office1-analytics-traffic
spec:
  replicas: 1
  selector:
    matchLabels:
      app: traffic-office1-analytics-traffic
  template:
    metadata:
      labels:
        app: traffic-office1-analytics-traffic
    spec:
      enableServiceLinks: false
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
```
PROBLEM: analytics.yaml.m4 takes NETWORK_PREFERENCE from build.sh (I think), so when I run "cmake" and then "make" again, the generated analytics.yaml comes back with "CPU".
I am also sharing the analytics logs, where you can see "NETWORK_PREFERENCE==CPU" and an error saying there is no element "vaapipostproc".
Please suggest.
@xwu2git @nnshah1 I have noticed one more thing, related to this issue: https://github.com/OpenVisualCloud/Dockerfiles/issues/662
The xeone3-ubuntu1804-analytics-gst:20.10 image does not support the Comet Lake GPU.
grep "model name" /proc/cpuinfo | head -1
model name : Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
docker run -it --device /dev/dri --entrypoint /bin/bash openvisualcloud/xeone3-ubuntu1804-analytics-gst:20.10 -c "clinfo -l"
This gives no output. But when I ran the same on another machine:
grep "model name" /proc/cpuinfo | head -1
model name : Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
docker run -it --device /dev/dri --entrypoint /bin/bash openvisualcloud/xeone3-ubuntu1804-analytics-gst:20.10 -c "clinfo -l"
Platform #0: Intel(R) OpenCL HD Graphics
 `-- Device #0: Intel(R) Gen9 HD Graphics NEO
it gives this output, which shows that the image does not work on Comet Lake.
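To compare machines without eyeballing the output, the `clinfo -l` listing can be parsed. A minimal Python sketch, assuming the tree layout shown above (`Platform #N: ...` followed by ` `-- Device #N: ...` lines); `opencl_devices` is a hypothetical helper. An empty list reproduces the Comet Lake symptom: no OpenCL GPU visible inside the container.

```python
import re
import shutil
import subprocess
from typing import List

# Matches tree lines such as " `-- Device #0: Intel(R) Gen9 HD Graphics NEO".
DEVICE_LINE = re.compile(r"Device #\d+:\s*(?P<name>.+?)\s*$")


def opencl_devices(clinfo_list_output: str) -> List[str]:
    """Return the device names listed by `clinfo -l` (empty if none)."""
    devices = []
    for line in clinfo_list_output.splitlines():
        match = DEVICE_LINE.search(line)
        if match:
            devices.append(match.group("name"))
    return devices


if __name__ == "__main__":
    if shutil.which("clinfo") is None:
        print("clinfo is not installed in this container")
    else:
        listing = subprocess.run(["clinfo", "-l"], capture_output=True, text=True).stdout
        devices = opencl_devices(listing)
        print(devices if devices else "no OpenCL devices visible (check /dev/dri and the driver)")
```

Run it inside the analytics image with `--device /dev/dri`, as in the docker commands above, so it sees the same driver stack as the pipeline.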
So I switched to the Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz machine, but the "no element vaapipostproc" error is still there.
I have also set privileged=True in analytics.yaml.m4 as you suggested, but it didn't work.
Please help us resolve this.
To support Comet Lake via OpenCL, you will need an updated driver. I've created a branch to show the necessary modifications to the Dockerfile:
https://github.com/nnshah1/Smart-City-Sample/commit/1df8ef5de3da69970048063a54fc96a8c441add4
For vaapipostproc: is this issue seen when running the container interactively as above, or only when running via Kubernetes?
On a Comet Lake system, I recommend the following to achieve 4 streams at 30 FPS.
Key points: use the FP16-INT8 model (for person-detection-retail-0013); use CPU_THREADS_NUM and cpu-throughput-streams to limit the number of threads used per process; use VAAPI for decode; and use MULTI:CPU,GPU as the device.
gst-launch-1.0 filesrc location=/home/pipeline-zoo/workspace/od-h264-people/workloads/2m.264/disk/input/stream.h264 ! video/x-h264 ! h264parse ! video/x-h264 ! vaapih264dec name=decode0 ! vaapipostproc ! "video/x-raw(memory:VASurface)" ! gvadetect device=MULTI:CPU,GPU pre-process-backend=vaapi gpu-throughput-streams=1 cpu-throughput-streams=1 ie-config=CPU_THREADS_NUM=1,CPU_BIND_THREAD=NO nireq=4 name=detect0 model=/home/pipeline-zoo/workspace/od-h264-people/models/person-detection-retail-0013/FP16-INT8/person-detection-retail-0013.xml ! gvametaconvert add-empty-results=true ! gvametapublish method=file file-format=json-lines file-path=/tmp/216db4ac-52b1-11ec-a9a6-1c697aa336da/output ! gvafpscounter ! fakesink async=false name=sink0
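To reproduce the 4-stream test, one process per stream can be generated from the same settings. A minimal Python sketch, assuming the gvadetect properties from the command above; `stream_argv` and `DETECT_PROPS` are hypothetical names, and the sketch drops the metadata-publishing elements for brevity. Building an argv list avoids shell parsing entirely; when a line is pasted into bash, the `video/x-raw(memory:VASurface)` caps token must be quoted because of the parentheses, which `shlex.join` does automatically.

```python
import shlex
from typing import List

# gvadetect settings copied from the recommended command above.
DETECT_PROPS = [
    "device=MULTI:CPU,GPU",
    "pre-process-backend=vaapi",
    "gpu-throughput-streams=1",
    "cpu-throughput-streams=1",
    "ie-config=CPU_THREADS_NUM=1,CPU_BIND_THREAD=NO",
    "nireq=4",
]


def stream_argv(index: int, h264_path: str, model_xml: str) -> List[str]:
    """Build argv for one gst-launch-1.0 process; no shell parses these tokens."""
    return [
        "gst-launch-1.0",
        "filesrc", f"location={h264_path}", "!",
        "video/x-h264", "!", "h264parse", "!",
        "vaapih264dec", f"name=decode{index}", "!",
        "vaapipostproc", "!", "video/x-raw(memory:VASurface)", "!",
        "gvadetect", *DETECT_PROPS, f"name=detect{index}", f"model={model_xml}", "!",
        "gvafpscounter", "!", "fakesink", "async=false", f"name=sink{index}",
    ]


if __name__ == "__main__":
    workload = "/home/pipeline-zoo/workspace/od-h264-people"
    for i in range(4):
        argv = stream_argv(
            i,
            f"{workload}/workloads/2m.264/disk/input/stream.h264",
            f"{workload}/models/person-detection-retail-0013/FP16-INT8/person-detection-retail-0013.xml",
        )
        # shlex.join quotes the caps token so the printed line is safe to paste into bash.
        print(shlex.join(argv))
```

Each argv can also be handed to `subprocess.Popen` to start the four processes side by side and compare the per-process gvafpscounter output.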
@divdaisymuffin Please check the changes required to enable the GPU; the steps below verify that it works.
Pull and build from Neelay's fork: https://github.com/nnshah1/Smart-City-Sample/tree/updated_opencl_driver
Remember to check out the branch updated_opencl_driver.
git clone https://github.com/nnshah1/Smart-City-Sample.git
cd Smart-City-Sample
git checkout updated_opencl_driver
mkdir build
cd build
cmake ..
make
make start_kubernetes
Find and exec into the analytics-traffic container, replacing k8s_traffic-office1-analytics-traffic_traffic-office1-analytics-traffic-6f9497b9c4-g52lk_default_5e9009d2-486a-4114-8b54-745c31395c78_0 with your container name:
sudo docker exec -it k8s_traffic-office1-analytics-traffic_traffic-office1-analytics-traffic-6f9497b9c4-g52lk_default_5e9009d2-486a-4114-8b54-745c31395c78_0 /bin/bash
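The generated container name changes on every deployment, so it can help to look it up rather than copy it. A minimal Python sketch, assuming the Docker runtime names Kubernetes containers `k8s_<container>_<pod>_<namespace>_<uid>_<attempt>` as in the name above; `CONTAINER_PREFIX`, `analytics_containers`, and `running_container_names` are hypothetical names. Filtering on the container prefix also skips the `k8s_POD_...` sandbox container of the same pod.

```python
import shutil
import subprocess
from typing import List

# Assumed naming: k8s_<container>_<pod>_<namespace>_<uid>_<attempt>.
CONTAINER_PREFIX = "k8s_traffic-office1-analytics-traffic_"


def analytics_containers(names: List[str], prefix: str = CONTAINER_PREFIX) -> List[str]:
    """Keep only the Docker container names that belong to the analytics container."""
    return [name for name in names if name.startswith(prefix)]


def running_container_names() -> List[str]:
    """List running container names (may need sudo, as with `docker exec` above)."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


if __name__ == "__main__":
    if shutil.which("docker") is None:
        print("docker CLI not found on this host")
    else:
        try:
            for name in analytics_containers(running_container_names()):
                print(f"sudo docker exec -it {name} /bin/bash")
        except subprocess.CalledProcessError as error:
            print("docker ps failed:", error.stderr.strip())
```

`kubectl exec -it <pod-name> -- /bin/bash` is an alternative that works with the pod name instead of the Docker container name.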
Check that vaapipostproc is available:
gst-inspect-1.0 vaapipostproc
Test vaapipostproc and device=MULTI:CPU,GPU by running the pipeline below inside the container:
gst-launch-1.0 urisourcebin \
uri=https://github.com/intel-iot-devkit/sample-videos/blob/master/person-bicycle-car-detection.mp4?raw=true \
! decodebin ! vaapipostproc ! "video/x-raw(memory:VASurface)" \
! gvadetect device=MULTI:CPU,GPU pre-process-backend=vaapi gpu-throughput-streams=1 \
cpu-throughput-streams=1 ie-config=CPU_THREADS_NUM=1,CPU_BIND_THREAD=NO nireq=4 name=detect0 \
model=/home/models/person_detection_2020R2/1/FP16/person-detection-retail-0013.xml \
model-proc=/home/models/person_detection_2020R2/1/person-detection-retail-0013.json \
! gvametaconvert add-empty-results=true ! gvametapublish method=file file-format=json-lines file-path=/tmp/output \
! gvafpscounter ! fakesink async=false name=sink0
@tthakkal @nnshah1 Thanks for the support. We are able to run the pipeline above inside a Docker container or a Kubernetes pod container, and it uses the GPU successfully. However, when we try to run the pipeline defined in Xeon/gst/pipeline/2/pipeline.json, it fails with syntax errors or unsupported-element errors.
I am sharing my pipeline with you. Please help me correct it and run it in Smart-City-Sample.
{ "name": "ppl-density-det", "version": 2, "type": "GStreamer", "template":"rtspsrc udp-buffer-size=212992 name=source ! queue ! rtph264depay ! h264parse ! video/x-h264 ! tee name=t ! queue ! decodebin ! videoconvert name=\"videoconvert\" ! vaapipostproc ! video/x-raw(memory:VASurface) ! queue leaky=upstream ! gvadetect device=MULTI:CPU,GPU pre-process-backend=vaapi gpu-throughput-streams=1 cpu-throughput-streams=1 ie-config=CPU_THREADS_NUM=1,CPU_BIND_THREAD=NO model=\"{models[head_yolov4_tiny_608to416_default_anchors_mask_012_heatmap_INT8][1][network]}\" model-proc=\"{models[head_yolov4_tiny_608to416_default_anchors_mask_012_heatmap_INT8][1][proc]}\" name=\"detection\" threshold=0.40 ! gvametaconvert name=\"metaconvert\" ! queue ! gvametapublish name=\"destination\" ! gvafpscounter ! appsink name=appsink t. ! splitmuxsink max-size-time=300000000000 name=\"splitmuxsink\"", "description": "ppl-density-det Pipeline", "parameters": { "type" : "object", "properties" : { "inference-interval": { "element":"detection", "type": "integer", "minimum": 0, "maximum": 4294967295 }, "cpu-throughput-streams": { "element":"detection", "type": "string" }, "n-threads": { "element":"videoconvert", "type": "integer" }, "nireq": { "element":"detection", "type": "integer", "minimum": 1, "maximum": 64 }, "recording_prefix": { "type":"string", "default":"recording" } } } }
Please find attached logs as well.
@divdaisymuffin
Replace model=\"{models[head_yolov4_tiny_608to416_default_anchors_mask_012_heatmap_INT8][1][network]}\"
with model=\"{models[head_yolov4_tiny_608to416_default_anchors_mask_012_heatmap_INT8][1][FP16][network]}\"
assuming you have the model in an FP16 directory. I have tested with FP16 precision; if you want to try a different precision, you can change that to INT8 or FP32 and see whether it works and/or improves performance.
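When several pipeline.json files need the same change, the substitution can be scripted. A minimal Python sketch, assuming the `{models[<name>][<version>][<precision>][network]}` form suggested above and that the matching precision directory exists under the model version folder; `pin_precision` is a hypothetical helper. Only `[network]` references are rewritten; `[proc]` references are left as they are, matching the suggestion.

```python
import re

# {models[<name>][<version>][network]} with no precision segment yet.
MODEL_NETWORK_REF = re.compile(
    r"\{models\[(?P<name>[^\]]+)\]\[(?P<version>[^\]]+)\]\[network\]\}"
)


def pin_precision(template: str, precision: str = "FP16") -> str:
    """Insert a precision directory (FP16, INT8, FP32, ...) into each model network reference."""
    return MODEL_NETWORK_REF.sub(
        lambda m: f"{{models[{m['name']}][{m['version']}][{precision}][network]}}",
        template,
    )
```

Applied to the `"template"` string of a loaded pipeline.json (then written back with `json.dump`), it leaves references that already carry a precision untouched, so running it twice is harmless.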
@tthakkal @nnshah1 Yes, it is finally working and utilizing the GPU, but to our surprise it is also using the CPU.
The usage above was observed using:
kubectl apply -f https://raw.githubusercontent.com/pythianarora/total-practice/master/sample-kubernetes-code/metrics-server.yaml
kubectl top po
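`kubectl top po` reports CPU in millicores (`250m` is a quarter of a core), which makes it easy to misread when comparing pods. A minimal Python sketch that converts the table into cores per pod, assuming the default `NAME  CPU(cores)  MEMORY(bytes)` columns and a working metrics-server; `cpu_cores_by_pod` is a hypothetical helper, not part of kubectl.

```python
import shutil
import subprocess
from typing import Dict


def cpu_cores_by_pod(top_output: str) -> Dict[str, float]:
    """Convert `kubectl top po` output into {pod name: CPU cores}.

    Assumes the default columns NAME, CPU(cores), MEMORY(bytes); values such as
    '1250m' are millicores (1250m = 1.25 cores).
    """
    usage = {}
    for line in top_output.strip().splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 2:
            continue
        name, cpu = fields[0], fields[1]
        usage[name] = int(cpu[:-1]) / 1000 if cpu.endswith("m") else float(cpu)
    return usage


if __name__ == "__main__" and shutil.which("kubectl"):
    result = subprocess.run(["kubectl", "top", "po"], capture_output=True, text=True)
    if result.returncode == 0:
        for pod, cores in sorted(cpu_cores_by_pod(result.stdout).items()):
            print(f"{pod}: {cores:.2f} cores")
    else:
        print(result.stderr.strip())
```

Dividing a pod's value by the node's core count gives its share of the machine, which is the number to compare against the CPU percentages quoted earlier in the thread.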
Can't we restrict CPU usage?
CPU_THREADS_NUM=1 and cpu-throughput-streams=1 should restrict CPU use by gvadetect; the CPU usage you are seeing might come from other elements or processes. You can verify this by removing those values and checking whether CPU usage rises above the current number.
I see that decodebin and videoconvert aren't really needed in your pipeline. You can remove ! decodebin ! videoconvert name=\"videoconvert\"; that should help reduce some of the usage.
By the way, do you see 4 streams at better fps now?
@tthakkal @nnshah1 Yes, CPU_THREADS_NUM=1 and removing videoconvert helped reduce CPU utilization. I also tried without the GPU, and I see an improvement there too. However, we are not able to remove decodebin; it gives an error. I need to understand why you said that "decodebin and videoconvert aren't really needed in your pipeline".
And yes, we are now getting better FPS with these suggestions, even without using the GPU.
@divdaisymuffin I'm sorry, my mistake: decodebin (or avdec_h264) is required for decoding. videoconvert isn't required because vaapipostproc does the required conversion to video/x-raw(memory:VASurface).
@tthakkal What if we remove videoconvert from the CPU pipeline as well? We have done this and CPU utilization decreased by 50%, although we observed a small decrease in the model's detection accuracy, only 1 to 2%.
I am sharing my CPU pipeline without videoconvert and with CPU_THREADS_NUM=1 added; let me know if this is not recommended.
"template":"rtspsrc udp-buffer-size=212992 name=source ! queue ! rtph264depay ! h264parse ! video/x-h264 ! tee name=t ! queue ! decodebin ! queue leaky=upstream ! gvadetect model=\"{models[head_yolov4_tiny_608to416_default_anchors_mask_012_heatmap_INT8][1][network]}\" model-proc=\"{models[head_yolov4_tiny_608to416_default_anchors_mask_012_heatmap_INT8][1][proc]}\" name=\"detection\" ie-config=CPU_THREADS_NUM=1 threshold=0.40 ! gvametaconvert name=\"metaconvert\" ! queue ! gvapython name=\"new_wait\" module=\"custom_transforms/new_wait\" class=\"WaitTime\" ! gvametapublish name=\"destination\" ! appsink name=appsink t. ! queue ! splitmuxsink max-size-time=300000000000 name=\"splitmuxsink\"",
@divdaisymuffin That should work without any issues. videoconvert is only required where decodebin doesn't provide a caps format that gvadetect supports.
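Dropping the element by hand is easy to get wrong across many pipeline.json files, and the parameters block in these files binds `n-threads` to the element named `videoconvert`. A minimal Python sketch that removes the link carrying `name="videoconvert"` and reports parameters left pointing at a missing element; `drop_named_element` and `dangling_parameters` are hypothetical helpers, and whether VA Serving rejects or ignores such a parameter is not stated in this thread, so the check only flags it. Unnamed `videoconvert` links (like the one before `jpegenc` in the last pipeline below) are kept.

```python
import json
import re
import sys
from typing import Any, Dict, List, Set

# name=foo or name="foo" (gst-launch syntax also allows spaces around '=').
NAME_PROP = re.compile(r'\bname\s*=\s*"?([\w.-]+)"?')


def element_names(template: str) -> Set[str]:
    """All element names assigned with name= in a gst-launch style template."""
    return set(NAME_PROP.findall(template))


def drop_named_element(template: str, name: str) -> str:
    """Remove the ' ! '-separated link whose name= matches.

    Only safe for links that do not also start a tee branch (e.g. 'appsink name=appsink t.').
    """
    links = template.split(" ! ")
    return " ! ".join(link for link in links if name not in element_names(link))


def dangling_parameters(pipeline: Dict[str, Any]) -> List[str]:
    """Parameters whose 'element' no longer appears as a name= in the template."""
    names = element_names(pipeline["template"])
    properties = pipeline.get("parameters", {}).get("properties", {})
    return sorted(
        key for key, spec in properties.items()
        if "element" in spec and spec["element"] not in names
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        pipeline = json.load(handle)
    pipeline["template"] = drop_named_element(pipeline["template"], "videoconvert")
    print("parameters left without an element:", dangling_parameters(pipeline))
    print(json.dumps(pipeline, indent=4))
```

Run it as `python script.py pipeline.json` and review the printed JSON before replacing the original file.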
@tthakkal Thanks for the clarity.
@divdaisymuffin Please confirm the pipelines are now getting the correct density and utilization on your target hardware. If so we'll close this issue and can open others as needed.
@nnshah1 Yes, it's working well; we can close this.
Hi @nnshah1 and @xwu2git,
We have observed that the live camera streams at 20 FPS, but the processed FPS drops to 10 to 12 FPS. This is mostly observed when we run 4 pods on the machine. Is this known behavior? How can we improve the processed FPS? Note: each analytics pod runs 1 sensor. We observe this with almost all the pipelines; as a sample, I am sharing one.
{ "name": "object_detection", "version": 2, "type": "GStreamer", "template":"rtspsrc udp-buffer-size=212992 name=source ! queue ! rtph264depay ! h264parse ! video/x-h264 ! tee name=t ! queue ! decodebin ! videoconvert name=\"videoconvert\" ! video/x-raw,format=BGRx ! queue leaky=upstream ! gvadetect ie-config=CPU_BIND_THREAD=NO model=\"{models[person-detection-retail-0013][1][network]}\" model-proc=\"{models[person-detection-retail-0013][1][proc]}\" name=\"detection\" threshold=0.40 ! gvametaconvert name=\"metaconvert\" ! queue ! gvapython name=\"StaffEngagement\" module=\"custom_transforms/staff_engagement2\" class=\"StaffEngagement\" ! gvametapublish name=\"destination\" ! tee name = tt ! queue ! gvawatermark ! videoconvert ! jpegenc ! gvapython name=\"capture\" module=\"custom_transforms/staff_engagement2\" class=\"Capture\" ! queue ! appsink name=appsink t. ! queue ! splitmuxsink max-size-time=900000000000 name=\"splitmuxsink\"", "description": "Object Detection Pipeline", "parameters": { "type" : "object", "properties" : { "inference-interval": { "element":"detection", "type": "integer", "minimum": 0, "maximum": 4294967295 }, "cpu-throughput-streams": { "element":"detection", "type": "string" }, "n-threads": { "element":"videoconvert", "type": "integer" }, "nireq": { "element":"detection", "type": "integer", "minimum": 1, "maximum": 64 }, "recording_prefix": { "type":"string", "default":"recording" } } } }
Thanks