OpenVisualCloud / Smart-City-Sample

The smart city reference pipeline shows how to integrate various media building blocks, with analytics powered by the OpenVINO™ Toolkit, for traffic or stadium sensing, analytics and management tasks.
BSD 3-Clause "New" or "Revised" License

Recordings longer than 10 minutes fail with MQTT disconnect error #797

Closed divdaisymuffin closed 2 years ago

divdaisymuffin commented 2 years ago

Hi @nnshah1 and @xwu2git

When I provide a record time of 15 minutes or longer, it always fails with an MQTT disconnect error.

I am attaching the pipeline definition and the log of the error.

Please have a look.

{ "name": "object_detection", "version": 2, "type": "GStreamer", "template":"rtspsrc udp-buffer-size=212992 name=source ! queue ! rtph264depay ! h264parse ! video/x-h264 ! tee name=t ! queue ! decodebin ! videoconvert name=\"videoconvert\" ! gvadetect ie-config=CPU_BIND_THREAD=NO model=\"{models[person_detection_2020R2][1][network]}\" model-proc=\"{models[person_detection_2020R2][1][proc]}\" name=\"detection1\" threshold=0.50 ! gvadetect ie-config=CPU_BIND_THREAD=NO model=\"{models[face_detection_adas][1][network]}\" model-proc=\"{models[face_detection_adas][1][proc]}\" name=\"detection\" threshold=0.50 ! gvametaconvert name=\"metaconvert\" ! queue ! gvametapublish name=\"destination\" ! appsink name=appsink t. ! splitmuxsink max-size-time=1200000000000 name=\"splitmuxsink\"", "description": "Object Detection Pipeline", "parameters": { "type" : "object", "properties" : { "inference-interval": { "element":"detection", "type": "integer", "minimum": 0, "maximum": 4294967295 }, "cpu-throughput-streams": { "element":"detection", "type": "string" }, "n-threads": { "element":"videoconvert", "type": "integer" }, "nireq": { "element":"detection", "type": "integer", "minimum": 1, "maximum": 64 }, "recording_prefix": { "type":"string", "default":"recording" } } } }

[attachment: pipelineissue]
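
The template above sets splitmuxsink max-size-time=1200000000000. splitmuxsink expresses this property in nanoseconds, so that value corresponds to 20-minute segments. A small helper (not part of the repo) for converting between minutes and the nanosecond value:

```python
# Helper (not repo code): splitmuxsink's max-size-time is given in nanoseconds.
GST_SECOND = 1_000_000_000  # nanoseconds per second

def max_size_time_ns(minutes: float) -> int:
    """Return the splitmuxsink max-size-time value for a segment length in minutes."""
    return int(minutes * 60 * GST_SECOND)

if __name__ == "__main__":
    for m in (10, 15, 20):
        print(f"{m} min -> max-size-time={max_size_time_ns(m)}")
    # 20 min -> max-size-time=1200000000000, the value used in the template above
```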

nnshah1 commented 2 years ago

@divdaisymuffin the log indicates that the pipeline has been aborted, meaning it was stopped explicitly. Do you know how the stop is being triggered?

https://github.com/OpenVisualCloud/Smart-City-Sample/blob/b774b2b494a5ea8bb79b4f3b64ac1af433e06934/analytics/common/runva.py#L120
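
For orientation, the abort surfaces in a status-polling loop around the linked line. The sketch below is purely illustrative (the object and method names are hypothetical, not the actual VA Serving API); it only shows the kind of loop that logs "pipeline aborted" once something outside the process has requested a stop.

```python
# Hypothetical sketch of a status loop like the one the linked runva.py line sits in.
# "pipeline" and "status()" are illustrative names, not the real VA Serving API.
import time

def wait_for_pipeline(pipeline, poll_interval_s: float = 1.0):
    while True:
        status = pipeline.status()  # illustrative call
        if status.state in ("ABORTED", "ERROR"):
            # This is where the "pipeline has been aborted" log would appear;
            # the stop itself was requested from outside this loop.
            print(f"pipeline ended with state={status.state}")
            break
        time.sleep(poll_interval_s)
```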

divdaisymuffin commented 2 years ago

No, I don't know what triggers it. It never happens if I keep "max-size-time" at 10 minutes or less, but it happens whenever I increase it.

nnshah1 commented 2 years ago

Based on the trace I believe something in the system is sending a 'stop' / 'kill' signal to the container.

Is the storage space of the container big enough to store the files temporarily? Also, rec2db will post and upload the clip; is it possible the volume for the local clips is running out of space?

I believe this is where the stop would originate from:

https://github.com/OpenVisualCloud/Smart-City-Sample/blob/b774b2b494a5ea8bb79b4f3b64ac1af433e06934/analytics/object/detect-object.py#L37
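
A quick way to test the running-out-of-space theory from inside the analytics container (a sketch, not repo code; /tmp/rec is the clip directory mentioned later in this thread, so adjust the path if yours differs):

```python
# Check free space on the volume where local clips are written.
import shutil

def report_free_space(path: str = "/tmp/rec") -> None:
    usage = shutil.disk_usage(path)

    def mib(n_bytes: int) -> float:
        return n_bytes / (1024 * 1024)

    print(f"{path}: total={mib(usage.total):.0f} MiB, "
          f"used={mib(usage.used):.0f} MiB, free={mib(usage.free):.0f} MiB")

if __name__ == "__main__":
    report_free_space()
    # A 15-minute H.264 segment at the bitrates seen in this thread can exceed
    # a 500 MiB volume limit, which would explain failures above ~10 minutes.
```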

whbruce commented 2 years ago

I set recording length to 12 minutes and ran for 8 hours with no problems using the built-in RTSP simulator. Like @nnshah1, I believe something is stopping the pipeline, maybe due to a lack of storage resources?

I added some instrumentation to the video segment saving code to extract some data; see the table below. Timestamp Duration is calculated from the timestamps in subsequent filenames; the rest is taken from the video segment media. I also updated to VA Serving v0.6.1, but this had no impact. See the updates in my segment-recording branch.
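
As a reproduction aid (an assumption about the naming scheme, not the actual instrumentation), the first field of each segment filename looks like a nanosecond epoch timestamp, so the "Timestamp Duration (s)" column can be derived by differencing consecutive filenames:

```python
# Sketch: derive per-segment durations from consecutive segment filenames,
# assuming the first underscore-separated field is a nanosecond timestamp.
# Values may differ from the table by +/-1 s depending on which filename field
# and rounding the original instrumentation used.
def timestamp_durations(filenames):
    ts = [int(name.split("_")[0]) for name in filenames]
    return [round((b - a) / 1e9) for a, b in zip(ts, ts[1:])]

if __name__ == "__main__":
    names = [
        "1636618338883908196_232756265.mp4",
        "1636619058001338355_719350186424.mp4",
        "1636619777336027504_1438684875573.mp4",
    ]
    print(timestamp_durations(names))  # -> [719, 719]
```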

Some observations

| Filename | Size (kB) | Timestamp Duration (s) | Media duration (s) | Media bitrate (kB/s) | Media fps |
|---|---|---|---|---|---|
| 1636618338883908196_232756265.mp4 | 380477 | 0 | 722 | 4211 | 25 |
| 1636619058001338355_719350186424.mp4 | 340706 | 719 | 722 | 3771 | 22 |
| 1636619777336027504_1438684875573.mp4 | 380712 | 720 | 722 | 4212 | 25 |
| 1636620496660523397_2158009371466.mp4 | 341048 | 719 | 722 | 3775 | 22 |
| 1636621215983190898_2877332038967.mp4 | 380545 | 719 | 722 | 4212 | 25 |
| 1636621935094173340_3596443021409.mp4 | 341844 | 719 | 722 | 3784 | 22 |
| 1636622654491798247_4315840646316.mp4 | 380751 | 720 | 722 | 4213 | 25 |
| 1636623373795373239_5035144221308.mp4 | 341364 | 719 | 722 | 3779 | 22 |
| 1636624093114285064_5754463133133.mp4 | 380655 | 719 | 722 | 4212 | 25 |
| 1636624812443087234_6473791935303.mp4 | 341340 | 720 | 722 | 3779 | 22 |
| 1636625531766164366_7193115012435.mp4 | 380577 | 719 | 722 | 4211 | 25 |
| 1636626251112398016_7912461246085.mp4 | 341281 | 719 | 722 | 3778 | 22 |
| 1636626970414391654_8631763239723.mp4 | 380469 | 720 | 722 | 4210 | 25 |
| 1636627689791666084_9351140514153.mp4 | 341434 | 719 | 722 | 3781 | 22 |
| 1636628408817148356_10070165996425.mp4 | 380779 | 719 | 722 | 4213 | 25 |
| 1636629128157455222_10789506303291.mp4 | 341037 | 720 | 722 | 3775 | 22 |
| 1636629847474446477_11508823294546.mp4 | 380689 | 719 | 722 | 4212 | 25 |
| 1636630566788147103_12228136995172.mp4 | 340946 | 719 | 722 | 3774 | 22 |
| 1636631286113319319_12947462167388.mp4 | 380452 | 719 | 722 | 4210 | 25 |
| 1636632005459057842_13666807905911.mp4 | 340939 | 720 | 722 | 3774 | 22 |
| 1636632724778397406_14386127245475.mp4 | 380500 | 719 | 722 | 4210 | 25 |
| 1636633444099258579_15105448106648.mp4 | 341043 | 719 | 722 | 3777 | 22 |
| 1636634163203133582_15824551981651.mp4 | 380768 | 720 | 722 | 4213 | 25 |
| 1636634882549117852_16543897965921.mp4 | 341668 | 719 | 722 | 3782 | 22 |
| 1636635601858039299_17263206887368.mp4 | 380751 | 719 | 722 | 4213 | 25 |
| 1636636321172613343_17982521461412.mp4 | 341065 | 720 | 722 | 3775 | 22 |
| 1636637040405169345_18701754017414.mp4 | 380553 | 719 | 722 | 4211 | 25 |
| 1636637759610634328_19420959482397.mp4 | 341060 | 719 | 722 | 3775 | 22 |
| 1636638478823550987_20140172399056.mp4 | 380638 | 719 | 722 | 4212 | 25 |
| 1636639198028000688_20859376848757.mp4 | 340786 | 719 | 722 | 3772 | 22 |
| 1636639917292769018_21578641617087.mp4 | 380637 | 720 | 722 | 4213 | 25 |
| 1636640636220468585_22297569316654.mp4 | 341325 | 719 | 722 | 3778 | 22 |
| 1636641355428228558_23016777076627.mp4 | 380757 | 719 | 722 | 4213 | 25 |
| 1636642074641490641_23735990338710.mp4 | 341485 | 719 | 722 | 3780 | 22 |
| 1636642793847485043_24455196333112.mp4 | 380645 | 719 | 722 | 4212 | 25 |
| 1636643513050932900_25174399780969.mp4 | 340772 | 719 | 722 | 3772 | 22 |
| 1636644232259337961_25893608186030.mp4 | 380712 | 720 | 722 | 4212 | 25 |
| 1636644951465050307_26612813898376.mp4 | 340751 | 719 | 722 | 3772 | 22 |
| 1636645670677826494_27332026674563.mp4 | 380545 | 719 | 722 | 4212 | 25 |
| 1636646389671581207_28051020429276.mp4 | 341581 | 719 | 722 | 3782 | 22 |
| 1636647108877709971_28770226558040.mp4 | 380751 | 719 | 722 | 4213 | 25 |

whbruce commented 2 years ago

I repeated the above experiment with 35-minute recordings for 9 hours, again with the built-in RTSP simulator. No errors were detected and video file segment sizes were consistent (just over 100 Mbytes).

I advise the following next steps:

  1. Using the advice from @nnshah1, try to debug what is causing the pipeline to stop (see the sketch below). The MQTT message is not the cause of the stop, it is a symptom: it indicates the pipeline has stopped and the publisher detects an error when trying to disconnect from the broker.
  2. Uneven recording times may be related to issues with the incoming video stream. Repeat your tests with the built-in RTSP simulator, or try a different make/model of RTSP camera.
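
For step 1, a minimal debugging sketch (not repo code): registering signal handlers in the analytics process makes an external stop visible in the logs, which helps distinguish an orchestrator-initiated kill (for example a pod hitting a storage limit) from an internal pipeline error.

```python
# Log any external stop signal the analytics process receives.
import signal
import sys
from datetime import datetime

def _log_signal(signum, frame):
    name = signal.Signals(signum).name
    print(f"{datetime.now().isoformat()} received {name}; the pipeline stop was "
          f"triggered from outside the process", flush=True)
    sys.exit(0)

for sig in (signal.SIGTERM, signal.SIGINT):
    signal.signal(sig, _log_signal)

# ... start the pipeline as usual; if the line above appears right before the
# MQTT disconnect error, the stop came from the orchestrator, not the pipeline.
```
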
divdaisymuffin commented 2 years ago

@nnshah1 and @whbruce, I have tried the suggestion of increasing the storage limit of the analytics pod's Docker container from 500Mi to 9000Mi. It resolved the error of the pipeline stopping with the MQTT disconnect issue, and when I went inside the analytics Kubernetes pod I could see the recording files present. As we know, in the next step the mp4 files move from /tmp/rec in the analytics pod to /var/www/mp4/ and are later uploaded to the GUI. The new issue is that no recording is present at /var/www/mp4/.

The change I made is in the `volumes:` section of analytics.yaml.m4.

So, do I now need to increase the storage pod's limit as well?
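
A rough sizing sketch (estimates only; based on the segment sizes in the table above, roughly 30 MiB per minute of recording for that particular stream, other cameras will differ) can help choose both the analytics volume limit and the storage-side limits:

```python
# Estimate how much space one recorded MP4 segment needs for a given length,
# using ~30 MiB/min observed for the 12-minute segments in the table above.
def segment_size_mib(minutes: float, mib_per_minute: float = 30.0) -> float:
    """Approximate size of one MP4 segment for a given recording length."""
    return minutes * mib_per_minute

if __name__ == "__main__":
    for minutes in (10, 15, 20):
        print(f"{minutes:>2} min -> ~{segment_size_mib(minutes):.0f} MiB")
    # ~15 min already approaches a 500Mi limit; both the analytics volume and the
    # storage-side volume need room for at least one full segment plus overhead.
```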

@whbruce, since it's working fine for you, can you provide your analytics.yaml.m4, office-storage.yaml.m4, and cloud-storage.yaml.m4 files?

whbruce commented 2 years ago

Glad to hear the pipeline is no longer aborting. Can you close this issue and open a new one for moving the mp4 files, as that is not related to pipeline operation?

For the m4 files I used, see my segment-recording branch.

divdaisymuffin commented 2 years ago

@whbruce @nnshah1 The issue has been created, please check it: https://github.com/OpenVisualCloud/Smart-City-Sample/issues/798

whbruce commented 2 years ago

Thanks for creating #798. Please close this issue, or let us know what still needs to be addressed.