problem when BATCH_SIZE > 1

I am trying make it work with batch size > 1 Actual version i am working with is yolov5, 3.1 with 23 classes, 640x640 Device is jetson nano

Tensorrtx tests For reason that 3.1 is changed a little i took actual version of tensorrtx without hswish

tensorrtx was build with 23 classes
when tested tensorrtx with batch size 1 works fine
i rebuild tensorrtx with batch size 8 and regenerated engine file with max batch size = 8
tested with tensorrtx inference ( yolov5 -d ../samples ) - results are ok ( ... 23 classes, batchsize 8, 640x640 )

so on jetson nano on JETSON_CUDA=10.2.89 works fine

deepstream 5.0 nvdsinfer_custom_impl_Yolo on jetson nano on JETSON_CUDA=10.2.89 i configured yolov5s for deepstream 5.0

when tested with batch size 1 engine, it works (but ocasionally "boxes explodes" ) - looks like memory is not cleared between cycles (but just my assumption)
when tested with engine with max batch size = 8 and setting batch size even 1 "boxes explodes" very often. Tracking has no affect to this functionality - in test it was turned off.
With engine (batch size =1) there is also strange behavior with 2 streams in parallel - in deepstream. First stream is correct, second one has "exploded boxes" behavior. So i think problem is related to this one too.

Examples how explosion looks like explode

Expected (sometimes) sometimes ok

DanaHan / Yolov5-in-Deepstream-5.0