YunghuiHsu / deepstream-yolo-pose

Use Deepstream python API to extract the model output tensor and customize the post-processing of YOLO-Pose
https://hackmd.io/JQAXmJzuTyW22-x3k-0Zvw
Apache License 2.0

yolov8-pose tritonserver infer #5

Closed · minjea1588 closed this issue 11 months ago

minjea1588 commented 12 months ago

Hello! Thank you for making this great project available.

I am running yolov8-pose with Triton Server. In the `pose_src_pad_buffer_probe` function, I confirmed that an error occurs at `out[..., :4] = map_to_zero_one(out[..., :4])`. Is there a solution?

YunghuiHsu commented 12 months ago

Hi minjea1588:

I'm gonna need more information from you to debug this.

minjea1588 commented 12 months ago

- OS: Ubuntu 18.04
- Triton docker: nvcr.io/nvidia/tritonserver:21.10-py3
- DeepStream docker: nvcr.io/nvidia/deepstream:6.0-triton

Triton config: config.pbtxt (attached as config.txt)

DeepStream config: dstest_yolo_nopostprocess_v8_pose_triton.txt

def pose_src_pad_buffer_probe(pad, info, u_data):
    t = time.time()

    frame_number = 0
    num_rects = 0

    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer ")
        return

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))

    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break

        frame_number = frame_meta.frame_num
        num_rects = frame_meta.num_obj_meta
        pad_index = frame_meta.pad_index
        l_usr = frame_meta.frame_user_meta_list

        while l_usr is not None:
            try:
                # Casting l_usr.data to pyds.NvDsUserMeta
                user_meta = pyds.NvDsUserMeta.cast(l_usr.data)
            except StopIteration:
                break

            # get tensor output
            if (user_meta.base_meta.meta_type !=
                    pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META):
                try:
                    l_usr = l_usr.next
                except StopIteration:
                    break
                continue

            try:
                tensor_meta = pyds.NvDsInferTensorMeta.cast(
                    user_meta.user_meta_data)

                assert tensor_meta.num_output_layers == 1, \
                    f'Check number of model output layers : {tensor_meta.num_output_layers}'
                layer_output_info = pyds.get_nvds_LayerInfo(tensor_meta, 0)  # as num_output_layers == 1

                network_info = tensor_meta.network_info
                input_shape = (network_info.width, network_info.height)

                if frame_number == 0:
                    print(f'\tmodel input_shape  : {input_shape}')

                # remove zeros from both ends of the array. 'b' : 'both'
                dims = np.trim_zeros(layer_output_info.inferDims.d, 'b')

                if frame_number == 0:
                    print(f'\tModel output dimension from LayerInfo: {dims}')

                    output_message = f'\tCheck model output shape: {layer_output_info.inferDims.numElements}, '
                    output_message += f'given OUT_SHAPE : {dims}'
                    assert layer_output_info.inferDims.numElements == np.prod(dims), output_message

                # load float* buffer to python
                cdata_type = data_type_map[layer_output_info.dataType]
                ptr = ctypes.cast(pyds.get_ptr(layer_output_info.buffer),
                                  ctypes.POINTER(cdata_type))
                # Determine the size of the array
                out = np.ctypeslib.as_array(ptr, shape=dims)

                if frame_number == 0:
                    print(f'\tLoad Model Output From LayerInfo. Output Shape : {out.shape}')

                # [Optional] Postprocess for YOLOv7-pose (with YoloLayer_TRT_v7.0 layer) prediction tensor
                # (https://github.com/nanmi/yolov7-pose/)
                # (57001, 1, 1) > (57000, 1, 1) > (1000, 57)
                # out = out[1:, ...].reshape(-1, 57)   # or out.squeeze()[1:].reshape(-1, 57)
                # ------------------------------------------------------------------

                # Explicitly specify the batch dimension
                if np.ndim(out) < 3:
                    out = out[np.newaxis, :]
                    # print(f'add axis 0 for model output : {out.shape}')

                # [Optional] Postprocess for yolov8-pose prediction tensor
                # (https://github.com/triple-Mu/YOLOv8-TensorRT/tree/triplemu/pose-infer)
                # (batch, 56, 8400) > (batch, 8400, 56) for yolov8
                out = out.transpose((0, 2, 1))

                # out = map_to_zero_one_copy(out)
                # make pseudo class prob
                cls_prob = np.ones((out.shape[0], out.shape[1], 1), dtype=np.uint8)
                out[..., :4] = map_to_zero_one(out[..., :4])  # scale box coords to [0, 1]
                # insert pseudo class prob into predictions
                out = np.concatenate((out[..., :5], cls_prob, out[..., 5:]), axis=-1)
                out[..., [0, 2]] = out[..., [0, 2]] * network_info.width   # scale to screen width
                out[..., [1, 3]] = out[..., [1, 3]] * network_info.height  # scale to screen height
                # ------------------------------------------------------------------

                output_shape = (MUXER_OUTPUT_HEIGHT, MUXER_OUTPUT_WIDTH)
                if frame_number == 0:
                    print(f'\tModel output : {out.shape}, keypoint coordinates are rescaled to (h, w) : {output_shape}')
                print("out : ", out.shape)
                pred = postprocess(out, output_shape, input_shape,
                                   conf_thres=conf_thres, iou_thres=iou_thres)
                boxes, confs, kpts = pred
                # print("boxes, confs ", boxes, confs)
                if len(boxes) > 0 and len(confs) > 0 and len(kpts) > 0:
                    add_obj_meta(frame_meta, batch_meta, boxes[0], confs[0])
                    dispaly_frame_pose(frame_meta, batch_meta,
                                       boxes[0], confs[0], kpts[0])

            except StopIteration:
                break

            try:
                l_usr = l_usr.next
            except StopIteration:
                break

        # update frame rate through this probe
        stream_index = "stream{0}".format(frame_meta.pad_index)
        global perf_data
        perf_data.update_fps(stream_index)

        try:
            # indicate inference is performed on the frame
            frame_meta.bInferDone = True
            l_frame = l_frame.next
        except StopIteration:
            break

    return Gst.PadProbeReturn.OK
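
For context, this probe relies on helpers defined elsewhere in the repo's script (data_type_map, map_to_zero_one, postprocess, conf_thres, iou_thres, add_obj_meta, dispaly_frame_pose, perf_data). As a rough sketch, the dtype lookup presumably maps DeepStream tensor dtypes to ctypes, something like the following (an assumption for illustration, not the repo's exact table):

```python
import ctypes
import pyds

# Sketch of a dtype lookup for reading the raw output buffer; the repo
# defines its own version, which may differ. Note there is no ctypes
# half type, so HALF (float16) outputs would need special handling.
data_type_map = {
    pyds.NvDsInferDataType.FLOAT: ctypes.c_float,
    pyds.NvDsInferDataType.INT8:  ctypes.c_int8,
    pyds.NvDsInferDataType.INT32: ctypes.c_int32,
}
```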


With nvinfer it works normally, but with the Triton server this error occurs and processing does not proceed to the next frame.

YunghuiHsu commented 12 months ago

I guess something is wrong in dstest_yolo_nopostprocess_v8_pose_triton.txt:

Maybe you should not use custom_lib:

  postprocess {
    labelfile_path: "/opt/nvidia/deepstream/deepstream-6.0/sources/deepstream_python_apps/apps/deepstream-imagedata-multistream/labels.txt"
    other {}
  }
  extra {
    copy_input_to_host_buffers: false
  }

  custom_lib {
    path: "/opt/nvidia/deepstream/deepstream-6.0/sources/libs/nvdsinfer_customparser/libnvds_infercustomparser.so"
  }

Also check the shape of the output array like this. It should be (batch, 56, 8400):

            #  (batch, 56, 8400) >(batch, 8400, 56) for yolov8
            print(f'out.shape : {out.shape}') 
            out = out.transpose((0, 2, 1))

Also check scalar inside def map_to_zero_one:

def map_to_zero_one(scalar):
    print(f'[map_to_zero_one]  scalar\n   {scalar}')
    print(f'[map_to_zero_one]  scalar.shape :  {scalar.shape}')
    scalar_min = np.min(scalar)
    scalar_max = np.max(scalar)
    mapped = (scalar - scalar_min) / (scalar_max - scalar_min)
    return mapped

minjea1588 commented 12 months ago

I deleted custom_lib from dstest_yolo_nopostprocess_v8_pose_triton.txt, but the same problem still appears. The shape before transposing is (1, 56, 8400), and [map_to_zero_one] scalar.shape is (1, 8400, 4).


YunghuiHsu commented 12 months ago

Your output and scalar shapes seem correct.

The error message "overflow encountered in subtract" indicates that the operation scalar - scalar_min is overflowing, likely because the values in scalar or scalar_min are too large or too small to fit within the data type.
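
For instance (an illustrative sketch, not from this repo): if the buffer is being read as float16, even moderately large values overflow on subtraction:

```python
import numpy as np

# float16 has a maximum of ~65504, so a subtraction whose true result
# exceeds that overflows to inf and numpy warns
# "overflow encountered in subtract".
a = np.array([60000.0], dtype=np.float16)
b = np.array([-60000.0], dtype=np.float16)
print(a - b)  # [inf], with a RuntimeWarning about overflow
```

Printing the dtype and the min/max inside map_to_zero_one should show whether that is happening here: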

import numpy as np

def map_to_zero_one(scalar):
    print(f'[map_to_zero_one]  scalar.shape :  {scalar.shape}')
    print(f'[map_to_zero_one]  scalar.dtype :  {scalar.dtype}')

    scalar_min = np.min(scalar)
    scalar_max = np.max(scalar)

    print(f'[map_to_zero_one]  scalar_min :  {scalar_min}')
    print(f'[map_to_zero_one]  scalar_max :  {scalar_max}')

    mapped = (scalar - scalar_min) / (scalar_max - scalar_min)
    return mapped

minjea1588 commented 12 months ago

It seems that the problem occurs because NaN values are present in scalar.

YunghuiHsu commented 12 months ago

out[..., :4] contains the bounding box coordinates.

I didn't encounter this problem when I used NVInfer instead of NVInferserver.

Try using np.nanmin/np.nanmax in place of np.min/np.max in def map_to_zero_one(scalar).

This may fix the problems encountered during the normalization of the bbox coordinates, but the underlying issue is that the bbox coordinates should not contain NaN in the first place, so other problems may surface in the post-processing stage.
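
A minimal sketch of that nan-aware version (the zero-range guard is an added assumption, not from the repo):

```python
import numpy as np

def map_to_zero_one(scalar):
    # nan-aware reductions ignore NaN entries instead of propagating them
    scalar_min = np.nanmin(scalar)
    scalar_max = np.nanmax(scalar)
    rng = scalar_max - scalar_min
    if rng == 0:  # guard: all-equal values would otherwise divide by zero
        return np.zeros_like(scalar)
    return (scalar - scalar_min) / rng
```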

minjea1588 commented 12 months ago

Thank you for the reply. It seems that nvinfer, an external Triton server, and DeepStream-Triton all have different data structures. I'll have to find another way to process it.

YunghuiHsu commented 12 months ago

Good luck finding a solution. Please let me know if you figure out the correct way to set up nvinferserver, if it's convenient for you. Thank you!

minjea1588 commented 11 months ago

It's been a while, but I found it. When running inference with Triton, the network_info.width and network_info.height values are 0, 0. If you change them to 640, 640, it works!

YunghuiHsu commented 11 months ago

> When running inference with Triton, the network_info.width and network_info.height values are 0, 0. If you change them to 640, 640, it works!

That's great. Congratulations.

I'm keeping this code specifically for debugging information and dynamic scaling.
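
A guarded fallback in the probe might look like this (the 640x640 default comes from this thread; it is an assumption, not part of the repo):

```python
# Hypothetical guard: nvinferserver may report network_info as 0x0,
# so fall back to the known model input size (640x640 in this thread).
network_info = tensor_meta.network_info
input_shape = (network_info.width or 640, network_info.height or 640)
```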

minjea1588 commented 11 months ago

(screenshot of the mis-scaled output)

There is one more thing: when a 640x640 model is displayed on screen at 1920x1080, it comes out as in the screenshot above. At 640x640 it works fine. What processing should be done when operating at 1920x1080?

YunghuiHsu commented 11 months ago

Because the model runs inference at 640x640, the output has to be rescaled to the size you want to display.

Here is the display- and rescale-related code. You might consider setting input_shape to a constant:


MUXER_OUTPUT_WIDTH = 640  # stream input
MUXER_OUTPUT_HEIGHT = 360  # stream input
TILED_OUTPUT_WIDTH = 1280 # stream output
TILED_OUTPUT_HEIGHT = 720 # stream output

def pose_src_pad_buffer_probe(pad, info, u_data):
                ...
                network_info = tensor_meta.network_info
                input_shape = (network_info.width, network_info.height)
                ...
                out[..., [0, 2]] = out[..., [0, 2]] * network_info.width  # scale to screen width
                out[..., [1, 3]] = out[..., [1, 3]] * network_info.height  # scale to screen height
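
As a final sketch, a hypothetical helper (not from the repo) that rescales (x, y) coordinates from the 640x640 model-input space to a 1920x1080 display:

```python
import numpy as np

def rescale_coords(coords, input_shape=(640, 640), output_shape=(1920, 1080)):
    """Rescale (x, y) pixel coordinates from model-input space to display space.

    coords       : array of shape [..., 2] holding (x, y) pairs
    input_shape  : (width, height) the model was fed
    output_shape : (width, height) of the display
    """
    out = np.asarray(coords, dtype=np.float32).copy()
    out[..., 0] *= output_shape[0] / input_shape[0]  # scale x
    out[..., 1] *= output_shape[1] / input_shape[1]  # scale y
    return out

# e.g. a keypoint at (320, 320) in 640x640 maps to (960, 540) at 1920x1080
print(rescale_coords([320.0, 320.0]))
```

Note this naive per-axis scaling assumes the frame was resized without letterboxing; if the preprocessing pads the input to preserve aspect ratio, the padding offset must be subtracted before scaling.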