YunghuiHsu / deepstream-yolo-pose

Use Deepstream python API to extract the model output tensor and customize the post-processing of YOLO-Pose
https://hackmd.io/JQAXmJzuTyW22-x3k-0Zvw
Apache License 2.0

yolov8-pose tritonserver infer #5

Closed · minjea1588 closed this issue 11 months ago

minjea1588 commented 12 months ago

Hello! Thank you for making this great project available.

I am running yolov8-pose with Triton Server. In the `pose_src_pad_buffer_probe` function, I confirmed that an error occurs at `out[..., :4] = map_to_zero_one(out[..., :4])`. Is there a solution?

YunghuiHsu commented 12 months ago

Hi minjea1588:

I'm gonna need more information from you to debug this.

minjea1588 commented 12 months ago

- OS: Ubuntu 18.04
- Triton docker: nvcr.io/nvidia/tritonserver:21.10-py3
- DeepStream docker: nvcr.io/nvidia/deepstream:6.0-triton

Triton config: config.pbtxt (attached as config.txt)

DeepStream config: dstest_yolo_nopostprocess_v8_pose_triton.txt

def pose_src_pad_buffer_probe(pad, info, u_data):
    t = time.time()

    frame_number = 0
    num_rects = 0

    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer ")
        return

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))

    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break

        frame_number = frame_meta.frame_num
        num_rects = frame_meta.num_obj_meta
        pad_index = frame_meta.pad_index
        l_usr = frame_meta.frame_user_meta_list

        while l_usr is not None:
            try:
                # Casting l_usr.data to pyds.NvDsUserMeta
                user_meta = pyds.NvDsUserMeta.cast(l_usr.data)
            except StopIteration:
                break

            # get tensor output
            if (user_meta.base_meta.meta_type !=
                    pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META):
                try:
                    l_usr = l_usr.next
                except StopIteration:
                    break
                continue

            try:
                tensor_meta = pyds.NvDsInferTensorMeta.cast(
                    user_meta.user_meta_data)

                assert tensor_meta.num_output_layers == 1, \
                    f'Check number of model output layers : {tensor_meta.num_output_layers}'
                layer_output_info = pyds.get_nvds_LayerInfo(tensor_meta, 0)  # as num_output_layers == 1

                network_info = tensor_meta.network_info
                input_shape = (network_info.width, network_info.height)

                if frame_number == 0:
                    print(f'\tmodel input_shape  : {input_shape}')

                # remove zeros from both ends of the array. 'b' : 'both'
                dims = np.trim_zeros(layer_output_info.inferDims.d, 'b')

                if frame_number == 0:
                    print(f'\tModel output dimension from LayerInfo: {dims}')

                    output_message = f'\tCheck model output shape: {layer_output_info.inferDims.numElements}, '
                    output_message += f'given OUT_SHAPE : {dims}'
                    assert layer_output_info.inferDims.numElements == np.prod(dims), output_message

                # load float* buffer to python
                cdata_type = data_type_map[layer_output_info.dataType]
                ptr = ctypes.cast(pyds.get_ptr(layer_output_info.buffer),
                                  ctypes.POINTER(cdata_type))
                # Determine the size of the array
                out = np.ctypeslib.as_array(ptr, shape=dims)

                if frame_number == 0:
                    print(f'\tLoad Model Output From LayerInfo. Output Shape : {out.shape}')

                # [Optional] Postprocess for YOLOv7-pose (with YoloLayer_TRT_v7.0 layer) prediction tensor
                # (https://github.com/nanmi/yolov7-pose/)
                # (57001, 1, 1) > (57000, 1, 1) > (1000, 57)
                # out = out[1:, ...].reshape(-1, 57)   # or out.squeeze()[1:].reshape(-1, 57)
                # ------------------------------------------------------------------

                # Explicitly specify the batch dimension
                if np.ndim(out) < 3:
                    out = out[np.newaxis, :]
                    # print(f'add axis 0 for model output : {out.shape}')

                # [Optional] Postprocess for yolov8-pose prediction tensor
                # (https://github.com/triple-Mu/YOLOv8-TensorRT/tree/triplemu/pose-infer)
                # (batch, 56, 8400) > (batch, 8400, 56) for yolov8
                out = out.transpose((0, 2, 1))

                # out = map_to_zero_one_copy(out)
                # make pseudo class prob
                cls_prob = np.ones((out.shape[0], out.shape[1], 1), dtype=np.uint8)
                out[..., :4] = map_to_zero_one(out[..., :4])  # scale box coords to [0, 1]
                # insert pseudo class prob into predictions
                out = np.concatenate((out[..., :5], cls_prob, out[..., 5:]), axis=-1)
                out[..., [0, 2]] = out[..., [0, 2]] * network_info.width   # scale to screen width
                out[..., [1, 3]] = out[..., [1, 3]] * network_info.height  # scale to screen height
                # ------------------------------------------------------------------

                output_shape = (MUXER_OUTPUT_HEIGHT, MUXER_OUTPUT_WIDTH)
                if frame_number == 0:
                    print(f'\tModel output : {out.shape}, keypoint coordinates are rescaled to (h, w) : {output_shape}')
                print("out : ", out.shape)
                pred = postprocess(out, output_shape, input_shape,
                                   conf_thres=conf_thres, iou_thres=iou_thres)
                boxes, confs, kpts = pred
                # print("boxes, confs ", boxes, confs)
                if len(boxes) > 0 and len(confs) > 0 and len(kpts) > 0:
                    add_obj_meta(frame_meta, batch_meta, boxes[0], confs[0])
                    dispaly_frame_pose(frame_meta, batch_meta,
                                       boxes[0], confs[0], kpts[0])

            except StopIteration:
                break

            try:
                l_usr = l_usr.next
            except StopIteration:
                break

        # update frame rate through this probe
        stream_index = "stream{0}".format(frame_meta.pad_index)
        global perf_data
        perf_data.update_fps(stream_index)

        try:
            # indicate inference is performed on the frame
            frame_meta.bInferDone = True
            l_frame = l_frame.next
        except StopIteration:
            break

    return Gst.PadProbeReturn.OK
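
For context, this probe relies on helpers defined elsewhere in the repo's script (data_type_map, map_to_zero_one, postprocess, conf_thres, iou_thres, add_obj_meta, dispaly_frame_pose, perf_data). As a rough sketch, the dtype lookup presumably maps DeepStream tensor dtypes to ctypes, something like the following (an assumption for illustration, not the repo's exact table):

```python
import ctypes
import pyds

# Sketch of a dtype lookup for reading the raw output buffer; the repo
# defines its own version, which may differ. Note there is no ctypes
# half type, so HALF (float16) outputs would need special handling.
data_type_map = {
    pyds.NvDsInferDataType.FLOAT: ctypes.c_float,
    pyds.NvDsInferDataType.INT8:  ctypes.c_int8,
    pyds.NvDsInferDataType.INT32: ctypes.c_int32,
}
```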


With nvinfer it works normally, but with the Triton server this error occurs and processing does not proceed to the next frame.

YunghuiHsu commented 12 months ago

I guess something is wrong in dstest_yolo_nopostprocess_v8_pose_triton.txt:

Maybe you should not use custom_lib:

  postprocess {
    labelfile_path: "/opt/nvidia/deepstream/deepstream-6.0/sources/deepstream_python_apps/apps/deepstream-imagedata-multistream/labels.txt"
    other {}
  }
  extra {
    copy_input_to_host_buffers: false
  }

  custom_lib {
    path: "/opt/nvidia/deepstream/deepstream-6.0/sources/libs/nvdsinfer_customparser/libnvds_infercustomparser.so"
  }

Also check the shape of the output array like this. It should be (batch, 56, 8400):

            #  (batch, 56, 8400) >(batch, 8400, 56) for yolov8
            print(f'out.shape : {out.shape}') 
            out = out.transpose((0, 2, 1))

Also check scalar inside def map_to_zero_one:

def map_to_zero_one(scalar):
    print(f'[map_to_zero_one]  scalar\n   {scalar}')
    print(f'[map_to_zero_one]  scalar.shape :  {scalar.shape}')
    scalar_min = np.min(scalar)
    scalar_max = np.max(scalar)
    mapped = (scalar - scalar_min) / (scalar_max - scalar_min)
    return mapped

minjea1588 commented 12 months ago

I deleted custom_lib from dstest_yolo_nopostprocess_v8_pose_triton.txt, but the same problem still appears. The shape before transposing is (1, 56, 8400), and [map_to_zero_one] scalar.shape is (1, 8400, 4).


YunghuiHsu commented 12 months ago

Your output and scalar shapes seem correct.

The error message "overflow encountered in subtract" indicates that the operation scalar - scalar_min is overflowing, likely because the values in scalar or scalar_min are too large or too small to fit within the data type.
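
For instance (an illustrative sketch, not from this repo): if the buffer is being read as float16, even moderately large values overflow on subtraction:

```python
import numpy as np

# float16 has a maximum of ~65504, so a subtraction whose true result
# exceeds that overflows to inf and numpy warns
# "overflow encountered in subtract".
a = np.array([60000.0], dtype=np.float16)
b = np.array([-60000.0], dtype=np.float16)
print(a - b)  # [inf], with a RuntimeWarning about overflow
```

Printing the dtype and the min/max inside map_to_zero_one should show whether that is happening here: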

import numpy as np

def map_to_zero_one(scalar):
    print(f'[map_to_zero_one]  scalar.shape :  {scalar.shape}')
    print(f'[map_to_zero_one]  scalar.dtype :  {scalar.dtype}')

    scalar_min = np.min(scalar)
    scalar_max = np.max(scalar)

    print(f'[map_to_zero_one]  scalar_min :  {scalar_min}')
    print(f'[map_to_zero_one]  scalar_max :  {scalar_max}')

    mapped = (scalar - scalar_min) / (scalar_max - scalar_min)
    return mapped

minjea1588 commented 12 months ago

It seems that the problem occurs because NaN values are present in scalar.

YunghuiHsu commented 12 months ago

out[..., :4] contains the bounding box coordinates.

I didn't encounter this problem when I used NVInfer instead of NVInferserver.

Try using np.nanmin/np.nanmax in place of np.min/np.max in def map_to_zero_one(scalar).

This may fix the problems encountered during the normalization of the bbox coordinates, but the underlying issue is that the bbox coordinates should not contain NaN in the first place, so other problems may surface in the post-processing stage.
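
A minimal sketch of that nan-aware version (the zero-range guard is an added assumption, not from the repo):

```python
import numpy as np

def map_to_zero_one(scalar):
    # nan-aware reductions ignore NaN entries instead of propagating them
    scalar_min = np.nanmin(scalar)
    scalar_max = np.nanmax(scalar)
    rng = scalar_max - scalar_min
    if rng == 0:  # guard: all-equal values would otherwise divide by zero
        return np.zeros_like(scalar)
    return (scalar - scalar_min) / rng
```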

minjea1588 commented 12 months ago

Thank you for the reply. It seems that nvinfer, an external Triton server, and DeepStream-Triton all have different data structures. I'll have to find another way to process it.

YunghuiHsu commented 12 months ago

Good luck finding a solution. Please let me know if you figure out the correct way to set up nvinferserver, if it's convenient for you. Thank you!

minjea1588 commented 11 months ago

It's been a while, but I found it. When running inference with Triton, the network_info.width and network_info.height values are 0, 0. If you change them to 640, 640, it works!

YunghuiHsu commented 11 months ago

> When running inference with Triton, the network_info.width and network_info.height values are 0, 0. If you change them to 640, 640, it works!

That's great. Congratulations.

I'm keeping this code specifically for debugging information and dynamic scaling.
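
A guarded fallback in the probe might look like this (the 640x640 default comes from this thread; it is an assumption, not part of the repo):

```python
# Hypothetical guard: nvinferserver may report network_info as 0x0,
# so fall back to the known model input size (640x640 in this thread).
network_info = tensor_meta.network_info
input_shape = (network_info.width or 640, network_info.height or 640)
```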

minjea1588 commented 11 months ago

(screenshot of the mis-scaled output)

There is one more thing: when a 640x640 model is displayed on screen at 1920x1080, it comes out as in the screenshot above. At 640x640 it works fine. What processing should be done when operating at 1920x1080?

YunghuiHsu commented 11 months ago

Because the model runs inference at 640x640, the output has to be rescaled to the size you want to display.

Here is the display- and rescale-related code. You might consider setting input_shape to a constant:


MUXER_OUTPUT_WIDTH = 640  # stream input
MUXER_OUTPUT_HEIGHT = 360  # stream input
TILED_OUTPUT_WIDTH = 1280 # stream output
TILED_OUTPUT_HEIGHT = 720 # stream output

def pose_src_pad_buffer_probe(pad, info, u_data):
                ...
                network_info = tensor_meta.network_info
                input_shape = (network_info.width, network_info.height)
                ...
                out[..., [0, 2]] = out[..., [0, 2]] * network_info.width  # scale to screen width
                out[..., [1, 3]] = out[..., [1, 3]] * network_info.height  # scale to screen height
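
As a final sketch, a hypothetical helper (not from the repo) that rescales (x, y) coordinates from the 640x640 model-input space to a 1920x1080 display:

```python
import numpy as np

def rescale_coords(coords, input_shape=(640, 640), output_shape=(1920, 1080)):
    """Rescale (x, y) pixel coordinates from model-input space to display space.

    coords       : array of shape [..., 2] holding (x, y) pairs
    input_shape  : (width, height) the model was fed
    output_shape : (width, height) of the display
    """
    out = np.asarray(coords, dtype=np.float32).copy()
    out[..., 0] *= output_shape[0] / input_shape[0]  # scale x
    out[..., 1] *= output_shape[1] / input_shape[1]  # scale y
    return out

# e.g. a keypoint at (320, 320) in 640x640 maps to (960, 540) at 1920x1080
print(rescale_coords([320.0, 320.0]))
```

Note this naive per-axis scaling assumes the frame was resized without letterboxing; if the preprocessing pads the input to preserve aspect ratio, the padding offset must be subtracted before scaling.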