PaddlePaddle / FastDeploy

⚡️An easy-to-use and fast deep learning model deployment toolkit for ☁️Cloud, 📱Mobile and 📹Edge, covering 20+ mainstream scenarios across Image, Video, Text and Audio, with 150+ SOTA models, end-to-end optimization, and multi-platform, multi-framework support.
https://www.paddlepaddle.org.cn/fastdeploy
Apache License 2.0

ocr example reports a runtime error #1467

Closed: yywangfei closed this issue 6 months ago

yywangfei commented 1 year ago

The sample image 12.jpg runs fine, but when I tested a document of my own, I got an error saying the batch_size of the rec and cls models was too small. After raising both models' batch_size to 256, I got an error that GPU memory was insufficient, so I then modified the det model's post-processing to split the det boxes into batches of batch_size=5 for prediction. Now prediction is very slow: an ordinary image takes about 5s, and the GPU is an RTX 3090 Ti. How can I improve the performance?
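For reference, in Triton-based serving (which the FastDeploy serving examples use) the batch size described above is controlled by the `max_batch_size` field in each model's `config.pbtxt`; a minimal sketch, with an illustrative path:

```
# e.g. models/rec_runtime/config.pbtxt (illustrative path, not from this issue)
max_batch_size: 256
```

Raising this value trades GPU memory for fewer, larger inference requests.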

The modified content of /data/FastDeploy-develop/examples/vision/ocr/PP-OCRv3/serving/models/det_postprocess/1/model.py, from line 228 onward:

for i in range(0, len(image_list), predict_size):
    image_list_batch = image_list[i:i+predict_size]
    rec_texts_, rec_scores_, cls_labels_, cls_scores_ = self.predict_batch_images(image_list_batch)
    rec_texts += rec_texts_.tolist()
    rec_scores += rec_scores_.tolist()
    cls_labels += cls_labels_.tolist()
    cls_scores += cls_scores_.tolist()

Below is the code that splits the boxes into small batches for prediction:

def predict_batch_images(self, image_list):
        cls_labels = []
        cls_scores = []
        rec_texts = []
        rec_scores = []

        cls_pre_tensors = self.cls_preprocessor.run(image_list)
        cls_dlpack_tensor = cls_pre_tensors[0].to_dlpack()
        cls_input_tensor = pb_utils.Tensor.from_dlpack(
            "x", cls_dlpack_tensor)

        inference_request = pb_utils.InferenceRequest(
            model_name='cls_pp',
            requested_output_names=['cls_labels', 'cls_scores'],
            inputs=[cls_input_tensor])
        inference_response = inference_request.exec()
        if inference_response.has_error():
            raise pb_utils.TritonModelException(
                inference_response.error().message())
        else:
            # Extract the output tensors from the inference response.
            cls_labels = pb_utils.get_output_tensor_by_name(
                inference_response, 'cls_labels')
            cls_labels = cls_labels.as_numpy()

            cls_scores = pb_utils.get_output_tensor_by_name(
                inference_response, 'cls_scores')
            cls_scores = cls_scores.as_numpy()

        for index in range(len(image_list)):
            if cls_labels[index] == 1 and cls_scores[
                    index] > self.cls_threshold:
                # Rotate 180 degrees when the classifier flags the crop as upside down.
                image_list[index] = cv2.rotate(
                    image_list[index].astype(np.float32), cv2.ROTATE_180)
                image_list[index] = image_list[index].astype(np.uint8)

        rec_pre_tensors = self.rec_preprocessor.run(image_list)
        rec_dlpack_tensor = rec_pre_tensors[0].to_dlpack()
        rec_input_tensor = pb_utils.Tensor.from_dlpack(
            "x", rec_dlpack_tensor)

        inference_request = pb_utils.InferenceRequest(
            model_name='rec_pp',
            requested_output_names=['rec_texts', 'rec_scores'],
            inputs=[rec_input_tensor])
        inference_response = inference_request.exec()
        if inference_response.has_error():
            raise pb_utils.TritonModelException(
                inference_response.error().message())
        else:
            # Extract the output tensors from the inference response.
            rec_texts = pb_utils.get_output_tensor_by_name(
                inference_response, 'rec_texts')
            rec_texts = rec_texts.as_numpy()

            rec_scores = pb_utils.get_output_tensor_by_name(
                inference_response, 'rec_scores')
            rec_scores = rec_scores.as_numpy()
        return rec_texts, rec_scores, cls_labels, cls_scores
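The chunk-and-merge pattern used above can be sketched in isolation; `predict_fn` here is a hypothetical stand-in for the Triton inference call, not a FastDeploy or Triton API:

```python
def predict_in_batches(items, predict_fn, batch_size):
    """Split items into fixed-size batches, run predict_fn on each, and merge results."""
    merged = []
    for i in range(0, len(items), batch_size):
        # Each slice is at most batch_size long; the last one may be shorter.
        merged.extend(predict_fn(items[i:i + batch_size]))
    return merged

# Dummy predictor that doubles each value, standing in for model inference.
results = predict_in_batches([1, 2, 3, 4, 5],
                             lambda batch: [x * 2 for x in batch],
                             batch_size=2)
print(results)  # [2, 4, 6, 8, 10]
```

Note that the total latency scales with the number of inference requests, so a very small batch_size (such as 5) on a text-heavy document issues many round trips to the rec/cls models.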
jiangjiajun commented 1 year ago

The document contains a lot of text, so recognition will naturally be slower.

yywangfei commented 1 year ago

The document contains a lot of text, so recognition will naturally be slower.

Why is it this slow, and is there any way to improve it? The OCR service I deployed with pdserving takes about 0.8s; the gap is too large.

yunyaoXYY commented 1 year ago

Hi, could you please provide the image you are predicting on?

yywangfei commented 1 year ago

Hi, could you please provide the image you are predicting on?

1b069d07-b64f-424a-a018-5c9fe537284f (attached image) The configuration uses all default parameters; I only changed the recognition post-processing as in the code posted in this issue, predicting in small batches and merging the results at the end. The GPU is an RTX 3090 Ti.