Hongyuan-Liu opened 4 months ago
Take a look at the `reparameterize` function.
I see. This function feeds the text information into the model ahead of time. Can this function also be invoked during ONNX inference?
No, it can't be reached. The code has to be modified: by default, the exported ONNX model only takes an image as input, and a text input would have to be added to achieve what you described:

> this function feeds the text information into the model ahead of time, so it can also be invoked during ONNX inference
So when exporting to ONNX, `fake_input` would include both the image and the text, and both would be fed in for the export:

```python
torch.onnx.export(
    deploy_model,
    fake_input,
    f,
    input_names=['images'],
    output_names=output_names,
    opset_version=args.opset)
```

But why doesn't the official export script do it this way?
Hi @Hongyuan-Liu, currently, we only support exporting ONNX with a customized vocabulary for downstream detection tasks without training. Supporting text prompts for the ONNX model requires the text encoder. Ideally, we can export a larger ONNX model with a text encoder, but currently, it's not in the plan.
@Hongyuan-Liu, you can try to export the ONNX model with CLIP. There are some open-source works which export CLIP to ONNX. We would appreciate it if you could finish it and contribute it back with a pull request.
Based on what you said, my understanding is this: the project only does detection over a fixed vocabulary, i.e. the custom-vocabulary detection you mentioned. At deployment time, all of the custom vocabulary is loaded into the model via `reparameterize`, and then the model is exported to ONNX, so the ONNX model supports detecting the defined vocabulary. Is that right? And if my understanding is correct, I can specify my own custom vocabulary rather than being limited to the predefined files coco_class_texts.json, lvis_v1_base_class_captions.json, lvis_v1_class_texts.json, and obj365v1_class_texts.json. In theory that should also work, right?
Yes. In general, a detector can only detect the classes it was trained on, and after deployment it is still limited to those classes. With YOLO-World, you can specify the categories you want to detect (a custom vocabulary), `reparameterize` them into the model, convert it to ONNX, and deploy it. The custom vocabulary is not limited to the classes in those JSON files; you can write your own JSON to define any categories.
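To make the "write your own JSON" step concrete, here is a minimal sketch. The class names here are made up, and the format (a list of per-class phrase lists) follows the shipped files such as data/texts/obj365v1_class_texts.json:

```python
import json

# Hypothetical custom vocabulary: each entry is a list of text phrases
# describing one class, mirroring files like obj365v1_class_texts.json.
custom_classes = [
    ["safety helmet"],
    ["forklift"],
    ["traffic cone", "warning cone"],  # synonymous phrases for one class
]

with open("my_custom_texts.json", "w", encoding="utf-8") as f:
    json.dump(custom_classes, f, ensure_ascii=False, indent=2)
```

The resulting file can then be passed to the export script via `--custom-text my_custom_texts.json`, as in the export command shown later in this thread.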
OK, thanks.
> Hi @Hongyuan-Liu, currently, we only support exporting ONNX with a customized vocabulary for downstream detection tasks without training. Supporting text prompts for the ONNX model requires the text encoder. Ideally, we can export a larger ONNX model with a text encoder, but currently, it's not in the plan.
For this model: yolo_world_l_t2i_bn_2e-4_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py, the text encoder is

```python
text_model=dict(
    type='HuggingCLIPLanguageBackbone',
    model_name='openai/clip-vit-base-patch32',
    frozen_modules=['all']))
```
This does not seem difficult to add to the ONNX model. Was it left out of the plan because the feature is less important, or because of the model size? Inference time is presumably also a factor. But it would align with the provided demo: you could specify the text for one inference and change it for the next, which would be more dynamic.
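As a rough sketch of the missing piece, the CLIP text encoder could be exported to ONNX on its own, so text embeddings can be computed at inference time and fed to the detector. This is not part of the YOLO-World repo; the model name comes from the config above, everything else (file names, dynamic-axis names) is an assumption:

```python
import torch
from transformers import CLIPTextModelWithProjection, CLIPTokenizer

MODEL_NAME = 'openai/clip-vit-base-patch32'  # from the config above

tokenizer = CLIPTokenizer.from_pretrained(MODEL_NAME)
text_model = CLIPTextModelWithProjection.from_pretrained(MODEL_NAME).eval()

# Dummy prompt batch used only to trace the graph.
inputs = tokenizer(['person', 'bus'], padding=True, return_tensors='pt')

torch.onnx.export(
    text_model,
    (inputs['input_ids'], inputs['attention_mask']),
    'clip_text_encoder.onnx',  # hypothetical output path
    input_names=['input_ids', 'attention_mask'],
    output_names=['text_embeds'],
    dynamic_axes={
        'input_ids': {0: 'num_prompts', 1: 'seq_len'},
        'attention_mask': {0: 'num_prompts', 1: 'seq_len'},
        'text_embeds': {0: 'num_prompts'},
    },
    opset_version=12)
```

At inference time, the embeddings from this model would replace what `reparameterize` bakes in at export time, which is what would make the prompts dynamic.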
I exported the ONNX model with the following command:

```shell
python deploy/export_onnx.py \
    ./configs/pretrain/yolo_world_v2_l_vlpan_bn_2e-3_100e_4x8gpus_obj365v1_goldg_train_lvis_minival.py \
    ./weights/yolo_world_v2_l_obj365v1_goldg_cc3mlite_pretrain-ca93cd1f.pth \
    --custom-text data/texts/obj365v1_class_texts.json \
    --opset 12
```
Then the inference code is as follows:

```python
import onnxruntime
import numpy as np
import cv2
import copy


def letterbox(src_image, dst_width=640, dst_height=640, color=114):
    src_height, src_width, _ = src_image.shape
    padding_x = 0
    padding_y = 0
    scale = 1.0
    # (rest of the function body was lost when the code was pasted)


def convert_result(src_image, boxes, offset_x, offset_y, scale_val):
    src_width = src_image.shape[1]
    src_height = src_image.shape[0]
    # (rest of the function body was lost when the code was pasted)


def draw_result(src_image, labels, bboxes, scores):
    src_image_h, src_image_w, _ = src_image.shape
    for label, box, score in zip(labels, bboxes, scores):
        if label == -1:
            continue
        x1, y1, x2, y2 = list(map(int, box))
        # Seed per label so each class gets a stable random color.
        np.random.seed(int(label) + 2000)
        box_color = (np.random.randint(0, 255), np.random.randint(0, 255),
                     np.random.randint(0, 255))
        cv2.rectangle(src_image, (x1, y1), (x2, y2), box_color,
                      max(int((src_image_w + src_image_h) / 1000), 2),
                      cv2.LINE_AA)
        content = str(label) + ' ' + '{0:.3f}'.format(score)
        font_scale = round(0.002 * ((x2 - x1) + (y2 - y1)) / 2) + 1
        text_size = cv2.getTextSize(content, 0, fontScale=font_scale / 3,
                                    thickness=1)[0]
        # Filled black background behind the label text.
        cv2.rectangle(src_image, (x1 + 2, y1 + 2),
                      (x1 + text_size[0] + 3, y1 + text_size[1] + 5),
                      (0, 0, 0), cv2.FILLED, cv2.LINE_AA)
        cv2.putText(src_image, content, (x1 + 1, y1 + 16), 0, font_scale / 3,
                    [255, 255, 255], thickness=1, lineType=cv2.LINE_AA)


if __name__ == '__main__':
    onnx_file = 'yolow-l.onnx'
    # (rest of the script was lost when the code was pasted)
```
Image results:
I would like to know how to detect only the specified categories by providing text, rather than getting results for every category. Does the official ONNX export currently only support outputting all categories? How should this be used in a real project: do you filter the results to extract the specified categories?
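Until per-prompt inputs are supported in the exported model, one workaround is indeed to post-filter the all-classes output by the class ids of interest. This is a sketch, not official YOLO-World code; the flat `(N,)` / `(N, 4)` array layout is an assumption based on the `draw_result()` signature above:

```python
import numpy as np

def filter_by_class(labels, bboxes, scores, wanted_ids):
    """Keep only detections whose class id is in wanted_ids.

    labels: (N,) int array; bboxes: (N, 4); scores: (N,).
    The layout is an assumption based on draw_result() above.
    """
    labels = np.asarray(labels)
    keep = np.isin(labels, list(wanted_ids))
    return labels[keep], np.asarray(bboxes)[keep], np.asarray(scores)[keep]

# Example: keep only class ids 0 and 5 (made-up detections).
labels = np.array([0, 3, 5, 0])
bboxes = np.array([[0, 0, 10, 10], [5, 5, 20, 20], [1, 1, 2, 2], [3, 3, 9, 9]])
scores = np.array([0.9, 0.8, 0.7, 0.6])
kept_labels, kept_boxes, kept_scores = filter_by_class(
    labels, bboxes, scores, {0, 5})
```

The mapping from class id to text prompt is fixed by the order of entries in the JSON passed to `--custom-text`, so choosing `wanted_ids` amounts to choosing which prompts to keep.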