yghstill commented 2 years ago

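PP-PicoDet is a lightweight, real-time object detection model for mobile devices. We propose a family of models from small to large, including S, M, and L, that surpasses existing SOTA models.

Model highlights:

🌟 High accuracy: mAP(0.5:0.95) reaches 30.6 with fewer than 1M parameters, and 40.9 with 3.3M parameters.
🚀 Fast: reaches 150 FPS on SD865.
😊 Deployment-friendly: we support PaddleInference/PaddleLite/MNN/NCNN/OpenVINO and provide C++/Python/Android demos.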

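Links:

For details of the algorithm, see the paper: https://arxiv.org/abs/2111.00902
Readme & config files: https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/picodet

Please try it out. Questions and discussion are welcome in this thread~

Comparison with other models: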

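FAQ (continuously updated):

Version requirements: train and export the model with the same Paddle version, and use PaddlePaddle >= 2.1.2.
Relationship between learning rate, number of GPUs, and batch size: follow the linear scaling rule. Most of the released config files were trained on 4 GPUs. For example, when switching to a single GPU, divide the learning rate by 4; if the batch size also changes from 80 to 40, divide the learning rate by 2 again.
Config priority: settings in picodet_x_coco.yml generally take precedence over those in __base__. Every setting in picodet_x_coco.yml overrides the __base__ config, so editing picodet_x_coco.yml is sufficient.
Training on your own dataset: both the COCO and VOC data formats are supported. We recommend transfer learning to speed up convergence. Steps: copy the link to the COCO-trained pretrain weights from the PicoDet Readme, then set the pretrain_weights parameter in the config file to those COCO-trained weights.

To make communication easier, scan the QR code to join the WeChat group and continue discussing PP-PicoDet usage and suggestions~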
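The linear scaling rule from the FAQ can be written as a small helper. This is a minimal sketch: the function name `scale_learning_rate` and the base learning rate of 0.4 are illustrative assumptions, not values taken from the released configs, and the batch size is assumed to be per GPU.

```python
def scale_learning_rate(base_lr, base_gpus, base_batch_size, gpus, batch_size):
    """Scale base_lr linearly with the total batch size (GPUs x per-GPU batch size)."""
    base_total = base_gpus * base_batch_size
    new_total = gpus * batch_size
    return base_lr * new_total / base_total


# Hypothetical base setting: 4 GPUs, batch size 80 per GPU, learning rate 0.4.
lr_single_gpu = scale_learning_rate(0.4, 4, 80, gpus=1, batch_size=80)  # divided by 4
lr_half_batch = scale_learning_rate(0.4, 4, 80, gpus=1, batch_size=40)  # divided by 4, then by 2
print(lr_single_gpu, lr_half_batch)
```

The same helper covers any other combination, for example reducing only the batch size while keeping 4 GPUs.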

yghstill commented 2 years ago

@songh11 Which OPs specifically are unsupported, and how did you work around them? We will look into this and see whether we can support more hardware.

zwhua006 commented 2 years ago

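If I want to modify the inference-time preprocessing directly, for example to use only letterbox and none of the other operations, what is the procedure? I found that I cannot simply change the yaml.

I also noticed that postprocessing runs NMS on the CPU with numpy, and that the results are moved to the CPU when inference finishes. Why not run postprocessing directly on the GPU?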

yghstill commented 2 years ago

@zwhua006

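Preprocessing consists of decoding the image, resizing, normalizing, and reordering the input channels. (Unlike YOLOv5, it does not use letterbox.) Apart from the first step, decoding, which can be skipped by reading directly from numpy, all of these operations are required. Why do you want to change this processing logic?
PicoDet's postprocessing is complex and involves many operators, and we also want to support more hardware, so all postprocessing is moved to the CPU. In our tests this performed better than processing directly on the GPU.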
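The normalization and channel-reordering steps described above can be sketched in plain Python. This is only an illustration: real pipelines use numpy or OpenCV, the function name `normalize_and_to_chw` is hypothetical, and the mean/std values are the common ImageNet statistics, assumed here rather than read from the PicoDet reader config.

```python
def normalize_and_to_chw(image_hwc, mean, std):
    """Scale pixels to [0, 1], normalize per channel, and reorder HWC -> CHW."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(mean)
    chw = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for y in range(height):
        for x in range(width):
            for c in range(channels):
                value = image_hwc[y][x][c] / 255.0
                chw[c][y][x] = (value - mean[c]) / std[c]
    return chw


# Assumed ImageNet statistics; check the TestReader section of your own config.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

# A 1x2 RGB "image" that has already been decoded and resized.
tiny = [[[255, 0, 0], [0, 255, 0]]]
out = normalize_and_to_chw(tiny, MEAN, STD)
print(len(out), len(out[0]), len(out[0][0]))  # channels, height, width
```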

zwhua006 commented 2 years ago

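I previously used yolov5s, where letterbox alone gave good results, and I now want to shorten the preprocessing time. Normalization and channel reordering take only two lines of code, and letterbox already includes a resize, so I want to change the logic. For postprocessing, my idea is to rewrite it so that it runs on the GPU. When I tried skipping the copy to the CPU, speed improved a lot, which leaves the large amount of time spent in preprocessing. I need both mAP and speed to exceed yolov5s. mAP already meets the requirement, but the speed gap is too large. I can accept some loss of speed, but PicoDet currently takes about 35 ms while yolov5s takes only 9 ms, and the two are not far apart at 0.5.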
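For reference, the geometry of a YOLOv5-style letterbox (aspect-preserving resize plus centered padding) can be sketched as follows. This is not part of PicoDet's own pipeline, the function names are hypothetical, and a model trained with a plain resize may lose accuracy when fed letterboxed input.

```python
def letterbox_params(src_w, src_h, dst_w, dst_h):
    """Return (scale, new_w, new_h, pad_left, pad_top) for an aspect-preserving resize."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    pad_left = (dst_w - new_w) // 2
    pad_top = (dst_h - new_h) // 2
    return scale, new_w, new_h, pad_left, pad_top


def box_to_source(box, scale, pad_left, pad_top):
    """Map an (x1, y1, x2, y2) box from letterboxed coordinates back to the source image."""
    x1, y1, x2, y2 = box
    return ((x1 - pad_left) / scale, (y1 - pad_top) / scale,
            (x2 - pad_left) / scale, (y2 - pad_top) / scale)


# A 1280x720 frame placed into a 640x640 network input.
scale, new_w, new_h, pad_left, pad_top = letterbox_params(1280, 720, 640, 640)
print(scale, new_w, new_h, pad_left, pad_top)
print(box_to_source((0, 140, 640, 500), scale, pad_left, pad_top))
```

Detections produced on the letterboxed input have to be mapped back with the same scale and padding, which is what `box_to_source` does.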

yghstill commented 2 years ago

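@zwhua006 Do you mean predicting on the GPU? There is one more way to speed up preprocessing, which is to run normalization on the GPU. Specifically, add fuse_normalize: true to TestReader in configs/picodet/base/picodet_640_reader.yml and then re-export the model; this gives a small speedup. As for postprocessing, most of the time is spent in NMS. You can raise score_threshold to 0.4 to speed it up: https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.3/ppdet/engine/export_utils.py#L170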
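To see why a higher score_threshold shortens NMS, here is a minimal greedy NMS sketch in plain Python. It is an illustration under stated assumptions, not PaddleDetection's implementation: the function names, the boxes, the scores, and the low threshold of 0.025 are all made up for the example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold, iou_threshold):
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    # Boxes below score_threshold are dropped before any IoU is computed.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.3, 0.05]
print(nms(boxes, scores, score_threshold=0.025, iou_threshold=0.6))
print(nms(boxes, scores, score_threshold=0.4, iou_threshold=0.6))
```

With the higher threshold fewer candidates enter the pairwise IoU loop, at the cost of discarding low-confidence detections.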

zwhua006 commented 2 years ago

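Yes, I am predicting on the GPU, and my goal is to shorten inference time. The call self.predictor.get_output_handle(output_names[out_idx]).copy_to_cpu() takes roughly half of the inference time. I tried skipping the copy, but got a mode error. I also tried writing a GPU version of the postprocessing, but that failed. Is a GPU version of the postprocessing module available? The threshold affects both models equally, because I kept the thresholds identical when comparing them, although the two NMS thresholds can indeed speed things up.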

yghstill commented 2 years ago

@zwhua006 Delete this line: https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.3/ppdet/modeling/architectures/picodet.py#L44 and re-export the model, so that all operations are computed in Paddle. Then, when predicting with deploy, use the regular Detector (not DetectorPicoDet).

zwhua006 commented 2 years ago

Thank you for the reply. Following your suggestion, I deleted self.deploy=False directly, and the export failed with 'PicoDet' object has no attribute 'deploy'. After I commented out or modified every self.deploy in picodet.py, the export succeeded, but prediction with Detector still fails with: InvalidArgumentError: The tensor Input (Input) of Slice op is not initialized. [Hint: Expected in_tensor.IsInitialized() == true, but received in_tensor.IsInitialized():0 != true:1.] (at /paddle/paddle/fluid/operators/slice_op.cc:178) [operator < slice > error]

yghstill commented 2 years ago

@zwhua006 I just tried it. You also need to comment out the following code: https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.3/ppdet/engine/trainer.py#L631-L635 With that change it runs in my test. Please try again.

songh11 commented 2 years ago

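@songh11 Which OPs specifically are unsupported, and how did you work around them? We will look into this and see whether we can support more hardware.

hardsigmoid is not supported, so I substituted sigmoid. In addition, align_mode in upsample needs to be set to 1.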

zwhua006 commented 2 years ago

@yghstill Thank you, the test succeeds, but postprocess does not seem to be executed:

    results = self.postprocess(np_boxes, np_masks, inputs, np_boxes_num, threshold=threshold)

    def postprocess(self, np_boxes, np_masks, inputs, np_boxes_num, threshold=0.5):
        # postprocess output of predictor
        results = {}
        results['boxes'] = np_boxes
        results['boxes_num'] = np_boxes_num
        if np_masks is not None:
            results['masks'] = np_masks
        return results

With Detector, no NMS filtering is performed. What I meant was whether postprocess can be moved to the GPU, so that the results do not need to be copied to the CPU after inference, which would save time. With DetectorPicoDet my timings are currently about preprocess_time(ms): 2.40, inference_time(ms): 8.50, postprocess_time(ms): 4.80. However, I found that the following step between preprocess and inference takes 600 ms and has become the bottleneck. It seems to copy the image once.

    input_names = self.predictor.get_input_names()
    for i in range(len(input_names)):
        input_tensor = self.predictor.get_input_handle(input_names[i])
        input_tensor.copy_from_cpu(inputs[input_names[i]])
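One way to check whether a cost like the 600 ms above is paid on every call or only on the first one is to time the first call separately from the steady state. The sketch below is a generic timing helper; `time_calls` and `FakeCopy` are hypothetical names, and whether the reported 600 ms is a one-time warm-up cost is an assumption that the thread does not confirm.

```python
import time


def time_calls(fn, repeats=20, warmup=3):
    """Return (first_call_ms, steady_state_avg_ms) for a zero-argument callable."""
    start = time.perf_counter()
    fn()
    first_ms = (time.perf_counter() - start) * 1000.0
    # Remaining warm-up calls are executed but not timed.
    for _ in range(max(0, warmup - 1)):
        fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    avg_ms = (time.perf_counter() - start) * 1000.0 / repeats
    return first_ms, avg_ms


class FakeCopy:
    """Stand-in for a host-to-device copy whose first call pays a one-time setup cost."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        time.sleep(0.05 if self.calls == 1 else 0.001)


first_ms, avg_ms = time_calls(FakeCopy())
print(f"first call: {first_ms:.1f} ms, steady state: {avg_ms:.1f} ms")
```

In a real measurement, `fn` would wrap the copy_from_cpu loop; if the steady-state figure is small, the large number comes from initialization rather than from the copy itself.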

lilith-zy commented 2 years ago

To make communication easier, scan the QR code to join the WeChat group and continue discussing PP-PicoDet usage and suggestions~

niancheng commented 2 years ago

The QR code has expired. Could you please post a new one?

lilith-zy commented 2 years ago

The QR code has expired. Could you please post a new one?

Sure, you can try this one:

whisper2234 commented 2 years ago

I followed the quick start and ran tools/infer.py from ppdet on the demo image, but nothing is detected, and no error is reported either. Why?

yghstill commented 2 years ago

@whisper2234 What command did you run? The model weights may not have been loaded correctly~

whisper2234 commented 2 years ago

    python tools/infer.py -c configs/picodet/picodet_s_320_coco.yml -o use_gpu=true weights=https://paddledet.bj.bcebos.com/models/pretrained/ESNet_x0_75_pretrained.pdparams --infer_img=demo/000000014439_640x640.jpg

I used the pretrained weights, but the output is still the original image with nothing detected.

yghstill commented 2 years ago

@whisper2234 Loading the backbone's pretrained model as weights is not correct. Please load the weights trained on COCO: weights=https://paddledet.bj.bcebos.com/models/picodet_s_320_coco.pdparams

Monday-Leo commented 2 years ago

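I tested the picodet ncnn deployment on Windows with the officially provided picodet_m_416 model and found the accuracy to be poor, as shown in the image below. The image size used in the code is (320, 320), which does not match the model, but after I changed it, the detection boxes became clearly wrong. What went wrong?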

yghstill commented 2 years ago

@Monday-Leo Could you test again with the picodet_m_320 model? The 416 model was trained on input sizes around 416, so the 320 size is not covered.

Monday-Leo commented 2 years ago

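@yghstill The official ncnn code itself uses the 416 model with 320 input at prediction time. It runs without problems in my test, only with poor accuracy. If I change the input to 416 myself, the output boxes are completely wrong. Could the official model be misnamed? Official tutorial: https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/deploy/third_engine/demo_ncnn Official ncnn model: https://paddledet.bj.bcebos.com/deploy/third_engine/picodet_m_416_ncnn.zip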

yghstill commented 2 years ago

@Monday-Leo OK, we will track down the problem.

Sharpiless commented 2 years ago

Is there a complete tutorial for exporting the model and deploying it on Android?

yghstill commented 2 years ago

@Sharpiless Yes, there is.

Model export tutorial: https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/picodet#export-and-convert-model
Android demo: https://github.com/JiweiMaster/PP-PicoDet-Android-Demo

Monday-Leo commented 2 years ago

Will there be a TensorRT deployment solution?

yghstill commented 2 years ago

Will there be a TensorRT deployment solution?

@Monday-Leo PaddleDetection supports TRT deployment, and PicoDet currently supports trt7 and above. https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/TENSOR_RT.md

czyczyczy commented 2 years ago

When I call the picodet_s_320 model through the ncnn framework, I get the error below. Could you tell me roughly what went wrong?

    find_blob_index_by_name save_infer_model/scale_4.tmp_1 failed
    Try
        ex.extract("transpose_10.tmp_0", out0);
        ex.extract("transpose_11.tmp_0", out1);
        ex.extract("transpose_12.tmp_0", out2);
        ex.extract("transpose_13.tmp_0", out3);
        ex.extract("transpose_14.tmp_0", out4);
        ex.extract("transpose_15.tmp_0", out5);
        ex.extract("transpose_16.tmp_0", out6);
        ex.extract("transpose_17.tmp_0", out7);
    find_blob_index_by_name save_infer_model/scale_0.tmp_1 failed
    Try
        ex.extract("transpose_10.tmp_0", out0);
        ex.extract("transpose_11.tmp_0", out1);
        ex.extract("transpose_12.tmp_0", out2);
        ex.extract("transpose_13.tmp_0", out3);
        ex.extract("transpose_14.tmp_0", out4);
        ex.extract("transpose_15.tmp_0", out5);
        ex.extract("transpose_16.tmp_0", out6);
        ex.extract("transpose_17.tmp_0", out7);

yghstill commented 2 years ago

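@czyczyczy The output names do not match. Change them here to the actual output names of picodet_s_320. You can use a visualization tool such as netron to inspect the network's output names.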
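As a quick complement to inspecting the model in netron, the hints that ncnn prints can be collected with a short script. This is a minimal sketch: the function names are hypothetical, the log below is abbreviated from the error reported above, and which blob corresponds to which head output still has to be confirmed against the actual network.

```python
import re


def missing_blob_names(ncnn_log):
    """Names that ncnn could not find, taken from 'find_blob_index_by_name ... failed' lines."""
    return re.findall(r"find_blob_index_by_name (\S+) failed", ncnn_log)


def suggested_blob_names(ncnn_log):
    """Unique blob names from ncnn's 'ex.extract("name", outN);' hints, in first-seen order."""
    names = []
    for name in re.findall(r'ex\.extract\("([^"]+)"', ncnn_log):
        if name not in names:
            names.append(name)
    return names


log = (
    'find_blob_index_by_name save_infer_model/scale_4.tmp_1 failed\n'
    'Try ex.extract("transpose_10.tmp_0", out0); ex.extract("transpose_11.tmp_0", out1);\n'
    'find_blob_index_by_name save_infer_model/scale_0.tmp_1 failed\n'
    'Try ex.extract("transpose_10.tmp_0", out0); ex.extract("transpose_11.tmp_0", out1);\n'
)
print(missing_blob_names(log))
print(suggested_blob_names(log))
```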

czyczyczy commented 2 years ago

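@czyczyczy The output names do not match. Change them here to the actual output names of picodet_s_320. You can use a visualization tool such as netron to inspect the network's output names.

Thank you for the reply. I have one more question: with different versions of openvino, the prediction speed differs by more than a factor of two. openvino 2021.1.110 takes 33 ms, while openvino 2021.2.185 takes 14 ms. What is the specific reason?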

yghstill commented 2 years ago

@czyczyczy OpenVINO version upgrades include performance optimizations. We recommend simply using the newer version.

zhenhao-huang commented 2 years ago

How much GPU memory does fine-tuning need? I have tried both export CUDA_VISIBLE_DEVICES=1 and export CUDA_VISIBLE_DEVICES=1,2,3,4, and both give the following error:

Out of memory error on GPU 0. Cannot allocate 108.000244MB memory on GPU 0, 10.699219GB memory has been allocated and available memory is only 63.437500MB.

Please check whether there is any other process using GPU 0.
1. If yes, please stop them, or start PaddlePaddle on another GPU.
2. If no, please decrease the batch size of your model. 

(at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:79)
. (at /paddle/paddle/fluid/imperative/tracer.cc:221)

The machine has 8 GPUs with 11 GB each.

imchenmin commented 2 years ago

Could you post the WeChat group QR code again? Thanks!

yghstill commented 2 years ago

Could you post the WeChat group QR code again? Thanks!

@chenminken Updated.

yghstill commented 2 years ago

@zhenhao-huang You are running out of GPU memory. Your GPUs have relatively little memory, so reduce the batch size further.

zhenhao-huang commented 2 years ago

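@zhenhao-huang You are running out of GPU memory. Your GPUs have relatively little memory, so reduce the batch size further.

I changed batch_size from 128 to 80. How should I adjust the learning rate?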

yghstill commented 2 years ago

@zhenhao-huang Reduce the learning rate linearly, in proportion to the batch size.

ppogg commented 2 years ago

That QR code has expired.

yghstill commented 2 years ago

@ppogg The QR code has been updated~

ppogg commented 2 years ago

@ppogg The QR code has been updated~

happy copy that

lilith-zy commented 2 years ago

Updated QR code for the PicoDet technical discussion group:

pureloveljc commented 2 years ago

How do I convert a picodet model that I trained myself into an ncnn model?

yghstill commented 2 years ago

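How do I convert a picodet model that I trained myself into an ncnn model?

@pureloveljc You can export an onnx model by following the deployment section of the documentation, and then convert it to ncnn format with the online tool: https://convertmodel.com/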

terrancraft commented 2 years ago

Hello, when I start picodet640l training I get the following error: Screenshot from 2022-01-17 11-09-55

yghstill commented 2 years ago

@terrancraft Which PaddlePaddle version are you using? Is it 2.1.2 or later?

terrancraft commented 2 years ago

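@terrancraft Which PaddlePaddle version are you using? Is it 2.1.2 or later?

After updating to 2.2.0, training works. Also, what value is the simOTA iou weight usually set to in practice? Thanks!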

yghstill commented 2 years ago

@terrancraft Keep the default in the config file. We tested this experimentally, and the current ratio works well.

sdreamforchen commented 2 years ago

Hello, if I deploy with the 640*640 onnx model, does my input image have to be 640*640? Can the model resize the input automatically?

yghstill commented 2 years ago

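Hello, if I deploy with the 640*640 onnx model, does my input image have to be 640*640? Can the model resize the input automatically?

@sdreamforchen Preprocessing is required: resize the image to 640 and then feed it into the network.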

zhouweic36 commented 2 years ago

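I trained on a small dataset (270 images, class_num=1) on Windows with a GPU and exported the model. Then, on ubuntu18.04 linux, I used paddlelite to optimize it and export an nb model, and ran it with ssd_detection_demo from paddlelite-generic-demo over adb on an rk1808 arm device. The relevant materials are in mypicodet.zip. I am not sure whether the results are correct. Could the official team provide a postprocessing demo for anchor-free models, as already exists for the ssd and yolo families? Thanks.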

zhouweic36 commented 2 years ago

Updated QR code for the PicoDet technical discussion group:

Hello, is there a newer QR code?

PaddlePaddle / PaddleDetection

🌟 PP-PicoDet has been released. Please try it out & join the discussion #4420
