deepjavalibrary / djl

An Engine-Agnostic Deep Learning Framework in Java
https://djl.ai
Apache License 2.0

TorchScript model exported with YOLOv5 export.py fails to detect anything in images #2445

Closed tapiohuang closed 1 year ago

tapiohuang commented 1 year ago

Description

I exported my trained model with export.py from YOLOv5. The model runs well in Python, but after exporting it to TorchScript and loading it in DJL, it produces no detections.

References

Code used to build the ZooModel

        Map<String, Object> arguments = new ConcurrentHashMap<>();
        arguments.put("width", 640);
        arguments.put("height", 640);
        arguments.put("resize", true);
        arguments.put("rescale", true);
        Translator<Image, DetectedObjects> translator = YoloV5Translator.builder()
                .optSynsetArtifactName(namesFile)
                .build();
        return Criteria.builder()
                .optApplication(Application.CV.INSTANCE_SEGMENTATION)
                .setTypes(Image.class, DetectedObjects.class)
                .optDevice(Device.cpu())
                .optModelPath(Paths.get(path))
                .optModelName(modelName)
                .optTranslator(translator)
                .optProgress(new ProgressBar())
                .optEngine("PyTorch")
                .build();

Arguments of export.py

    parser = argparse.ArgumentParser()
    parser.add_argument('--data', type=str, default=ROOT / 'data/h_steel_label.yaml', help='dataset.yaml path')
    parser.add_argument('--weights', nargs='+', type=str, default=ROOT / 'runs/train/h_steel_label2/weights/best.pt', help='model.pt path(s)')
    parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[640, 640], help='image (h, w)')
    parser.add_argument('--batch-size', type=int, default=1, help='batch size')
    parser.add_argument('--device', default='cpu', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--half', action='store_true', help='FP16 half-precision export')
    parser.add_argument('--inplace', action='store_true', help='set YOLOv5 Detect() inplace=True')
    parser.add_argument('--keras', action='store_true', help='TF: use Keras')
    parser.add_argument('--optimize', action='store_true', help='TorchScript: optimize for mobile')
    parser.add_argument('--int8', action='store_true', help='CoreML/TF INT8 quantization')
    parser.add_argument('--dynamic', action='store_true', help='ONNX/TF/TensorRT: dynamic axes')
    parser.add_argument('--simplify', action='store_true', help='ONNX: simplify model')
    parser.add_argument('--opset', type=int, default=17, help='ONNX: opset version')
    parser.add_argument('--verbose', action='store_true', help='TensorRT: verbose log')
    parser.add_argument('--workspace', type=int, default=4, help='TensorRT: workspace size (GB)')
    parser.add_argument('--nms', action='store_true', help='TF: add NMS to model')
    parser.add_argument('--agnostic-nms', action='store_true', help='TF: add agnostic NMS to model')
    parser.add_argument('--topk-per-class', type=int, default=100, help='TF.js NMS: topk per class to keep')
    parser.add_argument('--topk-all', type=int, default=100, help='TF.js NMS: topk for all classes to keep')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='TF.js NMS: IoU threshold')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='TF.js NMS: confidence threshold')

frankfliu commented 1 year ago

@tapiohuang

We have a yolov5s model in our PyTorch model zoo; however, it is an object detection model, not an instance segmentation model.

        Criteria<Image, DetectedObjects> criteria = Criteria.builder()
                .setTypes(Image.class, DetectedObjects.class)
                .optModelUrls("djl://ai.djl.pytorch/yolo5s")
                .optEngine("PyTorch")
                .build();

KexinFeng commented 1 year ago

Generally, DJL in Java and the Python API share the same engine, i.e. PyTorch, so in principle inference with a given model should behave the same in DJL and in Python. If it does not, there must be a bug. You could debug the two side by side, comparing what DJL and Python each do before they call the PyTorch C++ API. Alternatively, as Frank mentioned above, you can debug with the help of yolov5s. If you can share your model file, you can also post more detailed code so that we can try to find the bug.

tapiohuang commented 1 year ago

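The problem is solved. It was caused by the resize step. In Python, YOLOv5 scales the image proportionally (preserving the aspect ratio), whereas the resize in DJL simply stretches or shrinks the width and height to the target size, which is why nothing was detected. After scaling the image proportionally myself, everything works correctly.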
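
For readers who hit the same issue, here is a minimal sketch of aspect-ratio-preserving ("letterbox") resizing in plain Java. It is not the code used in this thread: the class name `Letterbox` and its methods are hypothetical, and the gray padding value 114 is assumed to match the YOLOv5 Python preprocessing.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of DJL: scales an image to fit a target
// size without distortion and pads the remaining area.
final class Letterbox {

    // Largest scale factor at which (srcW x srcH) still fits inside (dstW x dstH).
    static double scale(int srcW, int srcH, int dstW, int dstH) {
        return Math.min((double) dstW / srcW, (double) dstH / srcH);
    }

    // Returns a dstW x dstH image containing the proportionally scaled source,
    // centered, with gray padding (114 is an assumption about YOLOv5's default).
    static BufferedImage resize(BufferedImage src, int dstW, int dstH) {
        double r = scale(src.getWidth(), src.getHeight(), dstW, dstH);
        int newW = (int) Math.round(src.getWidth() * r);
        int newH = (int) Math.round(src.getHeight() * r);
        int padX = (dstW - newW) / 2; // left padding in pixels
        int padY = (dstH - newH) / 2; // top padding in pixels

        BufferedImage out = new BufferedImage(dstW, dstH, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            g.setColor(new Color(114, 114, 114));
            g.fillRect(0, 0, dstW, dstH);
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, padX, padY, newW, newH, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

With a 1280x720 input and a 640x640 target, the scale factor is 0.5, so the image becomes 640x360 with 140 pixels of padding above and below. The result could then be wrapped for DJL with `ImageFactory.getInstance().fromImage(...)`. Boxes returned by the model refer to the padded image, so they would have to be mapped back by subtracting the padding and dividing by the scale factor.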