deepjavalibrary / djl

An Engine-Agnostic Deep Learning Framework in Java
https://djl.ai
Apache License 2.0

Errors when using yolov5 translate. #1211

Closed CensorKo closed 2 years ago

CensorKo commented 3 years ago

Description

First, I exported my model from YOLOv5 with this command:

        python export.py --weights ./runs/train/exp26/weights/best.pt --img 640 --batch 1 --include torchscript

My DJL code is:

        Translator<Image, DetectedObjects> translator = YoloV5Translator.builder()
                .optSynsetArtifactName("coco.names")
                .build();
        Criteria<Image, DetectedObjects> criteria = Criteria.builder()
                .setTypes(Image.class, DetectedObjects.class)
                .optDevice(Device.cpu())
                .optModelUrls(Main.class.getResource("/yolov5s").getPath())
                .optModelName("best.torchscript.pt")
                .optTranslator(translator)
                .optEngine("PyTorch")
                .build();

When I execute the following, I get errors:

        ZooModel<Image, DetectedObjects> model = ModelZoo.loadModel(criteria);

Error Message

[W TensorImpl.h:1156] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
ai.djl.translate.TranslateException: ai.djl.engine.EngineException: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/models/yolo.py", line 46, in forward
    _35 = (_4).forward(_34, )
    _36 = (_2).forward((_3).forward(_35, ), _29, )
    _37 = (_0).forward(_33, _35, (_1).forward(_36, ), )
           ~~~~~~~~~~~ <--- HERE
    _38, _39, _40, _41, = _37
    return (_41, [_38, _39, _40])
  File "code/__torch__/models/yolo.py", line 75, in forward
    y = torch.sigmoid(_50)
    _51 = torch.mul(torch.slice(y, 4, 0, 2), CONSTANTS.c0)
    _52 = torch.add(torch.sub(_51, CONSTANTS.c1), CONSTANTS.c2)
          ~~~~~~~~~ <--- HERE
    xy = torch.mul(_52, torch.select(CONSTANTS.c3, 0, 0))
    _53 = torch.mul(torch.slice(y, 4, 2, 4), CONSTANTS.c4)

Traceback of TorchScript, original code (most recent call last):
/data1/yolov5_aws/yolov5/models/yolo.py(66): forward
/root/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/nn/modules/module.py(1051): _call_impl
/data1/yolov5_aws/yolov5/models/yolo.py(155): forward_once
/data1/yolov5_aws/yolov5/models/yolo.py(123): forward
/root/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/jit/_trace.py(959): trace_module
/root/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/jit/_trace.py(744): trace
export.py(35): export_torchscript
export.py(154): run
export.py(187): main
export.py(192): <module>
RuntimeError: The size of tensor a (180) must match the size of tensor b (80) at non-singleton dimension 3

 at ai.djl.inference.Predictor.batchPredict(Predictor.java:170)
 at ai.djl.inference.Predictor.predict(Predictor.java:118)
 at xyz.hyhy.Main.main(Main.java:46)
Caused by: ai.djl.engine.EngineException: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/models/yolo.py", line 46, in forward
    _35 = (_4).forward(_34, )
    _36 = (_2).forward((_3).forward(_35, ), _29, )
    _37 = (_0).forward(_33, _35, (_1).forward(_36, ), )
           ~~~~~~~~~~~ <--- HERE
    _38, _39, _40, _41, = _37
    return (_41, [_38, _39, _40])
  File "code/__torch__/models/yolo.py", line 75, in forward
    y = torch.sigmoid(_50)
    _51 = torch.mul(torch.slice(y, 4, 0, 2), CONSTANTS.c0)
    _52 = torch.add(torch.sub(_51, CONSTANTS.c1), CONSTANTS.c2)
          ~~~~~~~~~ <--- HERE
    xy = torch.mul(_52, torch.select(CONSTANTS.c3, 0, 0))
    _53 = torch.mul(torch.slice(y, 4, 2, 4), CONSTANTS.c4)

Traceback of TorchScript, original code (most recent call last):
/data1/yolov5_aws/yolov5/models/yolo.py(66): forward
/root/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/nn/modules/module.py(1051): _call_impl
/data1/yolov5_aws/yolov5/models/yolo.py(155): forward_once
/data1/yolov5_aws/yolov5/models/yolo.py(123): forward
/root/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/jit/_trace.py(959): trace_module
/root/anaconda3/envs/yolov5/lib/python3.7/site-packages/torch/jit/_trace.py(744): trace
export.py(35): export_torchscript
export.py(154): run
export.py(187): main
export.py(192): <module>
RuntimeError: The size of tensor a (180) must match the size of tensor b (80) at non-singleton dimension 3

 at ai.djl.pytorch.jni.PyTorchLibrary.moduleForward(Native Method)
 at ai.djl.pytorch.jni.IValueUtils.forward(IValueUtils.java:46)
 at ai.djl.pytorch.engine.PtSymbolBlock.forwardInternal(PtSymbolBlock.java:126)
 at ai.djl.nn.AbstractBlock.forward(AbstractBlock.java:126)
 at ai.djl.nn.Block.forward(Block.java:122)
 at ai.djl.inference.Predictor.predict(Predictor.java:123)
 at ai.djl.inference.Predictor.batchPredict(Predictor.java:163)
 ... 2 more
Disconnected from the target VM, address: '127.0.0.1:56512', transport: 'socket'

frankfliu commented 3 years ago

Can you try running your TorchScript model with Python? What are the expected input shapes?

chengpengvb commented 2 years ago

Build the translator like this:

        Pipeline pipeline = new Pipeline();
        pipeline.add(new Resize(640, 640));
        pipeline.add(new ToTensor());

        Translator<Image, DetectedObjects> translator = YoloV5Translator.builder().setPipeline(pipeline)
                .optSynsetArtifactName("coco.names").optThreshold(0.5f).build();

The size passed to new Resize(640, 640) must be the same as the image size used at export (--img 640).
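
The numbers in the error message fit this explanation. Below is a minimal sketch of the arithmetic, assuming the standard YOLOv5 detection strides of 8, 16 and 32 (the issue does not state them) and input sides that are multiples of 32. A traced model freezes the grid computed at export time, so a 640-pixel export expects a grid of 640 / 8 = 80 cells on the first detection layer, which matches "tensor b (80)". A grid of 180 cells would correspond to an input side of 180 * 8 = 1440 pixels, which suggests the image reached the model without being resized. The class and method names are hypothetical and are not part of DJL.

```java
import java.util.Arrays;

/** Sketch: grid sizes a YOLOv5 detection head would produce for a given input side. */
final class YoloGridSizes {
    // Assumption: standard YOLOv5 detection strides; not confirmed in the issue.
    static final int[] STRIDES = {8, 16, 32};

    /** Returns the grid size per detection layer for an input side given in pixels. */
    static int[] gridSizes(int inputSide) {
        int[] sizes = new int[STRIDES.length];
        for (int i = 0; i < STRIDES.length; i++) {
            sizes[i] = inputSide / STRIDES[i];
        }
        return sizes;
    }

    /** True if the runtime input yields the same grids that were frozen at trace time. */
    static boolean matchesExport(int exportSide, int runtimeSide) {
        return Arrays.equals(gridSizes(exportSide), gridSizes(runtimeSide));
    }

    public static void main(String[] args) {
        // Side used at export (--img 640).
        System.out.println(Arrays.toString(gridSizes(640)));
        // Side that would produce the 180 reported in the error.
        System.out.println(Arrays.toString(gridSizes(1440)));
        System.out.println(matchesExport(640, 1440));
    }
}
```

Under these assumptions, adding Resize(640, 640) to the pipeline makes the runtime grids equal to the exported ones, which is why the size mismatch no longer occurs.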