deepjavalibrary / djl

An Engine-Agnostic Deep Learning Framework in Java
https://djl.ai
Apache License 2.0

The prediction results between djl and pytorch are different on the same model #1116

Closed mengpengfei closed 3 years ago

mengpengfei commented 3 years ago

The inference result from DJL differs from the inference result from PyTorch. I have checked all my code, but I still cannot find the problem. Please help me. Below are the PyTorch code and the DJL code.

(two attached images, not reproduced here)

frankfliu commented 3 years ago

@mengpengfei Please see: https://github.com/deepjavalibrary/djl/issues/970

DJL's resize uses the PyTorch interpolate C++ API. Its implementation is slightly different from torchvision's resize.
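To illustrate how two resize routines can disagree even when both are "linear", here is a minimal pure-Python sketch. It is not DJL's or torchvision's code. The function `resize_linear` and its `align_corners` flag are hypothetical names chosen for this example, and the two coordinate mappings are only one possible source of such differences.

```python
import math


def resize_linear(src, out_len, align_corners):
    """Linearly resample a 1-D list of pixel values to out_len samples.

    Hypothetical helper for illustration; not taken from any library.
    """
    in_len = len(src)
    out = []
    for i in range(out_len):
        if align_corners:
            # The first and last samples of input and output coincide.
            x = i * (in_len - 1) / (out_len - 1) if out_len > 1 else 0.0
        else:
            # Half-pixel centers: map each output pixel center into input space.
            x = (i + 0.5) * in_len / out_len - 0.5
        # Clamp to the valid index range (edge replication).
        x = min(max(x, 0.0), in_len - 1)
        lo = int(math.floor(x))
        hi = min(lo + 1, in_len - 1)
        t = x - lo
        out.append((1 - t) * src[lo] + t * src[hi])
    return out


row = [0.0, 100.0, 200.0, 255.0]
a = resize_linear(row, 8, align_corners=True)
b = resize_linear(row, 8, align_corners=False)
max_diff = max(abs(p - q) for p, q in zip(a, b))

print(a)
print(b)
print("largest per-pixel difference:", max_diff)
```

Both calls upscale the same four-pixel row to eight pixels, yet the interior values differ by several intensity levels. A model that receives such slightly different inputs can produce slightly different outputs.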

mengpengfei commented 3 years ago

I used OpenCV to resize the picture, and that solved the problem. Can you make DJL compatible with OpenCV?

frankfliu commented 3 years ago

@mengpengfei It's hard to choose which implementation to use, because there are many different ones:

  1. Java ImageIO
  2. OpenCV
  3. PIL
  4. PyTorch interpolate C++ API
  5. MXNet (a different flavor of OpenCV)

None of the above produce exactly the same result. Ideally, the resize algorithm should match the one the model was trained with. For DJL, choosing OpenCV would mean breaking other models.
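As a rough sketch of why the choice of resize matters for a trained model, consider downscaling with and without an averaging (antialiasing) filter. The two functions below are hypothetical simplifications written for this example. They are not the code of any implementation listed above, and which library behaves more like which is not claimed here.

```python
def downscale_point(src, factor):
    """Keep one sample per block of `factor` inputs (no antialiasing)."""
    return [src[i * factor + factor // 2] for i in range(len(src) // factor)]


def downscale_box(src, factor):
    """Average each block of `factor` inputs (a simple antialiasing filter)."""
    return [
        sum(src[i * factor:(i + 1) * factor]) / factor
        for i in range(len(src) // factor)
    ]


# A one-pixel-wide stripe pattern, 16 samples long (made-up test input).
stripes = [0, 255] * 8

point = downscale_point(stripes, 2)
box = downscale_box(stripes, 2)

print(point)  # every output takes the bright sample of its block
print(box)    # every output is the mean of a dark and a bright sample
```

On this input one filter returns a uniformly bright row and the other a uniformly gray row. A model trained on images produced by one of them sees a shifted input distribution when fed images produced by the other.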

We made a few assumptions:

  1. Users will use the native engine's resize algorithm (for example, MXNet users train their models with MXNet's resize).
  2. PyTorch is working on moving the Python-based resize into their C++ API. Once this is done, the Python and C++ APIs will have the same behavior (there is an issue on the PyTorch GitHub).
  3. In most cases, the difference won't affect the final inference result (the accuracy impact is small).
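One way to check the third assumption for a given model is to compare the two output vectors numerically and see whether the predicted class changes. The sketch below assumes the outputs are available as plain lists of scores. The helper `compare_outputs`, the tolerance `atol`, and the score values are all made up for illustration and are not results from this issue.

```python
def compare_outputs(ref, other, atol=1e-3):
    """Summarize how far apart two score vectors are.

    Hypothetical helper; `atol` is an arbitrary tolerance chosen for the example.
    """
    if len(ref) != len(other):
        raise ValueError("score vectors must have the same length")
    max_abs_diff = max(abs(r - o) for r, o in zip(ref, other))
    top1_ref = max(range(len(ref)), key=ref.__getitem__)
    top1_other = max(range(len(other)), key=other.__getitem__)
    return {
        "max_abs_diff": max_abs_diff,
        "within_tolerance": max_abs_diff <= atol,
        "same_top1": top1_ref == top1_other,
    }


# Made-up scores for illustration only.
pytorch_scores = [0.02, 0.91, 0.07]
djl_scores = [0.03, 0.88, 0.09]

report = compare_outputs(pytorch_scores, djl_scores)
print(report)
```

With these example scores the vectors are not equal within the tolerance, but the top-1 class is the same, which is the situation the assumption describes. If the top-1 class differs on real inputs, the preprocessing mismatch is likely large enough to matter.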