Jittor / jittor

Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.
https://cg.cs.tsinghua.edu.cn/jittor/
Apache License 2.0

jittor.transform on numpy images #337

Open BrianPugh opened 2 years ago

BrianPugh commented 2 years ago

Describe the bug

NumPy images of shape (H, W, 3) passed to transform.ImageNormalize raise a broadcasting error; the image has to be transposed to (3, H, W) first. This comes up often when loading images with cv2.imread instead of PIL.Image.

Full Log

Traceback (most recent call last):
  File "/Users/brianpugh/projects/scratch.py", line 12, in <module>
    jt_image = transform_image(image)
  File "/Users/brianpugh/.pyenv/versions/3.9.9/lib/python3.9/site-packages/jittor/transform/__init__.py", line 675, in __call__
    data = t(data)
  File "/Users/brianpugh/.pyenv/versions/3.9.9/lib/python3.9/site-packages/jittor/transform/__init__.py", line 651, in __call__
    img = (img - self.mean) / self.std
ValueError: operands could not be broadcast together with shapes (480,640,3) (3,1,1)
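
For readers unfamiliar with the error above, here is a minimal pure-Python sketch of the NumPy broadcasting rule (shapes are compared from the trailing axis, and each pair of sizes must be equal or contain a 1). The helper name `broadcast_shape` is hypothetical and is not part of NumPy or Jittor; the (3, 1, 1) shape is taken from the traceback.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast shape of a and b, or raise ValueError."""
    result = []
    # Walk both shapes from the trailing axis; missing leading axes count as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
    return tuple(reversed(result))


stats = (3, 1, 1)  # shape of mean/std as reported in the traceback

# Channel-first image: every axis pair matches or contains a 1.
print(broadcast_shape((3, 480, 640), stats))

# Channel-last image: the leading axes (480 vs 3) conflict.
try:
    broadcast_shape((480, 640, 3), stats)
except ValueError as err:
    print(err)
```

This suggests ImageNormalize expects channel-first input, which is why transposing the (H, W, 3) array makes the subtraction succeed.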

Minimal Reproduce

import numpy as np
from jittor import transform

transform_image = transform.Compose([
    transform.ToTensor(),
    transform.ImageNormalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

# Standard (H, W, 3) RGB image
image = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)  # upper bound is exclusive

# This doesn't work
# ValueError: operands could not be broadcast together with shapes (480,640,3) (3,1,1)
jt_image = transform_image(image)

# This works
jt_image = transform_image(image.transpose(2,0,1))

Expected behavior

I'd expect NumPy images with the standard (H, W, 3) shape to work. It looks like PIL.Image inputs are explicitly transposed.

BrianPugh commented 2 years ago

So maybe this is actually a misunderstanding of ToTensor:

https://github.com/Jittor/jittor/blob/9e58fac49f3ebed8419fdb7c3faf5355f5f615b6/python/jittor/transform/__init__.py#L398

I would have expected it to mirror PyTorch's ToTensor and convert PIL/NumPy data into a Jittor tensor, but it doesn't seem to do that. What is the expected workflow for preprocessing data, then?
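
Until the intended workflow is confirmed, one possible workaround is to do the layout change and normalization by hand in NumPy. The sketch below is an assumption, not Jittor's documented behavior: the helper name `hwc_to_normalized_chw` is hypothetical, and the scaling to [0, 1] follows the torchvision ToTensor convention rather than anything confirmed in this thread.

```python
import numpy as np


def hwc_to_normalized_chw(image, mean, std):
    """Hypothetical helper: (H, W, C) uint8 image -> normalized (C, H, W) float32."""
    # Move channels first and scale to [0, 1] (assumed convention, see above).
    chw = image.transpose(2, 0, 1).astype(np.float32) / 255.0
    # Reshape per-channel statistics to (C, 1, 1) so they broadcast over H and W.
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    return (chw - mean) / std


# Standard (H, W, 3) image, e.g. as returned by cv2.imread
image = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
normalized = hwc_to_normalized_chw(image, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
print(normalized.shape, normalized.dtype)
```

The resulting array could then presumably be wrapped with jt.array if a Jittor variable is needed. Note that cv2.imread returns channels in BGR order, so a color conversion may also be needed before normalizing.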