cocoa-xu / evision

Evision: An OpenCV-Erlang/Elixir binding
https://evision.app
Apache License 2.0
337 stars 22 forks source link

Evision.Mat.transpose/{1, 2} calls shape/1 not shape!/1; returns invalid shape #77

Closed kipcole9 closed 2 years ago

kipcole9 commented 2 years ago

When calling Evision.Mat.transpose/{1,2} I think there are three issues:

  1. The call to shape/1 on line 229 should be shape!/1
  2. The call to shape/1 on line 269 should be shape!(mat) |> Tuple.to_list()
  3. The return result from transpose/{1, 2} returns an invalid shape even after these two issues are corrected. This results in downstream operations like Evision.imwrite/2 failing.

Example

# Create a tensor
iex(1)> tensor = Nx.tensor  [                                           
...(1)>     [
...(1)>       [55, 165, 0],
...(1)>       [55, 165, 0],
...(1)>       [55, 165, 0]
...(1)>     ],
...(1)>     [
...(1)>       [75, 165, 0],
...(1)>       [75, 165, 0],
...(1)>       [75, 165, 0]
...(1)>     ],
...(1)>     [
...(1)>       [155, 165, 0],
...(1)>       [155, 165, 0],
...(1)>       [155, 165, 0]
...(1)>     ],
...(1)>     [
...(1)>       [255, 165, 0],
...(1)>       [255, 165, 0],
...(1)>       [255, 165, 0]
...(1)>     ]
...(1)>   ], type: {:u, 8}
#Nx.Tensor<
  u8[4][3][3]
  ...
>

# Convert to an Evision mat
iex> {:ok, mat} = Evision.Nx.to_mat(tensor)
{:ok, #Reference<0.2283891390.1959395348.49054>}

# Check the shape is the same as the original tensor
iex> Evision.Mat.shape(mat)                 
{:ok, {4, 3, 3}}

# Transpose the axis. This will be the case when converting
# tensors that come from `Image` because we need to transpose
# from {width, height, bands} to {height, width, bands}
iex> {:ok, transposed} = Evision.Mat.transpose(mat, [1, 0, 2])
{:ok, #Reference<0.2283891390.1959395348.49056>}

# Now the shape has too many dimensions!
# The first three tuple elements appear to be the correct
# ones. Perhaps the NIF function has an "off by one" issue?
iex> Evision.Mat.shape(transposed)                     
{:ok, {3, 4, 3, 3}}
cocoa-xu commented 2 years ago

Hi @kipcole9, I see the problem here. It wasn't actually an off-by-one error. It is because of OpenCV's inconsistent way of storing the dims of a Mat.

A 4x3x3 mat could be initialised in two ways:

Both of them will give the same result on the surface, however, mat.size.dims from the first one will give [4,3,3] while the second one will be [4,3], and the left dims info, the last 3, will be stored in channels. (And as you may have guessed, the channel info of the first one is also 3, causing the {3, 4, 3, 3} shape)

So the fix in Evision's code is always using from_binary_by_shape when converting a tensor to a mat, and pass the type as CV_8U/CV_32F/etc, so that we can check mat.type() to know if we should include the channel number as the last dim in the shape later.

kipcole9 commented 2 years ago

Thank you very much @cocoa-xu. I find it difficult to get the mental model of OpenCV straight in my head and its been many many years since I did anything in C/C++ so I find the API docs difficult to decode. So I worry that I will bother you with stupid questions. I'm just happy this one wasn't too stupid. And I really appreciate the fix.

I have tensors/matrices round tripping between Image and OpenCV working well. But I haven't been able to get the image to be correctly resolved in eVision. Hopefully its mostly the issue you outlined above! I'll report back if/when I get that resolved. Then I'm going to add a simple API wrapping the QRCodeDetector module as a first step.

cocoa-xu commented 2 years ago

Hi @kipcole9, please don't worry about that. I'm always happy to help! This is also a learning process for me. I guess we could say that OpenCV is a large project, and some of its "weird" designs may be due to historical reasons.

Please let me know if there is anything I can do in this project to help you! (I'll be looking at #76 this week. It's not hard to do but will introduce breaking changes)