cocoa-xu / evision

Evision: An OpenCV-Erlang/Elixir binding
https://evision.app
Apache License 2.0

Evision.gaussianBlur limiting dims greater than 2 #248

Closed davydog187 closed 1 week ago

davydog187 commented 1 month ago

I've reproduced the issue below. We are able to run the same code via Python/NumPy and it works as expected. After chatting with @polvalente, he seems to think it's a bug in Evision.

Mix.install([
  {:evision, "~> 0.2.1"},
  {:nx, "~> 0.7.2"}
])

Gaussian Blur

v = Nx.broadcast(Nx.tensor(1.0, type: :f64), {1200, 1920, 3})

# Works if you just use width and height dimensions
Evision.gaussianBlur(v[[.., .., 0]], {31, 31}, 0)
%Evision.Mat{
  channels: 1,
  dims: 2,
  type: {:f, 64},
  raw_type: 6,
  shape: {1200, 1920},
  ref: #Reference<0.555854501.2917531664.244421>
}
v = Nx.broadcast(Nx.tensor(1.0, type: :f64), {1200, 1920, 3})

# Doesn't work with color channels
Evision.gaussianBlur(v, {31, 31}, 0)
{:error,
 "OpenCV(4.10.0) /Users/runner/work/evision/evision/3rd_party/opencv/opencv-4.10.0/modules/core/src/matrix.cpp:1099: error: (-215:Assertion failed) dims <= 2 && step[0] > 0 in function 'locateROI'\n"}
cocoa-xu commented 1 month ago

Hi, this is a known limitation in OpenCV, since operations like this are designed specifically for 2D images. The solution is to use Evision.Mat.from_nx_2d/1, which treats the last dimension as channels; then it will work.

v = Nx.broadcast(Nx.tensor(1.0, type: :f64), {1200, 1920, 3})
Evision.gaussianBlur(Evision.Mat.from_nx_2d(v), {31, 31}, 0)
%Evision.Mat{
  channels: 3,
  dims: 2,
  type: {:f, 64},
  raw_type: 6,
  shape: {1200, 1920},
  ref: #Reference<0.555854501.2917531664.244421>
}
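
If you need an Nx tensor back afterwards, here is a minimal sketch (the channel dimension becomes the last axis again when converting back; the shape shown is what I'd expect, not pasted from a run):

blurred = Evision.gaussianBlur(Evision.Mat.from_nx_2d(v), {31, 31}, 0)
# Convert the 3-channel 2D Mat back into a {1200, 1920, 3} tensor
t = Evision.Mat.to_nx(blurred)
Nx.shape(t)
# => {1200, 1920, 3}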

And here is some more context for this: https://github.com/cocoa-xu/evision/wiki/Integration-with-Nx

iex> %Evision.Mat{} = mat = Evision.imread("/path/to/image.png")
iex> t = Evision.Mat.to_nx(mat)
# convert a tensor to a mat
iex> mat_from_tensor = Evision.Mat.from_nx(t)
%Evision.Mat{
  channels: 1,
  dims: 3,
  type: {:u, 8},
  raw_type: 0,
  shape: {512, 512, 3},
  ref: #Reference<0.1086574232.1510342676.18186>
}

# Note that `Evision.Mat.from_nx` gives a Mat that keeps the tensor's dims;
# however, some OpenCV functions expect the mat
# to be a "valid 2D image";
# therefore, in such cases `Evision.Mat.from_nx_2d`
# should be used instead.
#
# Notice the changes in `channels`, `dims` and `raw_type`
iex> mat_from_tensor = Evision.Mat.from_nx_2d(t)
%Evision.Mat{
  channels: 3,
  dims: 2,
  type: {:u, 8},
  raw_type: 16,
  shape: {512, 512, 3},
  ref: #Reference<0.1086574232.1510342676.18187>
}

# and it works for tensors of any shape
iex> t = Nx.iota({2, 3, 2, 3, 2, 3}, type: :s32)
iex> mat = Evision.Mat.from_nx(t)
%Evision.Mat{
  channels: 1,
  dims: 6,
  type: {:s, 32},
  raw_type: 4,
  shape: {2, 3, 2, 3, 2, 3},
  ref: #Reference<0.1086574232.1510342676.18188>
}
davydog187 commented 1 month ago

Thanks for the extra context, @cocoa-xu

After reading through your example and the linked wiki, I'm still unsure why Evision can't automatically cast the tensor into the right shape in this situation. NumPy and OpenCV handle this seamlessly, which yields intuitive code:

m = np.zeros((1200, 1920, 3), dtype="uint8")
r = cv.GaussianBlur(m, (3, 3), 0)
r.shape  # => (1200, 1920, 3), same thing

It's unclear to me why we can't achieve the same thing with Evision without knowing to call Evision.Mat.from_nx_2d.

cocoa-xu commented 1 month ago

Because in Python there's no separate representation of cv::Mat; it's basically a wrapper class (cv::UMat) around a numpy array instead.

We certainly can call these conversion functions implicitly when the input args are Nx.t() and the outputs are Evision.Mat.
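
For illustration, a rough user-side sketch of what that implicit conversion could look like (blur_nx here is just a hypothetical helper, not part of Evision's API; error handling omitted):

defmodule MyImage do
  # Hypothetical wrapper: accept an Nx tensor, convert the last dim to
  # channels, blur, then convert back to an Nx tensor
  def blur_nx(%Nx.Tensor{} = t, ksize \\ {31, 31}, sigma \\ 0) do
    t
    |> Evision.Mat.from_nx_2d()
    |> Evision.gaussianBlur(ksize, sigma)
    |> Evision.Mat.to_nx()
  end
end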

Yet the catch is that in opencv-python the memory is shared between the wrapper class and numpy, while in Erlang we cannot hand Erlang an ErlNifBinary with mutable data in it; that would fundamentally break the immutability of Erlang (actually, it's more likely to crash the Erlang process).

Therefore, when converting to an Nx.t() (with the native backend), we have to make a copy of the data, and the same applies in the other direction. If we want to achieve the same thing as opencv-python and numpy, we would have to make Evision a backend of Nx, which is indeed one item on the roadmap. (You can already use Evision.Backend as the backend of an Nx tensor.)
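
For example, a minimal sketch of opting into it for a single tensor (how much actually works depends on which Nx callbacks are implemented, as noted below):

# Create a tensor directly on the Evision backend; note the explicit
# :f32 type, which cv::Mat supports
t = Nx.iota({3, 3}, type: :f32, backend: Evision.Backend)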

However, unlike PyTorch or XLA, OpenCV's cv::Mat does not support some of the required Nx callbacks out of the box, so we have to write them from scratch. Sadly, I don't have enough time to test and optimise them (maybe we can copy some from PyTorch or XLA).

Besides that, even if we can use Evision.Backend for Nx, OpenCV's cv::Mat doesn't support the :s64, :u32 and :u64 types, yet they're very commonly seen in today's ML/deep learning/AI workflows, and Nx by default chooses :s64 when initialising a tensor. A workaround is described in the wiki page, but it's far from perfect.
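
To make the type issue concrete, this is the gist of the workaround: request an OpenCV-supported type up front instead of relying on the default.

# Nx defaults to :s64 for integer data, which cv::Mat cannot represent
Nx.type(Nx.tensor([1, 2, 3]))
# => {:s, 64}

# Workaround: pick a supported type explicitly, e.g. :s32, :u8 or :f32
Nx.type(Nx.tensor([1, 2, 3], type: :s32))
# => {:s, 32}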

Furthermore, due to this limitation in OpenCV, it's hard to patch the source code to support these types in this project (see https://github.com/cocoa-xu/evision/issues/48#issuecomment-1266282345). It would basically require us to rewrite the cv::Mat class.

davydog187 commented 4 weeks ago

Thanks for the detailed response @cocoa-xu!

We surely can call these functions implicitly when the input args are Nx.t() and when the outputs are Evision.Mat

Right, this is the behavior that would be most intuitive.

Yet the catch is, in opencv-python, the memory is shared between the wrapped class and numpy while in Erlang we cannot give Erlang an ErlNifBinary with mutable data in it, which would fundamentally break the immutability of Erlang (actually it's more likely to crash the Erlang process) ...

After re-reading this a few times, I'm unclear on what conclusion this leads us to. If we call these functions to mimic the behavior of Evision.Mat.from_nx_2d, are you suggesting that we would be breaking the semantics of Erlang? Couldn't Evision literally do that in its generated glue code, or does it lower into the NIF in some fundamental way that I'm not understanding?

cocoa-xu commented 1 week ago

Closing this as it should be addressed in #251 ;)