Closed sonic182 closed 8 months ago
When I do

```elixir
data =
  inputs["pixel_values"]
  |> Nx.reshape({1, 3, 224, 224})
  |> Nx.to_list()
  |> Msgpax.pack!(iodata: false)

data
...
```
It runs, but the image captioning output is wrong, so I guess the reshape is not that magical.
Hey, the Python side expects the tensor axes to be CHW, while we compute HWC (to avoid unnecessary transpositions later). You can try this:

```elixir
data =
  inputs["pixel_values"]
  |> Nx.transpose(axes: [0, 2, 3, 1])
  |> Nx.to_list()
  |> Msgpax.pack!(iodata: false)
```
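A note on why the earlier `Nx.reshape/2` approach produced wrong captions (a sketch in NumPy, since the tensor is consumed on the Python side): `reshape` only reinterprets the flat buffer in row-major order, while `transpose` actually permutes the data. Reshaping an NHWC tensor into an NCHW shape yields the right shape but scrambled pixel data:

```python
import numpy as np

# A tiny NHWC "image": batch=1, height=2, width=2, channels=3
x = np.arange(12).reshape(1, 2, 2, 3)

# reshape keeps the same flat memory order -> wrong data layout
reshaped = x.reshape(1, 3, 2, 2)

# transpose actually moves the data: NHWC -> NCHW
transposed = np.transpose(x, (0, 3, 1, 2))

print(reshaped.shape == transposed.shape)   # True: same shape...
print(np.array_equal(reshaped, transposed)) # False: ...different contents
```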
Thanks!! It worked with:

```elixir
|> Nx.transpose(axes: [0, 3, 2, 1])
```
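One caveat worth checking (a NumPy sketch, assuming the Elixir side produces NHWC): for a square 224×224 input, both `[0, 3, 2, 1]` and `[0, 3, 1, 2]` yield the `{1, 3, 224, 224}` shape, but `[0, 3, 2, 1]` also swaps the height and width axes (i.e., it transposes each image), which only goes unnoticed because the input is square. A non-square example makes the difference visible:

```python
import numpy as np

# Non-square NHWC input: batch=1, H=2, W=3, C=3
x = np.arange(1 * 2 * 3 * 3).reshape(1, 2, 3, 3)

nchw = np.transpose(x, (0, 3, 1, 2))     # NHWC -> NCHW, orientation kept
swapped = np.transpose(x, (0, 3, 2, 1))  # NHWC -> NCWH, H and W exchanged

print(nchw.shape)     # (1, 3, 2, 3)
print(swapped.shape)  # (1, 3, 3, 2)
```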
Just a small question @jonatanklosko: is there anything similar to Pillow's `Image.open("whatever.jpg").convert("RGB")` for converting an image to RGB in Elixir?
The `BlipImageProcessor` in transformers has a `do_convert_rgb` option for this (https://huggingface.co/docs/transformers/main/en/model_doc/blip#transformers.BlipImageProcessor); these models need the RGB conversion to work properly.
I have checked the `image` package and STB, but I haven't found anything similar.
What is the input/output you are looking for? StbImage reads the image into RGB or RGBA representation and the Bumblebee featurizer will automatically strip the alpha channel if there is one, so it is normalized into RGB.
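The alpha stripping mentioned here amounts to dropping the fourth channel. A minimal NumPy sketch of the idea (the array shapes and values are illustrative assumptions, not Bumblebee's actual implementation):

```python
import numpy as np

# A hypothetical 2x2 RGBA image as an HWC uint8 array
rgba = np.zeros((2, 2, 4), dtype=np.uint8)
rgba[..., 3] = 255  # fully opaque alpha channel

# Stripping the alpha channel normalizes the input to RGB
rgb = rgba[..., :3]

print(rgb.shape)  # (2, 2, 3)
```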
I'm just going to use StbImage, it works... I wanted to use the `image` package, but I think it does not do the RGB normalization automatically.
Ah I see, yeah `Image` delegates to `Vix.Vips.Image.write_to_tensor/1`, which as far as I understand can return any channel representation, such as CMYK.
Hi @jonatanklosko
I want to show you this example:
```elixir
defmodule Whatever do
  def images_to_input(img_links, featurizer) do
    img_links
    # featurizer.size is a {h, w} tuple
    |> Enum.reduce([], &download_and_prepare_img(&1, &2, featurizer.size))
    |> then(fn images ->
      # required for the remaining transformations, like normalization, ...
      Bumblebee.apply_featurizer(featurizer, images)
    end)
  end

  # download_and_prepare_img/3 is not shown here

  # by doing the resize with the Image package, we avoid using NxImage for this step
  defp prepare_image(img_bin, {h, w}) do
    size = "#{h}x#{w}"
    img = Image.from_binary!(img_bin)
    colorspace = Image.colorspace(img)

    img
    |> Image.thumbnail!(size, fit: :fill)
    |> then(fn img ->
      if colorspace not in [:rgb, :srgb] do
        Image.to_colorspace!(img, :rgb)
      else
        img
      end
    end)
    |> Image.split_alpha()
    |> then(fn {image, _alpha} -> Image.to_nx!(image) end)
  end
end
```
I used the `image` package to do the resize before going into the featurizer, and that sped up my workload 2x (all on CPU, no GPU).
It would be nice to have the image featurizers use libvips instead of NxImage, maybe as an alternative backend.
Well, it is not as fast as using Nx directly.
I managed to do all transformations with Vix:
```elixir
def prepare_image(img_bin, %{size: {h, w}} = featurizer) do
  size = "#{h}x#{w}"
  img = Image.from_binary!(img_bin)
  colorspace = Image.colorspace(img)

  img
  |> Image.thumbnail!(size, fit: :fill)
  |> Image.split_alpha()
  |> then(fn {image, _alpha} -> image end)
  |> Image.Math.divide!(255.0)
  |> Image.Math.subtract!(featurizer.image_mean)
  |> Image.Math.divide!(featurizer.image_std)
  |> then(fn img ->
    if colorspace not in [:rgb, :srgb] do
      Image.to_colorspace!(img, :rgb)
    else
      img
    end
  end)
  |> Image.to_nx!()
end
```
And it is a bit slower than the previous example; the key difference was doing the resize before all the other operations.
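For reference, the normalization that the `Image.Math` calls above perform can be sketched in NumPy (the mean/std values here are placeholders, not the actual featurizer config):

```python
import numpy as np

# Hypothetical HWC image and per-channel normalization stats
pixels = np.full((2, 2, 3), 127.5)
image_mean = np.array([0.5, 0.5, 0.5])
image_std = np.array([0.5, 0.5, 0.5])

# rescale to [0, 1], subtract the mean, divide by the std
normalized = (pixels / 255.0 - image_mean) / image_std

print(np.allclose(normalized, 0.0))  # True: 127.5 / 255 == 0.5 == mean
```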
Hey, resizing the image upfront is a valid thing to do, so if you have it loaded in Vips/StbImage and resizing makes things faster without reducing model output quality, it's totally fine. If the image is coming from the browser directly, you could even resize it on the client side to a much smaller one, that's what we do in the LV example. In the featurizer itself I think we should keep the order of operations as the reference implementation for consistency, and I think we shouldn't convert back and forth just for the resize.
Hi, I have a small issue that may be a dumb error on my part.
I'm trying to use a featurizer in Elixir (`Blip2Processor` in Python transformers), and then send the output of the featurizer (the model input) over HTTP to a Python service.
This is the example code I'm using:
It seems that the shape of the tensor I produce is different from what the PyTorch model expects.
In PyTorch, the shape when generating the input is `torch.Size([1, 3, 224, 224])`, while in Elixir it seems to be `{1, 224, 224, 3}`. How can I correct the shape? Thanks!