elixir-nx / bumblebee

Pre-trained Neural Network models in Axon (+ 🤗 Models integration)
Apache License 2.0

Featurizer different from python? #323

Closed sonic182 closed 8 months ago

sonic182 commented 8 months ago

Hi, I have a small issue that may just be a silly mistake on my part.

I'm trying to use a featurizer in Elixir (the equivalent of Blip2Processor in Python transformers), and then send the featurizer's output (the model input) via HTTP to a Python service.

This is the example code I'm using:

{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "Salesforce/blip2-flan-t5-xl"})

#  image package: {:image, "~> 0.40.0"} and msgpax later on
tensor = Image.to_nx!(img)

inputs = Bumblebee.apply_featurizer(featurizer, tensor)

data = inputs["pixel_values"] |> Nx.to_list() |> Msgpax.pack!(iodata: false)

encoded = Base.encode64(data)

url = "http://localhost:8080/invocations"
data = %{tensors: encoded}
Req.post!(url, json: data)
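As a quick sanity check, the msgpack/base64 payload can be decoded back into a tensor before sending; a minimal round-trip sketch, assuming the same Nx and Msgpax deps as above (with a tiny stand-in tensor instead of real pixel values):

```elixir
# Round-trip sketch: encode a small tensor the same way as above,
# then decode it and confirm the shape survives.
tensor = Nx.iota({1, 2, 2, 3})

encoded =
  tensor
  |> Nx.to_list()
  |> Msgpax.pack!(iodata: false)
  |> Base.encode64()

decoded =
  encoded
  |> Base.decode64!()
  |> Msgpax.unpack!()
  |> Nx.tensor()

Nx.shape(decoded)
#=> {1, 2, 2, 3}
```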

It seems that the shape of the resulting tensor is different from what the PyTorch model expects.

In PyTorch, the shape of the generated input is torch.Size([1, 3, 224, 224]); in Elixir it seems to be {1, 224, 224, 3}.

How can I correct the shape? Thanks!

sonic182 commented 8 months ago

When I do

data =
  inputs["pixel_values"]
  |> Nx.reshape({1, 3, 224, 224})
  |> Nx.to_list()
  |> Msgpax.pack!(iodata: false)

It produces the expected shape, but the image captioning comes out wrong; I guess the reshape is not that magical.

jonatanklosko commented 8 months ago

Hey, a reshape only reinterprets the existing memory layout, it doesn't move any elements around, so the pixel data ends up scrambled. The Python side expects the tensor axes to be CHW, while we compute HWC (to avoid unnecessary transpositions later). What you want is a transpose:

data =
  inputs["pixel_values"]
  |> Nx.transpose(axes: [0, 3, 1, 2])
  |> Nx.to_list()
  |> Msgpax.pack!(iodata: false)

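For reference, a minimal sketch (assuming only the Nx dep) of why [0, 3, 1, 2] is the permutation that takes NHWC to NCHW:

```elixir
# NHWC stand-in tensor, same shape as the featurizer output.
hwc = Nx.iota({1, 224, 224, 3}, type: :f32)

# Move the channel axis (index 3) into position 1.
nchw = Nx.transpose(hwc, axes: [0, 3, 1, 2])

Nx.shape(nchw)
#=> {1, 3, 224, 224}

# Note: [0, 3, 2, 1] yields the same shape only because H == W here,
# but it additionally swaps the two spatial axes (transposing the image).
```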
sonic182 commented 8 months ago

Thanks!! it worked with:

|> Nx.transpose(axes: [0, 3, 2, 1])

sonic182 commented 8 months ago

Just a small question @jonatanklosko

Is there anything in Elixir similar to Pillow's Image.open("whatever.jpg").convert("RGB") conversion?

The BlipImageProcessor in transformers has a "do_convert_rgb" option for this: https://huggingface.co/docs/transformers/main/en/model_doc/blip#transformers.BlipImageProcessor. These models need the RGB conversion to work properly.

I have checked the image package and StbImage, but I haven't found anything similar.

jonatanklosko commented 8 months ago

What is the input/output you are looking for? StbImage reads the image into an RGB or RGBA representation, and the Bumblebee featurizer automatically strips the alpha channel if there is one, so the input is normalized to RGB.
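Concretely, a minimal sketch of that path (the file name and a previously loaded featurizer are assumptions):

```elixir
# Sketch: StbImage decodes to RGB/RGBA (HWC, u8); the featurizer then
# strips any alpha channel and applies resize/normalization itself.
{:ok, img} = StbImage.read_file("whatever.jpg")
tensor = StbImage.to_nx(img)

inputs = Bumblebee.apply_featurizer(featurizer, tensor)
```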

sonic182 commented 8 months ago

I'm just going to use StbImage, it works... I wanted to use the Image package, but I don't think it does the RGB normalization automatically.

jonatanklosko commented 8 months ago

Ah I see, yeah Image delegates to Vix.Vips.Image.write_to_tensor/1, which as far as I understand can return any channel representation, such as CMYK.
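If one does stay with Image/Vix, a trailing alpha channel can also be dropped on the Nx side; a small sketch (assuming Nx, and that the colorspace is already RGB-like — a CMYK image would still need Image.to_colorspace!/2 first):

```elixir
# Sketch: HWC tensor with 4 channels (RGBA) -> keep the first 3 (RGB).
rgba = Nx.iota({2, 2, 4})
rgb = Nx.slice_along_axis(rgba, 0, 3, axis: 2)

Nx.shape(rgb)
#=> {2, 2, 3}
```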

sonic182 commented 8 months ago

Hi @jonatanklosko

I want to show you this example:

defmodule Whatever do

  def images_to_input(img_links, featurizer) do
    img_links
    # featurizer.size is an {h, w} tuple; download_and_prepare_img/3
    # (not shown here) downloads each image and calls prepare_image/2
    |> Enum.reduce([], &download_and_prepare_img(&1, &2, featurizer.size))
    |> then(fn images ->
      # required for more transformations, like normalization, ...
      Bumblebee.apply_featurizer(featurizer, images)
    end)
  end

  # by resizing with the Image package, we avoid using NxImage for this step
  defp prepare_image(img_bin, {h, w}) do
    size = "#{h}x#{w}"
    img = Image.from_binary!(img_bin)
    colorspace = Image.colorspace(img)

    img
    |> Image.thumbnail!(size, fit: :fill)
    |> then(fn img ->
      if colorspace not in [:rgb, :srgb] do
        Image.to_colorspace!(img, :rgb)
      else
        img
      end
    end)
    |> Image.split_alpha()
    |> then(fn {image, _alpha} ->
      Image.to_nx!(image)
    end)
  end
end
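The download helper is not shown above; a hypothetical sketch of download_and_prepare_img/3 using Req (the implementation and error handling are assumptions, not the author's actual code):

```elixir
# Hypothetical helper (would live inside the Whatever module): fetch the
# image bytes and prepend the prepared tensor to the reduce accumulator.
defp download_and_prepare_img(url, acc, size) do
  %Req.Response{status: 200, body: img_bin} = Req.get!(url)
  [prepare_image(img_bin, size) | acc]
end
```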

I used the Image package to do the resize before going into the featurizer, and that sped up my workload about 2x (all on CPU, no GPU).

It would be nice to have the image featurizers use libvips instead of NxImage, maybe as an alternative backend.
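For anyone wanting to reproduce the comparison, a quick hypothetical harness using Erlang's :timer.tc/1 (img_links and featurizer are assumed to be in scope from the snippets above):

```elixir
# Sketch: time one full preparation pass, in microseconds.
{micros, _inputs} =
  :timer.tc(fn ->
    Whatever.images_to_input(img_links, featurizer)
  end)

IO.puts("prep took #{micros} µs")
```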

sonic182 commented 8 months ago

Well, doing everything in Vix is not as fast as using Nx directly.

I managed to do all transformations with Vix:

  def prepare_image(img_bin, %{size: {h, w}} = featurizer) do
    size = "#{h}x#{w}"
    img = Image.from_binary!(img_bin)
    colorspace = Image.colorspace(img)

    img
    |> Image.thumbnail!(size, fit: :fill)
    |> Image.split_alpha()
    |> then(fn {image, _alpha} -> image end)
    |> Image.Math.divide!(255.0)
    |> Image.Math.subtract!(featurizer.image_mean)
    |> Image.Math.divide!(featurizer.image_std)
    |> then(fn img ->
      if colorspace not in [:rgb, :srgb] do
        Image.to_colorspace!(img, :rgb)
      else
        img
      end
    end)
    |> Image.to_nx!()
  end

And it is a bit slower than the previous example; the key difference was doing the resize before all the other operations.

jonatanklosko commented 8 months ago

Hey, resizing the image upfront is a valid thing to do, so if you already have it loaded in Vips/StbImage and resizing makes things faster without reducing model output quality, it's totally fine. If the image is coming directly from the browser, you could even resize it on the client side to something much smaller; that's what we do in the LV example. In the featurizer itself, though, I think we should keep the order of operations of the reference implementation for consistency, and we shouldn't convert back and forth just for the resize.