Document image format expectations for Bumblebee.Vision.ImageClassification

kipcole9 commented 1 year ago

I would like to contribute some documentation that clarifies the expected image format for Bumblebee.Vision.image_classification. The type t:Bumblebee.Vision.image says:

@type image() :: Nx.Container.t()

A term representing an image. Either Nx.Tensor in HWC order or a struct implementing Nx.Container and resolving to such tensor.

However, it does not clarify:

- Whether the image should first be resized to the size used to train the model (224 x 224 for the ResNet models?)
- Whether the image data should be {:u, 8} or some other type (some models suggest the data should be in the range [0.0..1.0])
- Whether the image can have an alpha layer (reading the code suggests yes, but perhaps that is model dependent)
- Whether the image should be preprocessed (this Stack Overflow article suggests it should be)

If I can get some guidance I'll write a doc PR.

jonatanklosko commented 1 year ago

Hey Kip! The image doesn't need to be normalized in any particular way, because it first goes through a featurizer. In other words, it's not the direct model input but a plain image as pixels. In fact, the type is Nx.Container.t() because the image may also be a struct that implements Nx.Container, which we already do for StbImage (ref).

A featurizer usually casts the image to float, resizes it, and scales it into [0.0, 1.0]. Whether an alpha layer is used is usually up to the model configuration. So I think the only generally applicable expectation is that the image values are in the range 0..255.
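A minimal pure-Elixir sketch of the cast-and-scale step described above follows. It assumes the image is a nested list in HWC order (rows of pixels, each pixel a list of channel values) instead of an Nx tensor, so it runs without dependencies. The `FeaturizerSketch` module is hypothetical, resizing is deliberately left out, and a real Bumblebee featurizer does this work on tensors with model-specific settings.

```elixir
defmodule FeaturizerSketch do
  # Hypothetical helper mimicking only the "cast to float and scale into
  # [0.0, 1.0]" part of a featurizer; resizing is intentionally omitted.
  # `image` is a nested list in HWC order with integer values in 0..255.
  def rescale(image) do
    for row <- image do
      for pixel <- row do
        # `/` always returns a float in Elixir; values outside 0..255
        # (or non-integers) fail the guard and raise an error.
        Enum.map(pixel, fn value when value in 0..255 -> value / 255 end)
      end
    end
  end
end

# A 1x2 RGB image: one red pixel followed by one white pixel.
FeaturizerSketch.rescale([[[255, 0, 0], [255, 255, 255]]])
# → [[[1.0, 0.0, 0.0], [1.0, 1.0, 1.0]]]
```

The guard makes the 0..255 expectation explicit: input that is already scaled to floats is rejected rather than silently divided a second time.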

A PR improving the docs is welcome!

kipcole9 commented 1 year ago

@jonatanklosko, TIL what a featurizer is! I suppose the assumption is also that the channel order is RGB (not BGR). I'll work on a doc PR this weekend. For validation, then, the input image has the following assumed characteristics:

- HWC order
- RGB color (in that channel order; not CMYK or some other color space)
- Alpha channel support is model specific
- {:u, 8}, {:f, 32}, or {:f, 64} data type
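If a decoder hands back pixels in BGR order (OpenCV-style decoders commonly do), the channels need reordering before the image matches the RGB assumption in the checklist above. The sketch below uses the same hypothetical nested-list HWC representation; `ChannelOrderSketch` is a made-up name, and it keeps any alpha channel in last position rather than simply reversing every pixel (which would turn BGRA into ARGB).

```elixir
defmodule ChannelOrderSketch do
  # Hypothetical helper: converts a nested-list HWC image from BGR(A) to
  # RGB(A) by swapping the first and third channel of every pixel.
  # An alpha channel, if present, stays in the last position.
  def bgr_to_rgb(image) do
    for row <- image do
      for pixel <- row, do: swap(pixel)
    end
  end

  defp swap([b, g, r]), do: [r, g, b]
  defp swap([b, g, r, a]), do: [r, g, b, a]
end

# One BGRA pixel that is pure red with a partial alpha value.
ChannelOrderSketch.bgr_to_rgb([[[0, 0, 255, 128]]])
# → [[[255, 0, 0, 128]]]
```

For a 3-channel Nx tensor, reversing the last axis (for example with `Nx.reverse(image, axes: [2])`) should have the same effect.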

Thanks for the continuing education and the great library.

jonatanklosko commented 1 year ago

The type is not as strict: pretty much any :u or :s type would do. Other than that, sounds good!
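Putting the thread's conclusions together, a rough validation check might look like the hypothetical `ImageCheckSketch` below. It works on the shape tuple and the Nx-style `{kind, bits}` type tuple (the kinds of values `Nx.shape/1` and `Nx.type/1` return) rather than on a tensor, so it runs without Nx. Following the last comment, it accepts any :u or :s type; whether float input is also fine isn't settled in the thread, so this sketch rejects it. It allows 3 (RGB) or 4 (RGBA) channels, since alpha handling is model specific, and it does not check pixel values.

```elixir
defmodule ImageCheckSketch do
  # Hypothetical validator summarizing this thread's conclusions.
  # `shape` is an HWC tuple such as {224, 224, 3};
  # `type` is an Nx-style type tuple such as {:u, 8}.
  # Pixel values (expected in 0..255) are not checked here.
  def validate({height, width, channels}, {kind, _bits})
      when is_integer(height) and height > 0 and is_integer(width) and width > 0 do
    cond do
      # Assumption: only integer types are accepted, per the last comment.
      kind not in [:u, :s] ->
        {:error, "expected a :u or :s type, got #{inspect(kind)}"}

      # 3 channels = RGB, 4 = RGBA; whether alpha is used is model specific.
      channels not in [3, 4] ->
        {:error, "expected 3 or 4 channels in HWC order, got #{inspect(channels)}"}

      true ->
        :ok
    end
  end

  def validate(shape, _type) do
    {:error, "expected a {height, width, channels} shape, got #{inspect(shape)}"}
  end
end

ImageCheckSketch.validate({224, 224, 3}, {:u, 8})
# → :ok
ImageCheckSketch.validate({3, 224, 224}, {:u, 8})
# → {:error, "expected 3 or 4 channels in HWC order, got 224"}
```

The channel check is only a heuristic for catching channel-first (CHW) input: a CHW image that happens to be 3 or 4 pixels wide would still pass.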

elixir-nx/bumblebee, issue #103