elixir-nx / bumblebee

Pre-trained Neural Network models in Axon (+ 🤗 Models integration)
Apache License 2.0

Document image format expectations for Bumblebee.Vision.ImageClassification #103

Closed by kipcole9 1 year ago

kipcole9 commented 1 year ago

I would like to contribute some documentation that clarifies the image format expected by Bumblebee.Vision.image_classification. The type t:Bumblebee.Vision.image says:

@type image() :: Nx.Container.t()

A term representing an image. Either Nx.Tensor in HWC order or a struct implementing Nx.Container and resolving to such tensor.

However it does not clarify:

If I can get some guidance I'll write a doc PR.

jonatanklosko commented 1 year ago

Hey Kip! The image doesn't need to be normalized in any particular way, because it first goes through a featurizer. In other words, it's not the direct model input, but a plain image as pixels. In fact, the type is Nx.Container.t(), because it may also be a struct that implements Nx.Container, which we already do for StbImage (ref).

A featurizer usually casts to float, resizes, and scales the values into [0.0, 1.0]. Whether an alpha layer is used is usually up to the model configuration. So I think the only generally applicable expectation is that the image values are in the range 0..255.
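To make the cast-and-scale part of that description concrete, here is a minimal sketch in plain Nx. It is an illustration only, not Bumblebee's actual featurizer: the 2x2 image is made up, resizing is left out, and real featurizers may apply further model-specific steps.

```elixir
Mix.install([{:nx, "~> 0.5"}])

# A made-up 2x2 RGB image in HWC order (height, width, channels),
# with raw pixel values in 0..255.
image =
  Nx.tensor(
    [
      [[255, 0, 0], [0, 255, 0]],
      [[0, 0, 255], [255, 255, 255]]
    ],
    type: {:u, 8}
  )

# Cast to float, then scale into [0.0, 1.0].
# (Resizing is omitted from this sketch.)
scaled =
  image
  |> Nx.as_type({:f, 32})
  |> Nx.divide(255.0)

IO.inspect(Nx.shape(scaled)) # {2, 2, 3}
IO.inspect(Nx.type(scaled))  # {:f, 32}
```

The point is that the caller passes the unscaled integer tensor; steps like these happen inside the featurizer.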

A PR improving the docs is welcome!

kipcole9 commented 1 year ago

@jonatanklosko, TIL what a featurizer is! I suppose the assumption is also that the channel order is RGB (not BGR). I'll work on a doc PR this weekend. For validation then, the input image has the following assumed characteristics:

Thanks for the continuing education and the great library.

jonatanklosko commented 1 year ago

The type is not as strict; pretty much any :u or :s type would do. Other than that, sounds good!
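For anyone who wants to sanity-check inputs along the lines discussed in this thread, a rough sketch might look like the following. `ImageCheck` and `plausible_image?/1` are hypothetical names, not part of Bumblebee. The check only covers what was stated above (a rank-3 HWC tensor, an integer :u or :s type, values in 0..255) and does not handle structs that implement Nx.Container.

```elixir
Mix.install([{:nx, "~> 0.5"}])

defmodule ImageCheck do
  # Hypothetical helper, not part of Bumblebee.
  # Returns true when the tensor looks like a raw image: rank 3 (HWC),
  # an unsigned or signed integer type, and all values within 0..255.
  def plausible_image?(%Nx.Tensor{} = tensor) do
    {kind, _bits} = Nx.type(tensor)

    Nx.rank(tensor) == 3 and kind in [:u, :s] and in_range?(tensor)
  end

  def plausible_image?(_other), do: false

  defp in_range?(tensor) do
    min = tensor |> Nx.reduce_min() |> Nx.to_number()
    max = tensor |> Nx.reduce_max() |> Nx.to_number()

    min >= 0 and max <= 255
  end
end

# A uniform grey 4x4 RGB image, the same data cast to float, and a rank-1 tensor.
good = Nx.broadcast(Nx.tensor(128, type: {:u, 8}), {4, 4, 3})
as_float = Nx.as_type(good, {:f, 32})
flat = Nx.tensor([1, 2, 3], type: {:u, 8})

IO.inspect(ImageCheck.plausible_image?(good))     # true
IO.inspect(ImageCheck.plausible_image?(as_float)) # false (float type)
IO.inspect(ImageCheck.plausible_image?(flat))     # false (rank 1)
```

Channel count is deliberately not checked here, since whether an alpha layer is used depends on the model configuration.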