Closed kipcole9 closed 1 year ago
Hey Kip! The image doesn't need to be normalized in any particular way, because it first goes through a featurizer. In other words, it's not the direct model input, but a plain image as pixels. In fact, the type is Nx.Container.t(), because it may also be a struct that implements Nx.Container, which we already do for StbImage (ref).
A featurizer usually casts to float, resizes, and scales the values into [0.0, 1.0]. Whether an alpha channel is used is usually up to the model configuration. So I think the only generally applicable expectation is that the image values are in the range 0..255.
A PR improving the docs is welcome!
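To make the featurizer steps above concrete, here is a minimal sketch of the cast and scale steps using plain Nx. It is only an illustration, not Bumblebee's actual featurizer code: the tiny 2x2 image, the height/width/channels layout, and the `~> 0.7` version requirement are assumptions, and resizing is left out.

```elixir
Mix.install([{:nx, "~> 0.7"}])

# Assumed example input: a 2x2 RGB image laid out as height x width x channels,
# with unsigned 8-bit pixel values in 0..255.
image =
  Nx.tensor(
    [
      [[255, 0, 0], [0, 255, 0]],
      [[0, 0, 255], [255, 255, 255]]
    ],
    type: {:u, 8}
  )

# Cast to float, then scale the 0..255 pixel values into [0.0, 1.0].
# A real featurizer would typically also resize the image; that step is omitted here.
scaled =
  image
  |> Nx.as_type({:f, 32})
  |> Nx.divide(255.0)

IO.inspect(Nx.type(scaled))
```

The point of the sketch is that the caller passes raw pixels and the scaling happens afterwards, which is why the input itself does not need to be in [0.0, 1.0].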
@jonatanklosko, TIL what a featurizer is! I suppose the assumption is also that the channel order is RGB (not BGR). I'll work on a doc PR this weekend. For validation then, the input image has the following assumed characteristics:
{:u, 8} or {:f, 32} or {:f, 64} data type
Thanks for the continuing education and the great library.
The type is not as strict; pretty much any :u or :s type would do. Other than that, sounds good!
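The characteristics discussed above could be checked with something like the following sketch. `ImageCheck.validate/1` is a hypothetical helper, not part of Bumblebee or Nx. The accepted types are one reading of this thread (any :u or :s type, plus the float types mentioned earlier), and the height/width/channels layout and the 3-or-4 channel check are assumptions.

```elixir
Mix.install([{:nx, "~> 0.7"}])

defmodule ImageCheck do
  # Hypothetical helper for illustration only; not part of Bumblebee.
  def validate(%Nx.Tensor{} = image) do
    {kind, _bits} = Nx.type(image)

    cond do
      # Assumption: integer types (:u, :s) and float types (:f) are acceptable.
      kind not in [:u, :s, :f] ->
        {:error, :unsupported_type}

      # Assumption: the image is laid out as height x width x channels.
      Nx.rank(image) != 3 ->
        {:error, :expected_height_width_channels}

      # Assumption: 3 channels (RGB) or 4 (RGB plus alpha).
      elem(Nx.shape(image), 2) not in [3, 4] ->
        {:error, :expected_3_or_4_channels}

      # Pixel values are expected to be in the range 0..255.
      Nx.to_number(Nx.reduce_min(image)) < 0 or
          Nx.to_number(Nx.reduce_max(image)) > 255 ->
        {:error, :values_outside_0_255}

      true ->
        :ok
    end
  end
end

image = Nx.tensor([[[255, 0, 0], [0, 128, 255]]], type: {:u, 8})
IO.inspect(ImageCheck.validate(image))
# prints :ok
```

Note that the channel order (RGB rather than BGR) cannot be verified from the tensor itself, so it remains a documented expectation rather than something a check like this can enforce.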
I would like to contribute some documentation that clarifies the expected image format to Bumblebee.Vision.image_classification. The type t:Bumblebee.Vision.image says:
However, it does not clarify:
{:u, 8} or some other type (some models suggest data should be in the range [0.0..1.0])
If I can get some guidance, I'll write a doc PR.