Closed: andynewman10 closed this issue 9 months ago
I just pushed the .tflite model to use for the repro (that way, it is not necessary to run the python script to create the model):
https://github.com/andynewman10/testrepo/blob/main/testnet.tflite
I did manage to read the Android code performing the inference, but I had to decompile the AAR; it is sometimes surprisingly difficult to find the code on GitHub.
Anyway, as @fbernaly mentioned (thank you!), the ML Kit code dealing with InputImage
instances can be found here (for Android):
Right away I see that the only format supported by the Flutter package is nv21. The native Android version also supports yv12 and yuv_420_888. That's important to know! The code that I pasted above (colorconvertRGB_IYUV_I420) uses yuv420 and therefore has no hope of working.
Looking at the code I also discovered that the minimum image size is 32x32 (which is the size I am using, phew...)
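As a quick sanity check on the layout discussed here (an illustrative sketch in Python, not ML Kit code): an NV21 buffer holds one full-resolution Y byte per pixel, followed by width*height/2 interleaved V/U bytes, so a 32x32 image needs 1536 bytes in total.

```python
# Illustrative sketch: NV21 buffer layout for a WxH image.
def nv21_layout(width, height):
    y_size = width * height            # full-resolution Y (luma) plane
    vu_size = (width * height) // 2    # interleaved V/U, subsampled 2x2
    return y_size, vu_size, y_size + vu_size

# The minimum ML Kit image size mentioned above is 32x32:
print(nv21_layout(32, 32))  # (1024, 512, 1536)
```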
So I went ahead with an RGB to NV21 converter, using the following code:
Uint8List encodeYUV420SP(imglib.Image image) {
  final int width = image.width;
  final int height = image.height;
  final int frameSize = width * height;
  // NV21 buffer: full-resolution Y plane followed by interleaved V/U
  // samples at quarter resolution, i.e. width * height * 3 / 2 bytes.
  final yuv420sp = List<int>.filled((frameSize * 3) ~/ 2, 0);
  int yIndex = 0;
  int uvIndex = frameSize;
  for (int j = 0; j < height; j++) {
    for (int i = 0; i < width; i++) {
      // Read the pixel channels directly (package:image v4 API). The
      // original code indexed image.data!.toUint8List() as if each entry
      // were a packed 32-bit ARGB value, which is a bug: that list holds
      // one byte per channel, not one int per pixel.
      final pixel = image.getPixel(i, j);
      final int R = pixel.r.toInt();
      final int G = pixel.g.toInt();
      final int B = pixel.b.toInt();
      // Well-known fixed-point RGB to BT.601 YUV conversion.
      final int Y = ((66 * R + 129 * G + 25 * B + 128) >> 8) + 16;
      final int U = ((-38 * R - 74 * G + 112 * B + 128) >> 8) + 128;
      final int V = ((112 * R - 94 * G - 18 * B + 128) >> 8) + 128;
      // NV21 has a plane of Y and an interleaved plane of V/U pairs,
      // each subsampled by a factor of 2 in both directions: for every
      // 4 Y samples there are 1 V and 1 U. Sampling happens on every
      // other pixel of every other scanline, hence the test on i (not
      // on a running index, which breaks for odd widths).
      yuv420sp[yIndex++] = Y.clamp(0, 255);
      if (j % 2 == 0 && i % 2 == 0) {
        yuv420sp[uvIndex++] = V.clamp(0, 255);
        yuv420sp[uvIndex++] = U.clamp(0, 255);
      }
    }
  }
  return Uint8List.fromList(yuv420sp);
}
InputImage imgLibImageToInputImage(imglib.Image image) {
  final bytes = encodeYUV420SP(image);
  final metadata = InputImageMetadata(
      format: InputImageFormat.nv21,
      size: Size(image.width.toDouble(), image.height.toDouble()),
      rotation: InputImageRotation.rotation0deg,
      bytesPerRow: 0); // ignored on Android
  return InputImage.fromBytes(bytes: bytes, metadata: metadata);
}
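The fixed-point coefficients used in encodeYUV420SP can be sanity-checked in isolation. Below is the same arithmetic transcribed to Python (illustrative only): in the limited-range BT.601 convention, pure white should map to Y = 235, U = V = 128, and pure black to Y = 16, U = V = 128.

```python
# Standalone sketch of the fixed-point BT.601 conversion used in encodeYUV420SP.
def rgb_to_yuv(r, g, b):
    y = ((66 * r + 129 * g + 25 * b + 128) >> 8) + 16
    u = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128
    v = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128
    clamp = lambda x: max(0, min(255, x))
    return clamp(y), clamp(u), clamp(v)

print(rgb_to_yuv(255, 255, 255))  # white -> (235, 128, 128)
print(rgb_to_yuv(0, 0, 0))        # black -> (16, 128, 128)
```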
This is some code I found on the web, and it looks pretty good to me: it respects the NV21 encoding, where the Y plane appears first and V and U are then stored in interleaved form.
And it still doesn't work: InputImage.fromFilePath does work, but the same image, generated with my code, doesn't.
Following my previous message, things are now working as expected, so I am closing this issue.
Things indeed work when nv21 is used to generate the InputImage. I mistakenly believed I had to use yuv420 because of a limitation in Flutter, but that limitation was on the camera package side (camera could not handle nv21 until recently), not on the ML Kit side.
Great! Actually, that is specified in the README: you need to use nv21 when using the camera plugin.
I made an interesting experiment in which I create an InputImage using InputImage.fromBytes() or get one through InputImage.fromFile() (the two being meant to be exactly identical to the generated image), run ImageLabeler.processImage() using a pass-through TF Lite model (see below for the model), and compare the List<ImageLabel> values (output of the TF Lite model).
This test is interesting in that it allows developers to verify that a generated InputImage instance is valid. In other words, it makes it easy to study/debug Image-to-InputImage conversion routines.
My question is: how can I successfully create an InputImage from an imglib Image? I have been trying all the bits of code found on the web for weeks, to no avail.
This test is Android only for now.
Steps to reproduce the behavior:
Add model metadata using "passthrough" parameters: 0 mean, 1 std, numClasses = 32x32x3 flattened = 3072. To add metadata, I use metadata_writer_for_image_classifier.py, provided by the TensorFlow team.
I want all logits to be passed through; I can therefore set maxCount to 3072 (= 32x32x3, flattened) or any higher value (e.g. 1000000). Similarly, confidenceThreshold: 0 is meant to include all values.
Inspect the values of imageLabels in the then handler in the code above.
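The maxCount figure in the steps above is just the flattened output size of the pass-through model (illustrative arithmetic, not part of the repro code):

```python
# The pass-through model emits one logit per channel value of the 32x32 RGB input.
width, height, channels = 32, 32, 3
num_logits = width * height * channels
print(num_logits)  # 3072 -> maxCount must be at least this to see every logit
```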
Expected behavior
Whether readFromDisk is true or false, I should get the same results. More specifically, I should get 255.0, and 2048 labels (mapped to the green and blue values) with confidence = 0.0.
Actual behavior
When readFromDisk is true, I get the expected results. When readFromDisk is false, I get [242.0, 138.0, 0.0]. That's wrong. This means that the InputImage I created is not what it should be.
Additional testing
I rewrote the convertImage function so that an InputImage with yuv420 encoding is used: the results are also wrong. Logits (label confidence values) are in this case [239.0, 198.0, 61.0, 15.0, 9.0, 0.0]. Again, they should be [255.0] only (as in the InputImage.fromFilePath case, which shows that reading an image from disk works fine).
Platform (please complete the following information):