The multi-class selfie segmenter does not run in GPU delegate mode on the Android Chrome browser.

salopge commented 8 months ago

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

None

OS Platform and Distribution

Android 14

Mobile device if the issue happens on mobile device

Samsung zflip-3

Browser and version if the issue happens on browser

Google Chrome 122.0.6261.90

Programming Language and version

Javascript

MediaPipe version

0.10.10

Bazel version

No response

Solution

multi-class selfie segmentation

Android Studio, NDK, SDK versions (if issue is related to building in Android environment)

No response

Xcode & Tulsi version (if issue is related to building for iOS)

No response

Describe the actual behavior

The multi-class segmentation results are not being output correctly even on mediapipe studio examples

Describe the expected behaviour

The multi-class segmentation results should be output correctly.

Standalone code/steps you may have used to try to get what you need

I am trying to use a multi-class segmentation model for separating clothes, face, and background. It runs without any issues on macOS and iPhone mobile browsers (Chrome, Safari) with GPU delegate settings, but on Android browsers, the results are not output correctly.

Even when running examples from MediaPipe Studio, the output is abnormal. I will attach the results from MediaPipe Studio.

Other info / Complete Logs

In the console window of the Android Chrome browser, the following message is displayed. In particular, the second message (GL_INVALID_FRAMEBUFFER_OPERATION: Framebuffer is incomplete: Attachment has zero size) seems to be output for every video frame.

* GL_INVALID_VALUE: Desired resource size is greater than max texture size
* GL_INVALID_FRAMEBUFFER_OPERATION: Framebuffer is incomplete: Attachment has zero size.

salopge commented 8 months ago

This is the screenshot from mediapipe studio with multi-class segmentation and gpu delegate option on a android 14 phone

lambiengcode commented 8 months ago

+1 same

kuaashish commented 6 months ago

Hi @salopge,

Could you please confirm whether the issue has been resolved on your end, or if you still require assistance from our end?

Thank you!!

salopge commented 6 months ago

Hi @kuaashish

Unfortunately, the issue has not been resolved on our end yet. We are still awaiting further assistance from you or your team.

Thank you for your attention to this matter!

kuaashish commented 6 months ago

Hi @salopge,

Thank you for confirming. I will bring this to the attention of our team. Please allow us some time, and we will update you through the same thread once we have any further information available.

schmidt-sebastian commented 6 months ago

@tyrmullen Do you have a suggestion for this?

tyrmullen commented 5 months ago

I suspect this is not particular to Android, but rather is device-specific (GPU-specific). Currently, the behavior on web is:

If CPU delegate is chosen, everything is on CPU.
If GPU delegate is chosen, we try to do everything on the GPU, for maximal performance.

From the errors, I believe what's happening is either:

the segmentation Tensor has a dimension which is too large to fit into GPU memory
the output size has a dimension which is too large to fit into GPU memory

@salopge To help confirm, do you mind browsing on the device with the issue to this webpage and screenshotting the resulting statistics? https://webglreport.com/?v=2. Also, if you could let us know the resolution of the images you're sending in, that would be helpful as well.

If this is indeed the case, then we can try linearizing the postprocessing textures to compress memory into "more square" textures, or we can simply fall back to CPU postprocessing for devices which would trigger this condition. But we should confirm first before proceeding, since all this is just a guess on my part for the moment.

salopge commented 5 months ago

@tyrmullen Thank you for looking into this issue!

Here is the information for my test phone:

Model Name: SM-F721N (Samsung Galaxy Z Flip4)
Android Version: 14 (Build/UP1A.231005.007)
Browser: Chrome, 124.0.6367.54

I have attached the screenshot of the webgl report page frome my device below: SM-F721N_WebglReport

I was testing the front camera on my phone, and the resolution appears to be 328x437, as indicated by the

If you need any additional data, please let me know. I will check and provide it Thanks!

tyrmullen commented 5 months ago

Thanks for the details-- I believe this let me track down the issue. It appears that GPU postprocessing is actually fine here for that model:

The model uses a 256x256 input tensor size, and with 6 classes to support in 4 channels, this ends in a 512x256 output texture, which is well within the limits of your device (4096x4096, as reported above).
Similarly, output size matches the video/camera resolution, and 328x437 is well within the limits too.
All intermediate textures have smaller or equal dimensions, and so are fine as well.

Unfortunately, while GPU postprocessing is fine, I now suspect the actual GPU ML inference engine appears to be unable to handle that model for your device. Specifically, it uses texStorage3D calls, with the largest dimension needed for the given model being 8192 when running on a more powerful device (a MacBookPro 2019). This is double your device's texture size limit, so if the inference engine can't adapt properly, that would cause everything to break (along with the error that you see).

TL;DR: I now suspect this particular 6-class model cannot be run with our GPU WebGL inference engine on that device. So for now, you might need to use CPU delegate or a different model, unfortunately. I do not have a device that can repro this myself to fully verify, but we will bubble this up to the team responsible for that inference engine and let you know if they have any additional thoughts.

[Aside: I don't think this will help your particular issue, but in general, one other piece of advice I'd offer for segmentation-related visual issues would be to try using a pure GPU pipeline rather than one that pulls things back to CPU at the end, and see if that changes anything. For example, instead of MediaPipe Studio, I'd also try using this tutorial app: https://github.com/google-ai-edge/mediapipe-samples/tree/main/tutorials/background_segmenter].

arrufat commented 5 months ago

@tyrmullen out of curiosity, on which device does this model work? We have tried several Android devices (all Samsung phones, though, the most recent being from 2021) and it works on none of them. But it works on all the iPhones we tried.

tyrmullen commented 5 months ago

I don't know for sure, but my best guess at the moment would be any devices which have a GPU supporting max texture size of >= 8192.

If you browse to that report site (https://webglreport.com/?v=2) for the devices you're testing on, is it indeed the case that on all succeeding devices the "Max Texture Size" reported is >= 8192, and on all failing devices the reported value is < 8192?

arrufat commented 5 months ago

@tyrmullen So, I've checked on Google Chrome, and all devices report 4096. I then tried Firefox and, on the same, I got 16384. See image below:

Screenshot_20240523_111245_Firefox

Then, I can see that the segmentation is working in the bar graphs below the image. However, it's not overlaid on the image (that might be a Firefox incompatibility with the drawing code, though). Screenshot_20240523_111833_Firefox

My point is that, on the same device, the model reports different "Max Texture Size" values depending on the browser, with Firefox showing a value 4 times larger than Chrome's. So it seems like the device is capable of running the model, but not with Google Chrome. Sadly, asking our users to switch to Firefox is not an option, given its low market share.

Let me know if I can provide some more information, I'll be glad to help.

tyrmullen commented 5 months ago

This does seem to reinforce my suspicion that this is a difference in WebGL texture limits, but I'm a bit surprised there's a browser component there! ... That makes me wonder if a difference in the backend is somehow affecting this (like ANGLE vs. non-ANGLE?). Do you mind pasting the Chrome report from that same device as well (and noting the device name and model, for reference)? I'm curious what differences there will be in a direct head-to-head comparison.

arrufat commented 5 months ago

Thank for the quick reply, here's the Chrome report on the same device as Firefox. Screenshot_20240523_121851_Chrome

tyrmullen commented 5 months ago

Hmmm... did a little digging and it sounds like Android Chrome may have decided to universally cap the max texture size to 4096 on all devices. According to internet comments, they did this because the max texture size was often misreported on Android devices, causing developers to accidentally request overlarge textures and resulting in rendering errors and crashes, so they capped it to a more "safe" limit.

That makes me wonder if the Firefox result is trustworthy, and if not, what the actual value for Adreno 730 really is...

In any case, I'll need to verify the above hard limit in the Chromium source to be sure, but if they really do cap the max texture size in WebGL2 to 4096, then I think we'd need either:

a Chrome browser fix to allow for larger textures in WebGL2
an inference engine upgrade so it can adapt to the smaller texture size limit

I have a certain amount of hope that the latter might be possible because, from what I'm observing when the width is >4096, the height of these too-large textures is usually quite small (< 100), so it seems like there could potentially be a way forwards through splitting up these textures potentially. However, that's a part of the inference engine and not MediaPipe, so I'll have to pose those questions to that team and see what they say.

Otherwise, it sounds like our options for now really are just (sadly):

use a different model
use a different browser (and hope for the best if the size limit is reported improperly I guess?)
use a different inference engine (the slower CPU delegate)

arrufat commented 5 months ago

@tyrmullen Thank you again for your detailed response. We'll find a workaround meanwhile. Let's hope that the issue can be solved soon.

tyrmullen commented 5 months ago

The issue was tracked down to a small fix in the inference engine, and I was able to confirm successful running of this model on a Pixel 8 Pro running Android Chrome; the fix has been submitted to the codebase, and the next update of the WebAssembly .wasm blob should no longer have this problem!

So after the next image segmentation (vision) .wasm release, please double-check and let me know if the issue has not been resolved.

arrufat commented 5 months ago

Thank you so much, I will let you know as soon as the new wasm blob is released.

kuaashish commented 2 months ago

Hi @salopge, @arrufat,

Could you please check now? We have the updated version 0.10.15 available, and the WASM file has been revised. Kindly test it and update us on the issue.

Thank you!!

github-actions[bot] commented 1 month ago

This issue has been marked stale because it has no recent activity since 7 days. It will be closed if no further activity occurs. Thank you.

github-actions[bot] commented 1 month ago

This issue was closed due to lack of activity after being marked stale for past 7 days.

google-ml-butler[bot] commented 1 month ago

Are you satisfied with the resolution of your issue? Yes No

arrufat commented 1 month ago

Sorry for the delay. The model is now working, but it's still too slow on Android compared to a similarly spec'ed iPhone. But that's out of the scope of this issue.

salopge commented 1 month ago

Yes, I have tested it on SM-F711 and SM-S901 models, and it seems that multi-class segmentation works. However, it is relatively slow compared to the operation of an iPhone (for example, iPhone 13 mini)

I'm curious if this is related to the hardware specs of the phones or if it's something that can be improved through software

google-ai-edge / mediapipe