google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0

Landscape Segmentation Offset Misalignment for Nvidia GPUs #5428

Open Singulariteehee opened 6 months ago

Singulariteehee commented 6 months ago

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

Yes

OS Platform and Distribution

Windows 11

MediaPipe Tasks SDK version

0.10.14

Task name (e.g. Image classification, Gesture recognition etc.)

Image Segmenter

Programming Language and version (e.g. C++, Python, Java)

JavaScript

Describe the actual behavior

The segmentation output is misaligned on NVIDIA GPUs.

Describe the expected behaviour

The CPU segmentation output should match the GPU output on all GPUs.

Standalone code/steps you may have used to try to get what you need

I have created a minimal example in CodePen that clearly shows the problem when executed on an NVIDIA GPU: https://codepen.io/Singulariteehee/pen/OJYNoBy

When executed on my Intel integrated graphics or my old AMD R9 Fury X, the output matches between CPU and GPU.

The problem is more complex than a simple offset, as some lines do not suffer from the offset. If you simply try to reverse the offset, you still end up with poor results because every 8th line gets jagged. It is also difficult to know when to reverse the offset, because browsers deliberately limit hardware detection for privacy reasons.
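For anyone who wants to check this outside the CodePen, here is a minimal sketch of a per-row comparison between a CPU mask and a GPU mask. It is not the CodePen's code. The helper name `mismatchesPerRow` is hypothetical, and it assumes both masks are already available as row-major `Uint8Array`s with one byte per pixel (for example, as read back from the segmenter's category mask). A row-by-row count makes it easy to see whether only some lines are affected.

```javascript
// Hypothetical helper: counts mismatching pixels in each row of two
// category masks of identical size (one byte per pixel, row-major).
function mismatchesPerRow(cpuMask, gpuMask, width, height) {
  if (cpuMask.length !== width * height || gpuMask.length !== width * height) {
    throw new Error('mask size does not match width * height');
  }
  const counts = new Uint32Array(height);
  for (let y = 0; y < height; y++) {
    const rowStart = y * width;
    for (let x = 0; x < width; x++) {
      if (cpuMask[rowStart + x] !== gpuMask[rowStart + x]) counts[y]++;
    }
  }
  return counts;
}

// Tiny 4x3 illustration (made-up data): the "GPU" mask is the "CPU"
// mask moved up by one row, with the last row repeated.
const cpu = Uint8Array.from([
  0, 0, 0, 0,
  1, 1, 0, 0,
  1, 1, 1, 1,
]);
const gpu = Uint8Array.from([
  1, 1, 0, 0,
  1, 1, 1, 1,
  1, 1, 1, 1,
]);

// Rows 0 and 1 differ in two pixels each; row 2 is identical.
console.log(Array.from(mismatchesPerRow(cpu, gpu, 4, 3)));
```

On real output, rows with a count of zero would be the lines that "do not suffer from the offset", and a spike at a regular interval would correspond to the jagged lines.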

My previous report simply got closed because I wasn't looking at it, but hopefully I have put enough effort into the report this time that the problem can be properly recognized.

Other info / Complete Logs

No response

Singulariteehee commented 6 months ago

[image] Here is the result of the CodePen when executed on my RTX 3090.

Singulariteehee commented 6 months ago

I added an unshifted example diff to the CodePen: [image]

This result shows that shifting the returned mask down by one pixel makes the problem much less intense, but it is still full of artifacts on basically every line, with some lines being much worse than others.
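The shift-and-compare step can be sketched roughly as follows. This is an illustration under assumptions, not the code from the CodePen: `shiftRowsDown` and `countMismatches` are hypothetical helpers, the masks are assumed to be row-major `Uint8Array`s with one byte per pixel, and rows that would come from outside the image are filled by repeating the top row.

```javascript
// Hypothetical helper: returns a copy of the mask moved down by `rows`
// rows (rows >= 0). Rows shifted in at the top repeat the original row 0.
function shiftRowsDown(mask, width, height, rows) {
  const out = new Uint8Array(mask.length);
  for (let y = 0; y < height; y++) {
    const srcY = Math.max(0, y - rows);
    out.set(mask.subarray(srcY * width, (srcY + 1) * width), y * width);
  }
  return out;
}

// Hypothetical helper: total number of differing pixels.
function countMismatches(a, b) {
  let n = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) n++;
  }
  return n;
}

// Made-up 4x3 data: `shiftedUp` is `reference` moved up by one row.
const reference = Uint8Array.from([
  0, 0, 0, 0,
  1, 1, 0, 0,
  1, 1, 1, 1,
]);
const shiftedUp = Uint8Array.from([
  1, 1, 0, 0,
  1, 1, 1, 1,
  1, 1, 1, 1,
]);

const before = countMismatches(reference, shiftedUp);
const corrected = shiftRowsDown(shiftedUp, 4, 3, 1);
const after = countMismatches(reference, corrected);
console.log({ before, after });
```

In this toy case the only remaining differences are in the top row, which cannot be recovered after the shift. On the real NVIDIA output the remaining differences are spread across the image, which is why a plain one-row shift is not a sufficient workaround.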

kuaashish commented 5 months ago

Hi @Singulariteehee,

This behavior might be occurring because Nvidia and AMD use different drivers—Nvidia uses CUDA while AMD uses OpenGL in the backend. As a result, this behavior is anticipated, and there is not much we can do to address this issue.

Thank you!!

Singulariteehee commented 5 months ago

I can somewhat understand that there could be slight differences between implementations, but it doesn't make sense to me that an inference output that is offset by an entire texel could be an acceptable driver difference. If the inference output from an LLM layer were offset by one, it simply wouldn't work at all. In this case, because it is a visual output, being off by one row still vaguely looks like functioning code. I have not seen this offset in pre-Tasks Vision implementations of the landscape model.

I have created an additional Codepen for the regular version of the model (256x256): https://codepen.io/Singulariteehee/pen/qBGRMpq

In this version, the output is a pixel-perfect match between the CPU and GPU implementations, even on NVIDIA cards. [image]

kuaashish commented 4 months ago

Hi @tyrmullen,

Could you please add some pointers here? There is a pixel match issue between Intel and Nvidia graphics cards. Any additional information you can provide will be very helpful.

Thank you!!