google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://mediapipe.dev
Apache License 2.0

Performance issue with JS demos on Android/Linux chromium #2278

Closed AzureRain1 closed 2 years ago

AzureRain1 commented 3 years ago

Describe the current behavior: The demo on CodePen is much slower on Chromium than on Firefox. On Firefox I get 30-50 fps, but on Chromium it stays at 7-8 fps regardless of the settings. Wasm and WebGL are enabled, and the console shows that a WebGL context is allocated.

All demos listed at https://google.github.io/mediapipe/getting_started/javascript.html#ready-to-use-javascript-solutions show the same result. The behavior can also be observed on Android.

Describe the expected behavior: The JS solutions should perform comparably in Chrome/Chromium and in Firefox.

Other info / Complete Logs:

KKgVaPJ:1 Refused to apply style from 'https://cdpn.io/mediapipe/fullpage/demo.css' because its MIME type ('text/html') is not a supported stylesheet MIME type, and strict MIME checking is enabled.
logo_white.png:1 Failed to load resource: the server responded with a status of 404 ()
face_mesh_solution_simd_wasm_bin.js:9 
I0710 19:57:28.020000       1 gl_context_webgl.cc:151] Successfully created a WebGL context with major version 3 and handle 3
put_char @ face_mesh_solution_simd_wasm_bin.js:9
I0710 19:57:28.066000       1 gl_context.cc:348] GL version: 3.0 (OpenGL ES 3.0 (WebGL 2.0 (OpenGL ES 3.0 Chromium)))
put_char @ face_mesh_solution_simd_wasm_bin.js:9
W0710 19:57:28.067000       1 gl_context.cc:807] OpenGL error checking is disabled
put_char @ face_mesh_solution_simd_wasm_bin.js:9
KKgVaPJ:1 Refused to apply style from 'https://cdpn.io/mediapipe/fullpage/demo.css' because its MIME type ('text/html') is not a supported stylesheet MIME type, and strict MIME checking is enabled.

tyrmullen commented 2 years ago

This is not typical behavior. Especially on desktop Chrome, the demos should be running quite quickly, which makes me suspect you might have a Chrome setting/experiment/flag which is interfering. One thing to double-check would be to navigate to chrome://gpu and ensure that the settings there look good. And in particular, check that your WebGL2 is hardware accelerated, since software WebGL is extremely slow.

AzureRain1 commented 2 years ago

I checked, and WebGL2 is indeed hardware accelerated. I also just tried a freshly installed Chrome on Windows and can reproduce the 8 fps behavior there. It runs at exactly 8 fps regardless of the settings, as if it were being capped somewhere.
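
One rough way to test the "capped" impression is to look at how regular the frame intervals are: a fixed cap would typically give nearly constant intervals, while a pipeline that is merely slow usually shows more spread. The sketch below assumes a list of frame timestamps in milliseconds (for example, logged from the demo's render callback). The function name and both sample series are illustrative, not measurements from this issue.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def frame_stats(timestamps_ms: List[float]) -> Tuple[float, float]:
    """Return (average fps, relative jitter) for frame timestamps in ms.

    Relative jitter is the population standard deviation of the frame
    intervals divided by their mean; 0.0 means perfectly even pacing.
    """
    if len(timestamps_ms) < 3:
        raise ValueError("need at least three timestamps")
    intervals = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    avg = mean(intervals)
    return 1000.0 / avg, pstdev(intervals) / avg


# Hypothetical data: both series average 8 fps, but only the first is evenly paced.
steady = [i * 125.0 for i in range(9)]
uneven = [0.0, 90.0, 260.0, 340.0, 520.0, 610.0, 790.0, 880.0, 1000.0]

for name, series in (("steady", steady), ("uneven", uneven)):
    fps, jitter = frame_stats(series)
    print(f"{name}: {fps:.1f} fps, relative jitter {jitter:.2f}")
```

A jitter value close to zero at a low frame rate would be consistent with throttling, though it does not prove it; a performance trace is still needed to see where the time goes.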

If this is not a problem in the JS code, then I suspect a bug in Chromium. I happen to know the Chromium source code a little, so let me know if I can help diagnose the problem.

The browser version I tested was:

Google Chrome   92.0.4515.159 (Official Build) (64-bit) (cohort: Stable Installs & Full Version Pins)
Revision    0185b8a19c88c5dfd3e6c0da6686d799e9bc3b52-refs/branch-heads/4515@{#2052}
OS  Windows 10 OS Version 2009 (Build 19043.1165)
JavaScript  V8 9.2.230.29
User Agent  Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36
Command Line    "C:\Program Files\Google\Chrome\Application\chrome.exe" --flag-switches-begin --flag-switches-end

And here is the chrome://gpu report:

Graphics Feature Status
Canvas: Hardware accelerated
Compositing: Hardware accelerated
Multiple Raster Threads: Enabled
Out-of-process Rasterization: Hardware accelerated
OpenGL: Enabled
Rasterization: Hardware accelerated
Skia Renderer: Enabled
Video Decode: Hardware accelerated
Vulkan: Disabled
WebGL: Hardware accelerated
WebGL2: Hardware accelerated

Driver Bug Workarounds
check_ycbcr_studio_g22_left_p709_for_nv12_support
clear_uniforms_before_first_program_use
decode_encode_srgb_for_generatemipmap
disable_decode_swap_chain
disable_direct_composition_sw_video_overlays
enable_bgra8_overlays_with_yuv_overlay_support
exit_on_context_lost
max_msaa_sample_count_4
msaa_is_slow
disabled_extension_GL_KHR_blend_equation_advanced
disabled_extension_GL_KHR_blend_equation_advanced_coherent
Problems Detected
Some drivers are unable to reset the D3D device in the GPU process sandbox
Applied Workarounds: exit_on_context_lost
Clear uniforms before first program use on all platforms: 124764, 349137
Applied Workarounds: clear_uniforms_before_first_program_use
On Intel GPUs MSAA performance is not acceptable for GPU rasterization: 527565
Applied Workarounds: msaa_is_slow
Disable KHR_blend_equation_advanced until cc shaders are updated: 661715
Applied Workarounds: disable(GL_KHR_blend_equation_advanced), disable(GL_KHR_blend_equation_advanced_coherent)
Decode and Encode before generateMipmap for srgb format textures on Windows: 634519
Applied Workarounds: decode_encode_srgb_for_generatemipmap
Disable DecodeSwapChain for Intel Gen9 and older devices: 1107403
Applied Workarounds: disable_decode_swap_chain
Intel GPUs fail to report BGRA8 overlay support: 1119491
Applied Workarounds: enable_bgra8_overlays_with_yuv_overlay_support
8x MSAA for WebGL contexts is slow on Win Intel: 1145793
Applied Workarounds: max_msaa_sample_count_4
Disable software overlays for Intel GPUs. All Skylake+ devices support hw overlays, older devices peform poorly.: 1192748
Applied Workarounds: disable_direct_composition_sw_video_overlays
Check YCbCr_Studio_G22_Left_P709 color space for NV12 overlay support on Intel: 1103852
Applied Workarounds: check_ycbcr_studio_g22_left_p709_for_nv12_support
ANGLE Features
allow_compressed_formats (Frontend workarounds): Enabled: true
Allow compressed formats
disable_anisotropic_filtering (Frontend workarounds): Disabled
Disable support for anisotropic filtering
disable_program_binary (Frontend features) anglebug:5007: Disabled
Disable support for GL_OES_get_program_binary
disable_program_caching_for_transform_feedback (Frontend workarounds): Disabled
On some GPUs, program binaries don't contain transform feedback varyings
enableCompressingPipelineCacheInThreadPool (Frontend workarounds) anglebug:4722: Disabled: false
Enable compressing pipeline cache in thread pool.
enable_capture_limits (Frontend features) anglebug:5750: Disabled
Set the context limits like frame capturing was enabled
lose_context_on_out_of_memory (Frontend workarounds): Enabled: true
Some users rely on a lost context notification if a GL_OUT_OF_MEMORY error occurs
scalarize_vec_and_mat_constructor_args (Frontend workarounds) 1165751: Disabled: false
Always rewrite vec/mat constructors to be consistent
sync_framebuffer_bindings_on_tex_image (Frontend workarounds): Disabled
On some drivers TexImage sometimes seems to interact with the Framebuffer
add_mock_texture_no_render_target (D3D workarounds) anglebug:2152: Disabled: isIntel && capsVersion < IntelDriverVersion(4815)
On some drivers when rendering with no render target, two bugs lead to incorrect behavior
allow_clear_for_robust_resource_init (D3D workarounds) 941620: Enabled: true
Some drivers corrupt texture data when clearing for robust resource initialization.
allow_translate_uniform_block_to_structured_buffer (D3D workarounds) anglebug:3682: Enabled: IsWin10OrGreater()
There is a slow fxc compile performance issue with dynamic uniform indexing if translating a uniform block with a large array member to cbuffer.
call_clear_twice (D3D workarounds) 655534: Disabled: isIntel && isSkylake && capsVersion < IntelDriverVersion(4771)
Using clear() may not take effect
depth_stencil_blit_extra_copy (D3D workarounds) anglebug:1452: Disabled
Bug in some drivers triggers a TDR when using CopySubresourceRegion from a staging texture to a depth/stencil
disable_b5g6r5_support (D3D workarounds): Disabled: (isIntel && capsVersion < IntelDriverVersion(4539)) || isAMD
Textures with the format DXGI_FORMAT_B5G6R5_UNORM have incorrect data
emulate_isnan_float (D3D workarounds) 650547: Disabled: isIntel && isSkylake && capsVersion < IntelDriverVersion(4542)
Using isnan() on highp float will get wrong answer
emulate_tiny_stencil_textures (D3D workarounds): Disabled: isAMD && !(deviceCaps.featureLevel < D3D_FEATURE_LEVEL_10_1)
1x1 and 2x2 mips of depth/stencil textures aren't sampled correctly
expand_integer_pow_expressions (D3D workarounds): Enabled: true
The HLSL optimizer has a bug with optimizing 'pow' in certain integer-valued expressions
flush_after_ending_transform_feedback (D3D workarounds): Disabled: isNvidia
Some drivers sometimes write out-of-order results to StreamOut buffers when transform feedback is used to repeatedly write to the same buffer positions
force_atomic_value_resolution (D3D workarounds) anglebug:3246: Disabled: isNvidia
On some drivers the return value from RWByteAddressBuffer.InterlockedAdd does not resolve when used in the .yzw components of a RWByteAddressBuffer.Store operation
get_dimensions_ignores_base_level (D3D workarounds): Disabled: isNvidia
Some drivers do not take into account the base level of the texture in the results of the HLSL GetDimensions builtin
mrt_perf_workaround (D3D workarounds): Enabled: true
Some drivers have a bug where they ignore null render targets
pre_add_texel_fetch_offsets (D3D workarounds): Enabled: isIntel
HLSL's function texture.Load returns 0 when the parameter Location is negative, even if the sum of Offset and Location is in range
rewrite_unary_minus_operator (D3D workarounds): Disabled: isIntel && (isBroadwell || isHaswell) && capsVersion < IntelDriverVersion(4624)
Evaluating unary minus operator on integer may get wrong answer in vertex shaders
select_view_in_geometry_shader (D3D workarounds): Disabled: !deviceCaps.supportsVpRtIndexWriteFromVertexShader
The viewport or render target slice will be selected in the geometry shader stage for the ANGLE_multiview extension
set_data_faster_than_image_upload (D3D workarounds): Enabled: !(isIvyBridge || isBroadwell || isHaswell)
Set data faster than image upload
skip_vs_constant_register_zero (D3D workarounds): Disabled: isNvidia
In specific cases the driver doesn't handle constant register zero correctly
use_instanced_point_sprite_emulation (D3D workarounds): Disabled: isFeatureLevel9_3
Some D3D11 renderers do not support geometry shaders for pointsprite emulation
use_system_memory_for_constant_buffers (D3D workarounds) 593024: Enabled: isIntel
Copying from staging storage to constant buffer storage does not work
zero_max_lod (D3D workarounds): Disabled: isFeatureLevel9_3
Missing an option to disable mipmaps on a mipmapped texture

Version Information
Data exported   2021-08-17T02:59:17.204Z
Chrome version  Chrome/92.0.4515.159
Operating system    Windows NT 10.0.19043
Software rendering list URL https://chromium.googlesource.com/chromium/src/+/0185b8a19c88c5dfd3e6c0da6686d799e9bc3b52/gpu/config/software_rendering_list.json
Driver bug list URL https://chromium.googlesource.com/chromium/src/+/0185b8a19c88c5dfd3e6c0da6686d799e9bc3b52/gpu/config/gpu_driver_bug_list.json
ANGLE commit id f11eb737212f
2D graphics backend Skia/92 d9b8efde6df32e7480c985177118cdd4b72a5b0e
Command Line    "C:\Program Files\Google\Chrome\Application\chrome.exe" --flag-switches-begin --flag-switches-end
Driver Information
Initialization time 114
In-process GPU  false
Passthrough Command Decoder true
Sandboxed   true
GPU0    VENDOR= 0x8086, DEVICE=0x9bc4, SUBSYS=0x76b41458, REV=5, LUID={0,82756} *ACTIVE*
GPU1    VENDOR= 0x10de, DEVICE=0x1f14, SUBSYS=0x76b41458, REV=161, LUID={0,84008}
GPU2    VENDOR= 0x1414, DEVICE=0x008c, LUID={0,83959}
Optimus false
AMD switchable  false
Desktop compositing Aero Glass
Direct composition  true
Supports overlays   true
YUY2 overlay support    SCALING
NV12 overlay support    SCALING
BGRA8 overlay support   SCALING
RGB10A2 overlay support SOFTWARE
Diagonal Monitor Size of \\.\DISPLAY1   15.5"
Driver D3D12 feature level  D3D 12.1
Driver Vulkan API version   Vulkan API 1.2.0
Driver vendor   Intel
Driver version  27.20.100.8783
GPU CUDA compute capability major version   0
Pixel shader version    5.0
Vertex shader version   5.0
Max. MSAA samples   16
Machine model name  
Machine model version   
GL_VENDOR   Google Inc. (Intel)
GL_RENDERER ANGLE (Intel, Intel(R) UHD Graphics Direct3D11 vs_5_0 ps_5_0, D3D11-27.20.100.8783)
GL_VERSION  OpenGL ES 2.0.0 (ANGLE 2.1.15713 git hash: f11eb737212f)
GL_EXTENSIONS   GL_ANGLE_base_vertex_base_instance GL_ANGLE_client_arrays GL_ANGLE_depth_texture GL_ANGLE_explicit_context GL_ANGLE_explicit_context_gles1 GL_ANGLE_framebuffer_blit GL_ANGLE_framebuffer_multisample GL_ANGLE_get_serialized_context_string GL_ANGLE_get_tex_level_parameter GL_ANGLE_instanced_arrays GL_ANGLE_lossy_etc_decode GL_ANGLE_memory_size GL_ANGLE_multi_draw GL_ANGLE_multiview_multisample GL_ANGLE_pack_reverse_row_order GL_ANGLE_program_cache_control GL_ANGLE_provoking_vertex GL_ANGLE_request_extension GL_ANGLE_robust_client_memory GL_ANGLE_texture_compression_dxt3 GL_ANGLE_texture_compression_dxt5 GL_ANGLE_texture_usage GL_ANGLE_translated_shader_source GL_CHROMIUM_bind_generates_resource GL_CHROMIUM_bind_uniform_location GL_CHROMIUM_color_buffer_float_rgb GL_CHROMIUM_color_buffer_float_rgba GL_CHROMIUM_copy_compressed_texture GL_CHROMIUM_copy_texture GL_CHROMIUM_lose_context GL_CHROMIUM_sync_query GL_EXT_EGL_image_external_wrap_modes GL_EXT_blend_func_extended GL_EXT_blend_minmax GL_EXT_color_buffer_half_float GL_EXT_debug_label GL_EXT_debug_marker GL_EXT_discard_framebuffer GL_EXT_disjoint_timer_query GL_EXT_draw_buffers GL_EXT_draw_elements_base_vertex GL_EXT_float_blend GL_EXT_frag_depth GL_EXT_instanced_arrays GL_EXT_map_buffer_range GL_EXT_multisampled_render_to_texture GL_EXT_occlusion_query_boolean GL_EXT_read_format_bgra GL_EXT_robustness GL_EXT_sRGB GL_EXT_shader_texture_lod GL_EXT_texture_compression_bptc GL_EXT_texture_compression_dxt1 GL_EXT_texture_compression_rgtc GL_EXT_texture_compression_s3tc_srgb GL_EXT_texture_filter_anisotropic GL_EXT_texture_format_BGRA8888 GL_EXT_texture_rg GL_EXT_texture_storage GL_EXT_unpack_subimage GL_KHR_debug GL_KHR_parallel_shader_compile GL_NV_EGL_stream_consumer_external GL_NV_fence GL_NV_pack_subimage GL_NV_pixel_buffer_object GL_OES_EGL_image GL_OES_EGL_image_external GL_OES_compressed_EAC_R11_signed_texture GL_OES_compressed_EAC_R11_unsigned_texture GL_OES_compressed_EAC_RG11_signed_texture 
GL_OES_compressed_EAC_RG11_unsigned_texture GL_OES_compressed_ETC2_RGB8_texture GL_OES_compressed_ETC2_RGBA8_texture GL_OES_compressed_ETC2_punchthroughA_RGBA8_texture GL_OES_compressed_ETC2_punchthroughA_sRGB8_alpha_texture GL_OES_compressed_ETC2_sRGB8_alpha8_texture GL_OES_compressed_ETC2_sRGB8_texture GL_OES_depth24 GL_OES_depth32 GL_OES_draw_elements_base_vertex GL_OES_element_index_uint GL_OES_fbo_render_mipmap GL_OES_get_program_binary GL_OES_mapbuffer GL_OES_packed_depth_stencil GL_OES_rgb8_rgba8 GL_OES_standard_derivatives GL_OES_surfaceless_context GL_OES_texture_border_clamp GL_OES_texture_float GL_OES_texture_float_linear GL_OES_texture_half_float GL_OES_texture_half_float_linear GL_OES_texture_npot GL_OES_texture_stencil8 GL_OES_vertex_array_object GL_WEBGL_video_texture
Disabled Extensions GL_KHR_blend_equation_advanced GL_KHR_blend_equation_advanced_coherent
Disabled WebGL Extensions   
Window system binding vendor    Google Inc. (Intel)
Window system binding version   1.5 (ANGLE 2.1.15713 git hash: f11eb737212f)
Window system binding extensions    EGL_EXT_create_context_robustness EGL_ANGLE_d3d_share_handle_client_buffer EGL_ANGLE_d3d_texture_client_buffer EGL_ANGLE_surface_d3d_texture_2d_share_handle EGL_ANGLE_query_surface_pointer EGL_ANGLE_window_fixed_size EGL_ANGLE_keyed_mutex EGL_ANGLE_surface_orientation EGL_ANGLE_direct_composition EGL_NV_post_sub_buffer EGL_KHR_create_context EGL_KHR_image EGL_KHR_image_base EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image EGL_KHR_get_all_proc_addresses EGL_KHR_stream EGL_KHR_stream_consumer_gltexture EGL_NV_stream_consumer_gltexture_yuv EGL_ANGLE_flexible_surface_compatibility EGL_ANGLE_stream_producer_d3d_texture EGL_ANGLE_create_context_webgl_compatibility EGL_CHROMIUM_create_context_bind_generates_resource EGL_CHROMIUM_sync_control EGL_EXT_pixel_format_float EGL_KHR_surfaceless_context EGL_ANGLE_display_texture_share_group EGL_ANGLE_display_semaphore_share_group EGL_ANGLE_create_context_client_arrays EGL_ANGLE_program_cache_control EGL_ANGLE_robust_resource_initialization EGL_ANGLE_create_context_extensions_enabled EGL_ANDROID_blob_cache EGL_ANDROID_recordable EGL_ANGLE_image_d3d11_texture EGL_ANGLE_create_context_backwards_compatible EGL_KHR_create_context_no_error EGL_KHR_reusable_sync
Direct rendering version    unknown
Reset notification strategy 0x8252
GPU process crash count 0
gfx::BufferFormats supported for allocation and texturing   R_8: not supported, R_16: not supported, RG_88: not supported, BGR_565: not supported, RGBA_4444: not supported, RGBX_8888: not supported, RGBA_8888: not supported, BGRX_8888: not supported, BGRA_1010102: not supported, RGBA_1010102: not supported, BGRA_8888: not supported, RGBA_F16: not supported, YVU_420: not supported, YUV_420_BIPLANAR: not supported, P010: not supported
Compositor Information
Tile Update Mode    One-copy
Partial Raster  Enabled
GpuMemoryBuffers Status
R_8 Software only
R_16    Software only
RG_88   Software only
BGR_565 Software only
RGBA_4444   Software only
RGBX_8888   GPU_READ, SCANOUT
RGBA_8888   GPU_READ, SCANOUT
BGRX_8888   Software only
BGRA_1010102    Software only
RGBA_1010102    Software only
BGRA_8888   Software only
RGBA_F16    Software only
YVU_420 Software only
YUV_420_BIPLANAR    GPU_READ, SCANOUT, SCANOUT_CPU_READ_WRITE, GPU_READ_CPU_READ_WRITE
P010    Software only
Display(s) Information
Info    Display[2528732444] bounds=[0,0 1536x864], workarea=[0,0 1536x824], scale=1.25, rotation=0, panel_rotation=0 internal.
Color space (sRGB/no-alpha) {primaries_d50_referred: [[0.6484, 0.3281], [0.3216, 0.6048], [0.1609, 0.0657]], transfer:IEC61966_2_1, matrix:RGB, range:FULL}
Buffer format (sRGB/no-alpha)   BGRX_8888
Color space (sRGB/alpha)    {primaries_d50_referred: [[0.6484, 0.3281], [0.3216, 0.6048], [0.1609, 0.0657]], transfer:IEC61966_2_1, matrix:RGB, range:FULL}
Buffer format (sRGB/alpha)  BGRA_8888
Color space (WCG/no-alpha)  {primaries_d50_referred: [[0.6484, 0.3281], [0.3216, 0.6048], [0.1609, 0.0657]], transfer:IEC61966_2_1, matrix:RGB, range:FULL}
Buffer format (WCG/no-alpha)    BGRX_8888
Color space (WCG/alpha) {primaries_d50_referred: [[0.6484, 0.3281], [0.3216, 0.6048], [0.1609, 0.0657]], transfer:IEC61966_2_1, matrix:RGB, range:FULL}
Buffer format (WCG/alpha)   BGRA_8888
Color space (HDR/no-alpha)  {primaries_d50_referred: [[0.6484, 0.3281], [0.3216, 0.6048], [0.1609, 0.0657]], transfer:IEC61966_2_1, matrix:RGB, range:FULL}
Buffer format (HDR/no-alpha)    BGRX_8888
Color space (HDR/alpha) {primaries_d50_referred: [[0.6484, 0.3281], [0.3216, 0.6048], [0.1609, 0.0657]], transfer:IEC61966_2_1, matrix:RGB, range:FULL}
Buffer format (HDR/alpha)   BGRA_8888
SDR white level in nits 80
Bits per color component    8
Bits per pixel  24
Refresh Rate in Hz  240
Video Acceleration Information
Decode h264 baseline    64x64 to 4096x4096 pixels
Decode h264 main    64x64 to 4096x4096 pixels
Decode h264 high    64x64 to 4096x4096 pixels
Decode vp9 profile0 64x64 to 8192x8192 pixels
Decode vp9 profile2 64x64 to 8192x8192 pixels
Encode h264 baseline    0x0 to 1920x1088 pixels, and/or 30.000 fps
Encode h264 main    0x0 to 1920x1088 pixels, and/or 30.000 fps
Encode h264 high    0x0 to 1920x1088 pixels, and/or 30.000 fps
Vulkan Information
Device Performance Information
Total Physical Memory (Gb)  31
Total Disk Space (Gb)   100
Hardware Concurrency    16
System Commit Limit (Gb)    31
D3D11 Feature Level 12_1
Has Discrete GPU    yes
Intel GPU Generation    9
Software Rendering  No
Diagnostics
0
b3DAccelerationEnabled  true
b3DAccelerationExists   true
bAGPEnabled true
bAGPExistenceValid  true
bAGPExists  true
bCanRenderWindow    true
bDDAccelerationEnabled  true
bDriverBeta false
bDriverDebug    false
bDriverSigned   false
bDriverSignedValid  false
bNoHardware false
dwBpp   32
dwDDIVersion    12
dwHeight    1080
dwRefreshRate   240
dwWHQLLevel 0
dwWidth 1920
iAdapter    0
lDriverSize 1470024
lMiniVddSize    0
szAGPStatusEnglish  Enabled
szAGPStatusLocalized    Enabled
szChipType  Intel(R) UHD Graphics Family
szD3DStatusEnglish  Enabled
szD3DStatusLocalized    Enabled
szDACType   Internal
szDDIVersionEnglish 12
szDDIVersionLocalized   12
szDDStatusEnglish   Enabled
szDDStatusLocalized Enabled
szDXVAHDEnglish Supported
szDXVAModes ModeMPEG2_A ModeMPEG2_C ModeWMV9_C ModeVC1_C
szDescription   Intel(R) UHD Graphics
szDeviceId  0x9BC4
szDeviceIdentifier  {}
szDeviceName    \\.\DISPLAY1
szDisplayMemoryEnglish  16446 MB
szDisplayMemoryLocalized    16446 MB
szDisplayModeEnglish    1920 x 1080 (32 bit) (240Hz)
szDisplayModeLocalized  1920 x 1080 (32 bit) (240Hz)
szDriverAssemblyVersion 27.20.100.8783
szDriverAttributes  Final Retail
szDriverDateEnglish 9/23/2020 8:00:00 PM
szDriverDateLocalized   9/23/2020 20:00:00
szDriverLanguageEnglish English
szDriverLanguageLocalized   English
szDriverModelEnglish    WDDM 2.7
szDriverModelLocalized  WDDM 2.7
szDriverName    C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_\igdumdim64.dll,C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_\igd10iumd64.dll,C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_\igd10iumd64.dll,C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_\igd12umd64.dll
szDriverNodeStrongName  oem66.inf::iCML_w10_DS:27.20.100.8783:PCI\VEN_
szDriverSignDate    Unknown
szDriverVersion 27.20.0100.8783
szKeyDeviceID   Enum\PCI\VEN_
szKeyDeviceKey  \Registry\Machine\System\CurrentControlSet\Control\Video\{}\0000
szManufacturer  Intel Corporation
szMiniVdd   unknown
szMiniVddDateEnglish    Unknown
szMiniVddDateLocalized  unknown
szMonitorMaxRes Unknown
szMonitorName   Generic PnP Monitor
szNotesEnglish  No problems found.
szNotesLocalized    No problems found.
szOverlayEnglish    Supported
szRankOfInstalledDriver 00CF0001
szRegHelpText   Unknown
szRevision  Unknown
szRevisionId    0x0005
szSubSysId  0x76B41458
szTestResultD3D7English Not run
szTestResultD3D7Localized   Not run
szTestResultD3D8English Not run
szTestResultD3D8Localized   Not run
szTestResultD3D9English Not run
szTestResultD3D9Localized   Not run
szTestResultDDEnglish   Not run
szTestResultDDLocalized Not run
szVdd   unknown
szVendorId  0x8086
1
b3DAccelerationEnabled  true
b3DAccelerationExists   true
bAGPEnabled true
bAGPExistenceValid  false
bAGPExists  false
bCanRenderWindow    false
bDDAccelerationEnabled  true
bDriverBeta false
bDriverDebug    false
bDriverSigned   false
bDriverSignedValid  false
bNoHardware false
dwBpp   0
dwDDIVersion    12
dwHeight    0
dwRefreshRate   0
dwWHQLLevel 0
dwWidth 0
iAdapter    0
lDriverSize 1056272
lMiniVddSize    0
szAGPStatusEnglish  Enabled
szAGPStatusLocalized    Enabled
szChipType  NVIDIA GeForce RTX 2070 with Max-Q Design
szD3DStatusEnglish  Enabled
szD3DStatusLocalized    Enabled
szDACType   Integrated RAMDAC
szDDIVersionEnglish 12
szDDIVersionLocalized   12
szDDStatusEnglish   Enabled
szDDStatusLocalized Enabled
szDXVAHDEnglish Unknown
szDXVAModes Unknown
szDescription   NVIDIA GeForce RTX 2070 with Max-Q Design
szDeviceId  0x1F14
szDeviceIdentifier  Unknown
szDeviceName    Unknown
szDisplayMemoryEnglish  24349 MB
szDisplayMemoryLocalized    24349 MB
szDisplayModeEnglish    Unknown
szDisplayModeLocalized  unknown
szDriverAssemblyVersion 30.0.14.7141
szDriverAttributes  Final Retail
szDriverDateEnglish 7/11/2021 8:00:00 PM
szDriverDateLocalized   7/11/2021 20:00:00
szDriverLanguageEnglish English
szDriverLanguageLocalized   English
szDriverModelEnglish    WDDM 2.7
szDriverModelLocalized  WDDM 2.7
szDriverName    C:\Windows\System32\DriverStore\FileRepository\nvgbi.inf_amd64_\nvldumdx.dll,C:\Windows\System32\DriverStore\FileRepository\nvgbi.inf_amd64_\nvldumdx.dll,C:\Windows\System32\DriverStore\FileRepository\nvgbi.inf_amd64_\nvldumdx.dll,C:\Windows\System32\DriverStore\FileRepository\nvgbi.inf_amd64_\nvldumdx.dll
szDriverNodeStrongName  oem95.inf::Section031:30.0.14.7141:pci\ven_10de&dev_1f14&subsys_76b41458
szDriverSignDate    Unknown
szDriverVersion 30.00.0014.7141
szKeyDeviceID   Enum\PCI\VEN_
szKeyDeviceKey  Unknown
szManufacturer  NVIDIA
szMiniVdd   unknown
szMiniVddDateEnglish    Unknown
szMiniVddDateLocalized  unknown
szMonitorMaxRes Unknown
szMonitorName   Unknown
szNotesEnglish  No problems found.
szNotesLocalized    No problems found.
szOverlayEnglish    Unknown
szRankOfInstalledDriver 00CF0001
szRegHelpText   Unknown
szRevision  Unknown
szRevisionId    0x00A1
szSubSysId  0x76B41458
szTestResultD3D7English Not run
szTestResultD3D7Localized   Not run
szTestResultD3D8English Not run
szTestResultD3D8Localized   Not run
szTestResultD3D9English Not run
szTestResultD3D9Localized   Not run
szTestResultDDEnglish   Not run
szTestResultDDLocalized Not run
szVdd   unknown
szVendorId  0x10DE
Log Messages
GpuProcessHost: The info collection GPU process exited normally. Everything is okay.
[10052:8460:0816/225916.306:WARNING:ipc_message_attachment_set.cc(49)] : MessageAttachmentSet destroyed with unconsumed attachments: 0/1
[10052:8460:0816/225916.313:WARNING:ipc_message_attachment_set.cc(49)] : MessageAttachmentSet destroyed with unconsumed attachments: 0/1

tyrmullen commented 2 years ago

Interesting. What type of computer is this? I don't see anything in the chrome://gpu report that looks suspicious offhand. And I'm assuming chrome://flags are set to default values? (Otherwise, some of the experimental flags could definitely contribute to this.)

A few things to try: (1) Chrome does share a single GPU process across all of its tabs/windows, and does automatically cap framerate according to its measured GPU usage, so background processes could potentially be impacting throughput here. Have we checked to see if this issue repros if we close all Chrome windows/tabs (and disable extensions) [we can check running processes to ensure we fully killed Chrome], and then open just this demo?

(2) If you take a performance trace of one of the programs while it runs (and do mention which demo you test for this), where does it appear that the slowdown is occurring? (maybe grab a few screenshots of this?). You can do this by opening up the Chrome Developer Console while the demo is running, going to the "Performance" tab, and then recording for 5-6 seconds (the first few seconds will not always be accurate).

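(3) To see if this is the result of a Chrome version or experiment, two amazing tools are:

https://www.chromium.org/developers/bisect-builds-py
https://source.chromium.org/chromium/chromium/src/+/master:tools/variations/bisect_variations.py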

AzureRain1 commented 2 years ago

> what type of computer is this?

It is a Gigabyte laptop with an Intel i7-10875H and an NVIDIA RTX 2070 Max-Q, so the hardware should be more than sufficient. I also tried a second laptop with an MX250; there the demo reaches 15 fps, which is slightly better but still far below expectations.
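
For reference, the chrome://gpu report posted earlier lists three adapters and marks GPU0 as *ACTIVE*. The sketch below shows one way to read those lines. The vendor table covers only the PCI vendor IDs that appear in that report, and `parse_adapters` is an illustrative helper, not a Chrome or MediaPipe API.

```python
import re
from typing import Dict, List

# PCI vendor IDs for the adapters that appear in the report.
VENDORS = {0x8086: "Intel", 0x10DE: "NVIDIA", 0x1414: "Microsoft"}

# Matches lines such as "GPU0    VENDOR= 0x8086, DEVICE=0x9bc4, ...".
GPU_LINE = re.compile(
    r"GPU(\d+)\s+VENDOR=\s*(0x[0-9a-fA-F]+),\s*DEVICE=(0x[0-9a-fA-F]+)"
)


def parse_adapters(report: str) -> List[Dict[str, object]]:
    """Extract index, vendor, device ID and active flag from GPU lines."""
    adapters = []
    for line in report.splitlines():
        match = GPU_LINE.search(line)
        if match:
            vendor_id = int(match.group(2), 16)
            adapters.append({
                "index": int(match.group(1)),
                "vendor": VENDORS.get(vendor_id, hex(vendor_id)),
                "device": match.group(3),
                "active": "*ACTIVE*" in line,
            })
    return adapters


# The three adapter lines copied from the Driver Information section.
REPORT = """\
GPU0    VENDOR= 0x8086, DEVICE=0x9bc4, SUBSYS=0x76b41458, REV=5, LUID={0,82756} *ACTIVE*
GPU1    VENDOR= 0x10de, DEVICE=0x1f14, SUBSYS=0x76b41458, REV=161, LUID={0,84008}
GPU2    VENDOR= 0x1414, DEVICE=0x008c, LUID={0,83959}
"""

active = [a for a in parse_adapters(REPORT) if a["active"]]
print(active)
```

On this report the active adapter is the Intel one, which matches the GL_RENDERER line. That by itself does not explain the frame rate.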

> chrome://flags are set to default values?

Yes.

> Have we checked to see if this issue repros if we close all Chrome windows/tabs (and disable extensions) [we can check running processes to ensure we fully killed Chrome], and then open just this demo?

Yes. Each time I kill the process and open a single tab directly on the CodePen page, and the issue reproduces every time.

> If you take a performance trace of one of the programs while it runs (and do mention which demo you test for this), where does it appear that the slowdown is occurring?

Here are screenshots of a ~7 s trace taken from the start of page load while running the face detection demo. The first is an overview; the second and third are close-ups of two frames:

[Screenshots: Tracing_all, Tracing_close_1, Tracing_close_2, Screenshot_20210818_122533]

My observation is that there is a long waiting/blocking period after each wasm function call, which I assume is the Run() call of the graph. The very short bars in screenshot 2 are magnified in screenshot 4 and appear to be frame rendering, but they run for only a few nanoseconds, after which the process just keeps waiting.

> To see if this is the result of a Chrome version or experiment, two amazing tools are:
>
> https://www.chromium.org/developers/bisect-builds-py
> https://source.chromium.org/chromium/chromium/src/+/master:tools/variations/bisect_variations.py

I tried both, and it turns out the problem is universal. For field trials, I ran bisect_variations.py on Linux with Chromium 92.0.4515.131, since the script fails with [Error 87] on Windows. The command-line variations contain only one parameter (the Linux builds are made with fieldtrial_testing_enabled=false, which used to be called fieldtrial_testing_like_official_build=true), so I think field trials are not the problem:

[user@localhost variations]$ python2 bisect_variations.py --input-file="/run/media/user/Data/test/variations_cmd.txt" --browser=chromium --url="https://codepen.io/mediapipe/full/dyOzvZM"
Run Chrome with variations file /run/media/user/Data/test/variations_cmd.txt
Can we reproduce with given variations file [(y)es/(n)o/(r)etry/(s)tdout/(q)uit]: y
Bisecting succeeded: --force-fieldtrials=UkmSamplingRate/Sampled_NoSeed_Other/ --enable-features=UkmSamplingRate<UkmSamplingRate --force-fieldtrial-params=UkmSamplingRate.Sampled_NoSeed_Other:_default_sampling/1

For versions, I ran the script on Windows with a range from 782793 (85.0.4183.121) to 912534 (Canary, 95.0.4611.0). I also verified M85 and Canary manually. It turns out every version has this problem, except M85, which failed to run the demo at all. The log is as follows:

F:\bisect_builds>python tools/bisect-builds.py -a win64 -g 912534 -b 782793 --use-local-cache -- --no-first-run
Scanning from 912534 to 782793 (129741 revisions).
Downloading list of known revisions...
Loaded revisions 389148-912679 from F:\bisect_builds\tools\.bisect-builds-cache.json
Downloading revision 839301...
Received 165231949 of 165231949 bytes, 100.00%
Bisecting range [782797 (bad), 912532 (good)], roughly 14 steps left.
Trying revision 839301...
Revision 839301 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b
Downloading revision 875714...
Received 176357479 of 176357479 bytes, 100.00%
Bisecting range [839301 (bad), 912532 (good)], roughly 13 steps left.
Trying revision 875714...
Revision 875714 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b
Downloading revision 894172...
Received 179809582 of 179809582 bytes, 100.00%
Bisecting range [875714 (bad), 912532 (good)], roughly 12 steps left.
Trying revision 894172...
Revision 894172 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b
Downloading revision 902856...
Received 180720286 of 180720286 bytes, 100.00%
Bisecting range [894172 (bad), 912532 (good)], roughly 11 steps left.
Trying revision 902856...
Revision 902856 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b
Downloading revision 907469...
Received 181162294 of 181162294 bytes, 100.00%
Bisecting range [902856 (bad), 912532 (good)], roughly 10 steps left.
Trying revision 907469...
Revision 907469 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b
Downloading revision 909739...
Received 182359155 of 182359155 bytes, 100.00%
Bisecting range [907469 (bad), 912532 (good)], roughly 9 steps left.
Trying revision 909739...
Revision 909739 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b
Downloading revision 911544...
Received 181196039 of 181196039 bytes, 100.00%
Bisecting range [909739 (bad), 912532 (good)], roughly 8 steps left.
Trying revision 911544...
Revision 911544 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b
Downloading revision 912005...
Received 181308716 of 181308716 bytes, 100.00%
Bisecting range [911544 (bad), 912532 (good)], roughly 7 steps left.
Trying revision 912005...
Revision 912005 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b
Downloading revision 912045...
Received 181305782 of 181305782 bytes, 100.00%
Bisecting range [912005 (bad), 912532 (good)], roughly 6 steps left.
Trying revision 912045...
Revision 912045 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b
Downloading revision 912261...
Received 181358561 of 181358561 bytes, 100.00%
Bisecting range [912045 (bad), 912532 (good)], roughly 5 steps left.
Trying revision 912261...
Revision 912261 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b
Downloading revision 912465...
Received 181409940 of 181409940 bytes, 100.00%
Bisecting range [912261 (bad), 912532 (good)], roughly 4 steps left.
Trying revision 912465...
Revision 912465 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b
Downloading revision 912508...
Received 181415660 of 181415660 bytes, 100.00%
Bisecting range [912465 (bad), 912532 (good)], roughly 3 steps left.
Trying revision 912508...
Revision 912508 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b
Downloading revision 912517...
Received 181415166 of 181415166 bytes, 100.00%
Bisecting range [912508 (bad), 912532 (good)], roughly 2 steps left.
Trying revision 912517...
Revision 912517 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b
Downloading revision 912523...
Received 181414162 of 181414162 bytes, 100.00%
Bisecting range [912517 (bad), 912532 (good)], roughly 2 steps left.
Trying revision 912523...
Revision 912523 is [(g)ood/(b)ad/(r)etry/(u)nknown/(s)tdout/(q)uit]: b
You are probably looking for a change made after 912523 (known bad), but no later than 912532 (first known good).
CHANGELOG URL:
  https://chromium.googlesource.com/chromium/src/+log/9a59481955730df5897785bd7bf36d96d8d3dbb8..2743430c1810b962229f9b20d52c1dc30e71d4e6
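
For readers unfamiliar with the bisect tool, here is a minimal sketch of why the final range in the log above is probably not meaningful. It is a simplification, not the real `bisect-builds.py` logic: `isBad` is a hypothetical stand-in for the manual answer at each prompt, and every integer revision is assumed to have a downloadable build. When every tested revision is answered "bad", the search simply collapses onto the endpoint that was labeled good up front.

```javascript
// Simplified bisect between a known-bad and an assumed-good revision.
// `isBad(rev)` is hypothetical: it stands in for the g/b answer typed at each prompt.
function bisect(knownBad, assumedGood, isBad) {
  let bad = knownBad;
  let good = assumedGood;
  let steps = 0;
  while (good - bad > 1) {
    const mid = Math.floor((bad + good) / 2);
    if (isBad(mid)) {
      bad = mid; // the change of interest lies after mid
    } else {
      good = mid; // the change of interest lies at or before mid
    }
    steps += 1;
  }
  return { lastBad: bad, firstGood: good, steps };
}

// Every revision reproduces the slowdown, as in the run above.
const result = bisect(782793, 912534, () => true);
console.log(result);
```

The returned range sits directly next to the assumed-good endpoint, which suggests the "good" label on that endpoint was never confirmed by the search itself, rather than pointing at a real fix.
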
tyrmullen commented 2 years ago

Thank you for the thorough report! That behavior is quite odd indeed-- especially the performance trace. It shows that the actual MediaPipe processing time is extremely low (~5ms/frame), so there are no unexpected CPU-side costs to speak of. Our entire run loop is actually finished by the end of that thin call stack, so we don't make any explicit calls past that point-- however, GPU processing usually continues past that point. Seeing so little CPU processing but such long wait times between when video frames are passed in would normally make me suspect an over-busy GPU process, but the trace makes it look like our GPU processing is also relatively low as well-- this makes me wonder if something strange is happening to the video playback, or else maybe something is bogging down the GPU in some strange way (perhaps a subframe or background process in the CodePen site?). To narrow it down even further, here are some ideas:

(1) While the demo is running, can we check the Chrome CPU and GPU utilization? This seems like a good quick check to try to determine if there's some large over-utilization occurring from somewhere.

(2) To try to confirm the issue really is GPU-specific, we can switch our pipeline to a less efficient CPU-based one by adding options.useCpuInference = true; right beneath const options = x as mpFaceDetection.Options; in the CodePen JS code. Then we can see if there still exists a large performance delta between Firefox and Chrome. This won't get rid of all our GPU processing, but it switches the bulk of our operations from GPU to CPU, leaving only light rendering.

(3) We can try moving the code out of the CodePen site and into a standalone demo. We can just copy all of the code/settings directly into a local folder and then use python3 -m http.server to stage locally. This can determine whether or not CodePen plays a part in the issue.

(4) We can grab a full Chrome trace (chrome://tracing), for Javascript+Rendering. These have a lot more details on the system as a whole, although I must profess I'm not an expert yet at reading through them.

AzureRain1 commented 2 years ago

While the demo is running, can we check the Chrome CPU and GPU utilization?

Both are very low, <10%.

we can switch our pipeline to a less efficient CPU-based one by adding options.useCpuInference = true;

After experimenting a little, it seems the problem happens with both the CPU and GPU versions, so the GPU is probably not the only problem. For the CPU version, Firefox hangs, so I cannot get an fps reading from it, but in Chromium it gave me the same 8 fps as running on the GPU.

We can try moving the code out of the CodePen site and into a standalone demo.

I tried it and that didn't help. I ran the page using python server as mentioned, and it gave me the same 8 fps.

I will try grabbing a full trace later, but here are some other discoveries:

  1. I tried disabling canvas rendering by commenting out the whole onResults(results) function, but that didn't change anything.
  2. I tried to record Memory usage alongside JS frames in Performance tab for a much longer period of time (90s), and I believe I am looking at a pattern of memory leak. Here are the screenshots:

Screenshot_20210819_040733

A closer look gives me this: Screenshot_20210819_040820

Notice the zigzag pattern along the slopes. I am not sure whether it is related to the slowdown, but it may be a hint. Also notice the red bars at the top, below the ms notations. They actually say something like 117.7ms ~ 8 fps Dropped Frame, and the green bar between them says 8.1ms ~ 123 fps Frame. I am not sure exactly what this means, but it looks like the GPU is limited when not running graphs, which I find strange (I suppose it is not related to power management, because I do not think the GPU can switch power states that often).
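
As a quick check of the numbers in those bars, the fps value appears to be the reciprocal of the frame duration. The helper name below is made up for illustration.

```javascript
// Convert a frame duration in milliseconds to frames per second.
function frameDurationToFps(durationMs) {
  return 1000 / durationMs;
}

console.log(frameDurationToFps(117.7).toFixed(1)); // the red "Dropped Frame" bar
console.log(frameDurationToFps(8.1).toFixed(1));   // the green "Frame" bar
```

Truncated to whole numbers, these give 8 and 123, matching the labels, so a red bar marks a gap of roughly 118 ms between presented frames rather than a separate measurement.
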

tyrmullen commented 2 years ago

I think that sort of memory graph is fairly normal, since a lot of freed memory will still be left lying around until JS needs it and decides to garbage collect (the large drops in the graph). It's only if those low points in the memory graph rise over time (so the memory usage increases from one "drop" to the next) that a leak is present. If you use the Developer console to force garbage collection right before recording a memory reading, that can help control for this behavior. Also, any leaks in the C++ WebAssembly memory have a rather unique behavior of their own-- you'll see sudden large "jumps" in memory between long breaks, owing to how the C++ memory heap is grown.
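
The rule of thumb above can be sketched as a small heuristic: look only at the low points after each garbage collection and check whether they rise. The function names, sample data and `tolerance` parameter are assumptions for illustration; the samples could be heap sizes in any consistent unit.

```javascript
// Collect local minima: the low points right after each garbage-collection drop.
function findTroughs(samples) {
  const troughs = [];
  for (let i = 1; i < samples.length - 1; i += 1) {
    if (samples[i] < samples[i - 1] && samples[i] <= samples[i + 1]) {
      troughs.push(samples[i]);
    }
  }
  return troughs;
}

// Heuristic: suspect a leak only if the last trough is higher than the first
// by more than `tolerance`. A sawtooth with a flat floor is normal GC behavior.
function looksLikeLeak(samples, tolerance = 0) {
  const troughs = findTroughs(samples);
  if (troughs.length < 2) return false;
  return troughs[troughs.length - 1] - troughs[0] > tolerance;
}

// Made-up readings: the first floor stays flat, the second keeps rising.
const sawtooth = [10, 20, 30, 12, 22, 32, 11, 21, 31, 10, 20];
const rising = [10, 20, 30, 15, 25, 35, 20, 30, 40, 25, 35];
console.log(looksLikeLeak(sawtooth), looksLikeLeak(rising));
```

This only covers the JS heap pattern. The sudden jumps from C++ WebAssembly heap growth described above would need a separate check.
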

The fact that none of our previous tests affected the video playback rate is very telling. Usually issues like this are due to GPU overutilization, but I think perhaps something very different is happening for you (since even with CPU ML inference the issue persists, and there's no sign of GPU or CPU strain). So I think we can systematically remove higher-level pipeline components to hopefully narrow it down a little more-- basically taking the canvas rendering disabling experiment you tried and pushing that further. Specifically:

  1. Does the issue persist if we don't even send frames into MediaPipe FaceDetection?
  2. If so, then does the issue persist if we don't even construct FaceDetection?
  3. If so, then does the issue persist if we use an alternate method to play the video stream?

And since we seem to be homing in on the video playback itself (possibly even a Chromium issue, as you posited, rather than a MediaPipe one), just to eliminate a few other possibly related variables, are you using a webcam for testing? If so, what kind? If not, what are you using for the video stream? (uploaded video duration and resolution?). And one other side-question: are you using an external monitor, and if so, does disconnecting that have an effect?

AzureRain1 commented 2 years ago

I should clarify that I made a small mistake in my previous comment: I didn't remove faceDetection.onResults(onResults); but rather commented out the canvasCtx drawing calls inside that function, since removing the onResults function would prevent fpsControl.tick() from being called.

Does the issue persist if we don't even send frames into MediaPipe FaceDetection?

I commented out await faceDetection.send({image: input});, and that caused onResults() to stop being called too, so I did two experiments:

  1. I first tried switching to using requestAnimationFrame to measure fps on canvas, and fps on canvas itself turns out to be 60.

I went further on this and tested running the same measuring function when FaceDetection is active, and it turns out the canvas itself is still at 60fps, only the onResults function is called at a rate of 10 fps (I am testing on my second laptop). The JS code snippet I modified is here: https://gist.github.com/AzureRain1/c4fea9fd2eb16b38b22229ea0b3d09cc

Screenshot_20210819_034457

The white number on the top left corner of canvas is calculated based on how often onResults() is called, and it is equal to the result from fpsControl.tick(). The bottom number is calculated from requestAnimationFrame. The fps counter on the left top side of the whole page is from the rendering tab of Developer Tools.

  2. I then went ahead and put fpsControl.tick(); right after async (input: controls.InputImage, size: controls.Rectangle) => { and again commented out await faceDetection.send({image: input});. In this setting the fps counter still gave me a reading of 10.
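
The two measurements above (canvas refresh versus how often onResults or the frame callback fires) can be taken with the same kind of counter. Below is a minimal sketch, not the code from the linked gist: `RateMeter` is a hypothetical name, and timestamps are passed in explicitly so it also runs outside a browser. In a page, one meter would be ticked from a requestAnimationFrame callback and the other from onResults, both with `performance.now()`.

```javascript
// Sliding-window rate counter. Timestamps are in milliseconds.
class RateMeter {
  constructor(windowMs = 1000) {
    this.windowMs = windowMs;
    this.timestamps = [];
  }

  // Record one event and return the current events-per-second estimate.
  tick(nowMs) {
    this.timestamps.push(nowMs);
    // Drop events that have fallen out of the window.
    while (this.timestamps.length > 0 && nowMs - this.timestamps[0] > this.windowMs) {
      this.timestamps.shift();
    }
    return this.rate();
  }

  rate() {
    if (this.timestamps.length < 2) return 0;
    const span = this.timestamps[this.timestamps.length - 1] - this.timestamps[0];
    return span > 0 ? ((this.timestamps.length - 1) * 1000) / span : 0;
  }
}

// Simulated two seconds: the display refreshes at 60 Hz, results arrive at 10 Hz.
const displayMeter = new RateMeter();
const resultsMeter = new RateMeter();
for (let i = 0; i <= 120; i += 1) displayMeter.tick(i * (1000 / 60));
for (let i = 0; i <= 20; i += 1) resultsMeter.tick(i * 100);
console.log(Math.round(displayMeter.rate()), Math.round(resultsMeter.rate()));
```

Keeping the two meters separate makes it clear that a 60 fps canvas and a 10 fps callback rate can coexist: the first measures the browser refresh, the second measures how often new frames reach the callback.
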

If so, then does the issue persist if we don't even construct FaceDetection?

For this I commented out everything related to faceDetection, as well as put fpsControl.tick(); right after async (input: controls.InputImage, size: controls.Rectangle) => { like the previous setting. The fps counter still gave me a reading of 10.

If so, then does the issue persist if we use an alternate method to play the video stream?

For this, I removed the controls part completely and instead used the following:

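// Open the default camera and play it in the <video> element directly,
// bypassing the MediaPipe controls.
if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
  navigator.mediaDevices.getUserMedia({ video: true })
    .then(function (stream) {
      window.stream = stream; // expose the stream globally
      videoElement.srcObject = stream;
      return videoElement.play();
    })
    .catch(function (err) {
      console.log("Something went wrong!", err);
    });
}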

I then drew on the canvas using canvasCtx.drawImage(videoElement, 0, 0, 1280, 720);. The fps measured by requestAnimationFrame in this case is 60 fps.

So it seems the problem is related to the controls somehow? That would make sense if the video is being fed at a low fps, which would also explain the long idle times between calls. However, I am not sure why the low video frame rate would be consistent across platforms and devices.

are you using a webcam for testing? If so, what kind? If not, what are you using for the video stream? (uploaded video duration and resolution?). And one other side-question: are you using an external monitor, and if so, does disconnecting that have an effect?

All tests are done using webcams on devices. None of the tests involved an external monitor. I made a table about the devices I have tested, so things are easier to understand:

| Device | Platform/OS | Camera | Chrome/Chromium Version | FPS |
| --- | --- | --- | --- | --- |
| Laptop | Windows 10 Home | Laptop Embedded | Chrome (Official Build) 92.0.4515.131 | 8 |
| Laptop | Fedora 34 | Laptop Embedded | Chromium (Fedora RPM) 91.0.4472.164 | 10 |
| Sony | Android 10 (AOSP) | Phone Front Camera | Chromium (Developer Build) 92.0.4515.131 | 12~14 |
| Samsung Galaxy | Android 11 (OEM) | Phone Back Camera | Chrome (Official Build) 92.0.4515.131 | ~17 |

tyrmullen commented 2 years ago

The Android numbers aren't too far from what I'd expect, but the Desktop numbers are definitely strangely low. However, from the experiments it sounds like the MediaPipe JS APIs are actually functioning perfectly there (and are very fast), but the video playback is where the issue is occurring. The main difference between the final snippet you posted and the code in controls is that we have an additional check so that we don't process the same video frame twice. Specifically, whenever we process a frame, we store video.currentTime, and if it matches the last-seen value, we do not process.
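
A minimal sketch of the kind of check described here, assuming only that the video object exposes `currentTime`. This is not the actual code from controls; `createNewFrameGate` is a made-up name, and the plain object below stands in for a video element so the example runs outside a browser.

```javascript
// Returns a function that reports whether `video` has advanced to a new frame
// since the previous call, judged by video.currentTime.
function createNewFrameGate(video) {
  let lastTime = -1;
  return function hasNewFrame() {
    if (video.currentTime === lastTime) {
      return false; // same frame as last time: skip processing
    }
    lastTime = video.currentTime;
    return true;
  };
}

// Simulate one second of 60 Hz polling against a stream that only
// advances currentTime ten times per second.
const simulatedVideo = { currentTime: 0 };
const hasNewFrame = createNewFrameGate(simulatedVideo);
let processedFrames = 0;
for (let poll = 0; poll < 60; poll += 1) {
  simulatedVideo.currentTime = Math.floor(poll / 6) * 0.1;
  if (hasNewFrame()) processedFrames += 1;
}
console.log(processedFrames);
```

With a gate like this, the processing rate follows the rate at which `currentTime` changes, not the rate of the polling loop, which would explain a 60 fps canvas alongside a 10 fps callback rate.
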

I suspect that if you add that check in as well, then you'll see the 10fps capping even with your final snippet, confirming that this is in fact a Chromium bug. Then the only question would be whether the issue is with video.currentTime not updating properly (so video playback is actually at 60fps even though frame updates are not being reported), or whether the video stream itself really is at 10fps-- the apparent smoothness of the video for you when it reports that it is playing at 60fps should be the determining factor there. (Many built-in webcams are just always capped at 30fps, so I'd expect you'd be visually comparing between 30fps and 10fps; it's the Chromium browser max refresh rate which is 60fps).

As one other side-test, the experimental "requestVideoFrameCallback" call was created so that users didn't have to try to manually control for video frame updates from the browser. If that call exists for you, you could try using that as a replacement for requestAnimationFrame and see if that helps?
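
A minimal sketch of that side-test, assuming a browser where `HTMLVideoElement.requestVideoFrameCallback` exists. `countPresentedFrames` is a hypothetical helper name, and `fakeVideo` is only a stand-in so the example also runs outside a browser. In a page, the real video element would be passed in.

```javascript
// Count frames the browser reports as presented for `video`.
// Returns null when requestVideoFrameCallback is unavailable, so the caller
// can fall back to requestAnimationFrame.
function countPresentedFrames(video, onFrame) {
  if (typeof video.requestVideoFrameCallback !== "function") {
    return null;
  }
  const state = { count: 0, stopped: false };
  const step = (now, metadata) => {
    if (state.stopped) return;
    state.count += 1;
    onFrame(state.count, metadata.mediaTime);
    video.requestVideoFrameCallback(step); // the callback fires once, so re-register
  };
  video.requestVideoFrameCallback(step);
  return {
    stop: () => { state.stopped = true; },
    getCount: () => state.count,
  };
}

// Stand-in for a video element (illustration only).
const fakeVideo = {
  pending: null,
  requestVideoFrameCallback(cb) { this.pending = cb; return 1; },
  present(mediaTime) {
    const cb = this.pending;
    this.pending = null;
    if (cb) cb(Date.now(), { mediaTime });
  },
};

const seenMediaTimes = [];
const frameCounter = countPresentedFrames(fakeVideo, (n, t) => seenMediaTimes.push(t));
fakeVideo.present(0.0);
fakeVideo.present(0.1);
fakeVideo.present(0.2);
frameCounter.stop();
fakeVideo.present(0.3); // ignored after stop()
console.log(frameCounter.getCount(), seenMediaTimes);
```

Because the callback only runs when a new video frame is presented, no `currentTime` comparison is needed. Dividing the count by elapsed time should give the delivered frame rate directly.
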

AzureRain1 commented 2 years ago

I added the currentTime check to the requestAnimationFrame calls and indeed the fps is capped there too. However, I happened to figure out that the low fps is caused by the hardware: in low-light conditions the camera caps its output at a lower fps of 10, and only in bright conditions does it output a full 30 fps. I then went ahead and tested this theory on unmodified CodePen examples, and indeed in bright light they now run at a full 30 fps on both of my laptops.

So to summarize, it was the hardware outputting low fps video streams the whole time, starving the downstream processing. There appears to be nothing wrong with the JS API.

I think you can consider this solved now.

tyrmullen commented 2 years ago

Ahhh; I've run into some slight variations in FPS from changes in lighting conditions, but never anything near that drastic before. Good to know! Closing as resolved.
