Closed by magreenblatt 7 years ago
Original comment by Dmitry Azaraev (Bitbucket: dmitry-azaraev, GitHub: dmitry-azaraev).
It looks like CEF builds Skia with AVX2 support. The Intel Xeon X5560 doesn't support AVX or AVX2 (but does support the SSE instruction sets), so this looks like the root of the problem. We need to find a way to tweak the build options.
Original comment by Dmitry Azaraev (Bitbucket: dmitry-azaraev, GitHub: dmitry-azaraev).
Google Chrome Version 53.0.2785.116 m (64-bit) works on the target host, so this looks specific to the CEF build. Official builds are now built with 2015U3; I tried building with MSVS 2015 Update 3.1 and still got the same result. Building with 2015 Update 2 may help.
Original comment by Dmitry Azaraev (Bitbucket: dmitry-azaraev, GitHub: dmitry-azaraev).
There is a compiler issue (LTCG): 2015U3 produces a result that depends on object file ordering. If the first object file contains AVX instructions, then code for the following objects is also generated with AVX instructions, even if they were compiled for a lower instruction set. I.e. cl /ltcg sse2.obj avx.obj
will produce a correct result, but cl /ltcg avx.obj sse2.obj
now produces an incorrect result (the image looks like it works fine, but it requires AVX).
Original comment by Dmitry Azaraev (Bitbucket: dmitry-azaraev, GitHub: dmitry-azaraev).
VS2015U3.1 LTCG bug reproduction test case
Original comment by Dmitry Azaraev (Bitbucket: dmitry-azaraev, GitHub: dmitry-azaraev).
Because I did not encounter any problems building the 2785 branch with 2015 Update 3 except this one, it makes sense to track this problem down further. As I said before, it appears to be tied to a possible bug in LTCG.
vs2015u3-ltcg-bug-1.zip includes a build.cmd script which builds avx-sse.exe and sse-avx.exe. Both executables are built from the same obj modules; the only difference is the order of the object files passed to the linker.
*.disasm files are also generated, so the difference is easy to see without touching a debugger.
Tested with Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x64.
SSE-AVX: Correct case:
?use_sse@@YAXXZ:
000000014000110C: 48 83 EC 38 sub rsp,38h
0000000140001110: 0F 28 2D 19 DC 04 movaps xmm5,xmmword ptr [__xmm@4080000040400000400000003f800000]
00
0000000140001117: 48 8D 0D DA DB 04 lea rcx,[??_C@_0BC@POEDNAAP@SSE?3?5?$CFf?5?$CFf?5?$CFf?5?$CFf?6?$AA@]
00
000000014000111E: 0F 28 C5 movaps xmm0,xmm5
0000000140001121: 0F 57 E4 xorps xmm4,xmm4
0000000140001124: 0F C6 C5 FF shufps xmm0,xmm5,0FFh
0000000140001128: 0F 28 CD movaps xmm1,xmm5
000000014000112B: F3 0F 5A E0 cvtss2sd xmm4,xmm0
000000014000112F: 0F C6 CD AA shufps xmm1,xmm5,0AAh
0000000140001133: 0F 57 DB xorps xmm3,xmm3
0000000140001136: F3 0F 5A D9 cvtss2sd xmm3,xmm1
000000014000113A: 0F 28 C5 movaps xmm0,xmm5
000000014000113D: F2 0F 11 64 24 20 movsd mmword ptr [rsp+20h],xmm4
0000000140001143: 0F C6 C5 55 shufps xmm0,xmm5,55h
0000000140001147: 0F 57 D2 xorps xmm2,xmm2
000000014000114A: 0F 57 C9 xorps xmm1,xmm1
000000014000114D: 66 49 0F 7E D9 movd r9,xmm3
0000000140001152: F3 0F 5A D0 cvtss2sd xmm2,xmm0
0000000140001156: F3 0F 5A CD cvtss2sd xmm1,xmm5
000000014000115A: 66 49 0F 7E D0 movd r8,xmm2
000000014000115F: 66 48 0F 7E CA movd rdx,xmm1
0000000140001164: E8 FB FE FF FF call printf
0000000140001169: 48 83 C4 38 add rsp,38h
000000014000116D: C3 ret
AVX-SSE: incorrect case:
?use_sse@@YAXXZ:
000000014000117C: 48 83 EC 48 sub rsp,48h
0000000140001180: F3 0F 10 05 A8 DB movss xmm0,dword ptr [__real@40800000]
04 00
0000000140001188: 48 8D 4C 24 30 lea rcx,[rsp+30h]
000000014000118D: F3 0F 10 1D 97 DB movss xmm3,dword ptr [__real@40400000]
04 00
0000000140001195: F3 0F 10 15 8B DB movss xmm2,dword ptr [__real@40000000]
04 00
000000014000119D: F3 0F 10 0D 7F DB movss xmm1,dword ptr [__real@3f800000]
04 00
00000001400011A5: F3 0F 11 44 24 20 movss dword ptr [rsp+20h],xmm0
00000001400011AB: E8 08 FF FF FF call ??0Sk4f@@QEAA@MMMM@Z
00000001400011B0: BA 03 00 00 00 mov edx,3
00000001400011B5: 48 8D 4C 24 30 lea rcx,[rsp+30h]
00000001400011BA: E8 19 FF FF FF call ??ASk4f@@QEBAMH@Z
00000001400011BF: 0F 57 E4 xorps xmm4,xmm4
00000001400011C2: 48 8D 4C 24 30 lea rcx,[rsp+30h]
00000001400011C7: BA 02 00 00 00 mov edx,2
00000001400011CC: F3 0F 5A E0 cvtss2sd xmm4,xmm0
00000001400011D0: E8 03 FF FF FF call ??ASk4f@@QEBAMH@Z
00000001400011D5: 48 8D 4C 24 30 lea rcx,[rsp+30h]
00000001400011DA: BA 01 00 00 00 mov edx,1
00000001400011DF: 0F 57 DB xorps xmm3,xmm3
00000001400011E2: F3 0F 5A D8 cvtss2sd xmm3,xmm0
00000001400011E6: E8 ED FE FF FF call ??ASk4f@@QEBAMH@Z
00000001400011EB: 48 8D 4C 24 30 lea rcx,[rsp+30h]
00000001400011F0: 33 D2 xor edx,edx
00000001400011F2: 0F 57 D2 xorps xmm2,xmm2
00000001400011F5: F3 0F 5A D0 cvtss2sd xmm2,xmm0
00000001400011F9: E8 DA FE FF FF call ??ASk4f@@QEBAMH@Z
00000001400011FE: 48 8D 0D 0B DB 04 lea rcx,[??_C@_0BC@POEDNAAP@SSE?3?5?$CFf?5?$CFf?5?$CFf?5?$CFf?6?$AA@]
00
0000000140001205: 0F 57 C9 xorps xmm1,xmm1
0000000140001208: F2 0F 11 64 24 20 movsd mmword ptr [rsp+20h],xmm4
000000014000120E: F3 0F 5A C8 cvtss2sd xmm1,xmm0
0000000140001212: 66 49 0F 7E D9 movd r9,xmm3
0000000140001217: 66 49 0F 7E D0 movd r8,xmm2
000000014000121C: 66 48 0F 7E CA movd rdx,xmm1
0000000140001221: E8 3E FE FF FF call printf
0000000140001226: 48 83 C4 48 add rsp,48h
000000014000122A: C3 ret
??0Sk4f@@QEAA@MMMM@Z:
00000001400010B8: C5 F8 28 C1 vmovaps xmm0,xmm1
00000001400010BC: C4 E3 79 21 C2 10 vinsertps xmm0,xmm0,xmm2,10h
00000001400010C2: C4 E3 79 21 C3 20 vinsertps xmm0,xmm0,xmm3,20h
00000001400010C8: C4 E3 79 21 44 24 vinsertps xmm0,xmm0,dword ptr [rsp+28h],30h
28 30
00000001400010D0: C5 F8 11 01 vmovups xmmword ptr [rcx],xmm0
00000001400010D4: 48 8B C1 mov rax,rcx
00000001400010D7: C3 ret
So, what's the difference? In the first case the Sk4f constructor is completely inlined and contains only SSE instructions. In the second case the method body looks fine, but the Sk4f constructor is not inlined. If we look at the constructor code (listed above), it is built with AVX instructions. So our SSE-only code no longer works on CPUs without the AVX instruction set, and this depends entirely on the order of the object files passed to the linker.
Update: In the CEF build I got a crash exactly in the Sk4f constructor, which looks very similar.
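The comparison above can be automated against the *.disasm listings. What follows is a minimal sketch, not part of the attached test case. It assumes a dumpbin-style x64 listing in which each function starts with a `name:` line and each instruction line has the form `address: bytes mnemonic operands`; the function name `vex_instructions_by_function` is hypothetical. It flags instructions whose first byte is C4 or C5, which are VEX prefixes in 64-bit code. VEX also encodes some non-AVX extensions, so a hit means the function deserves a closer look rather than proving that it needs AVX.

```python
import re
from collections import defaultdict

# Assumed line shapes (dumpbin-style x64 listing):
#   "??0Sk4f@@QEAA@MMMM@Z:"                       -> function label
#   "00000001400010B8: C5 F8 28 C1 vmovaps ..."   -> instruction
# Continuation lines that carry only extra bytes match neither pattern.
_LABEL = re.compile(r"^\s*(\S+):\s*$")
_INSN = re.compile(r"^\s*[0-9A-Fa-f]+:\s+((?:[0-9A-Fa-f]{2}\s+)+)(\S+)")


def vex_instructions_by_function(listing):
    """Map each function label to the VEX-encoded mnemonics found in it."""
    found = defaultdict(list)
    current = None
    for line in listing.splitlines():
        label = _LABEL.match(line)
        if label:
            current = label.group(1)
            continue
        insn = _INSN.match(line)
        if insn and current is not None:
            first_byte = insn.group(1).split()[0].upper()
            # In 64-bit code, C4/C5 as the first byte is a VEX prefix.
            if first_byte in ("C4", "C5"):
                found[current].append(insn.group(2))
    return dict(found)
```

Running this over both listings and comparing the reported functions would show the Sk4f constructor only in the build that links the AVX object first.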
Original comment by Dmitry Azaraev (Bitbucket: dmitry-azaraev, GitHub: dmitry-azaraev).
This script re-sorts the object files in the libcef.ninja file. I built libcef with 2015U3 using this order and it looks like it works: cefclient runs on a non-AVX host.
The script moves the following files to the end of the list, in this order:
obj/third_party/libjpeg_turbo/simd_asm/jfdctflt-sse-64.o
obj/media/base/media_yasm/convert_yuv_to_rgb_sse.o
obj/media/base/media_yasm/linear_scale_yuv_to_rgb_sse.o
obj/media/base/media_yasm/scale_yuv_to_rgb_sse.o
obj/skia/skia_opts/SkBitmapFilter_opts_SSE2.obj
obj/skia/skia_opts/SkBitmapProcState_opts_SSE2.obj
obj/skia/skia_opts/SkBlitRow_opts_SSE2.obj
obj/third_party/libpng/libpng_sources/filter_sse2_intrinsics.obj
obj/third_party/libjpeg_turbo/simd_asm/jccolor-sse2-64.o
obj/third_party/libjpeg_turbo/simd_asm/jcgray-sse2-64.o
obj/third_party/libjpeg_turbo/simd_asm/jchuff-sse2-64.o
obj/third_party/libjpeg_turbo/simd_asm/jcsample-sse2-64.o
obj/third_party/libjpeg_turbo/simd_asm/jdcolor-sse2-64.o
obj/third_party/libjpeg_turbo/simd_asm/jdmerge-sse2-64.o
obj/third_party/libjpeg_turbo/simd_asm/jdsample-sse2-64.o
obj/third_party/libjpeg_turbo/simd_asm/jfdctfst-sse2-64.o
obj/third_party/libjpeg_turbo/simd_asm/jfdctint-sse2-64.o
obj/third_party/libjpeg_turbo/simd_asm/jidctflt-sse2-64.o
obj/third_party/libjpeg_turbo/simd_asm/jidctfst-sse2-64.o
obj/third_party/libjpeg_turbo/simd_asm/jidctint-sse2-64.o
obj/third_party/libjpeg_turbo/simd_asm/jidctred-sse2-64.o
obj/third_party/libjpeg_turbo/simd_asm/jquantf-sse2-64.o
obj/third_party/libjpeg_turbo/simd_asm/jquanti-sse2-64.o
obj/media/base/base/convert_rgb_to_yuv_sse2.obj
obj/media/base/base/filter_yuv_sse2.obj
obj/media/base/media_yasm/scale_yuv_to_rgb_sse2_x64.o
obj/third_party/libvpx/libvpx_yasm/copy_sse2.o
obj/third_party/libvpx/libvpx_yasm/idctllm_sse2.o
obj/third_party/libvpx/libvpx_yasm/iwalsh_sse2.o
obj/third_party/libvpx/libvpx_yasm/loopfilter_block_sse2_x86_64.o
obj/third_party/libvpx/libvpx_yasm/loopfilter_sse2.o
obj/third_party/libvpx/libvpx_yasm/mfqe_sse2.o
obj/third_party/libvpx/libvpx_yasm/postproc_sse2.o
obj/third_party/libvpx/libvpx_yasm/recon_sse2.o
obj/third_party/libvpx/libvpx_yasm/subpixel_sse2.o
obj/third_party/libvpx/libvpx_yasm/dct_sse2.o
obj/third_party/libvpx/libvpx_yasm/fwalsh_sse2.o
obj/third_party/libvpx/libvpx_yasm/vp9_mfqe_sse2.o
obj/third_party/libvpx/libvpx_yasm/vp9_postproc_sse2.o
obj/third_party/libvpx/libvpx_yasm/vp9_dct_sse2.o
obj/third_party/libvpx/libvpx_yasm/vp9_error_sse2.o
obj/third_party/libvpx/libvpx_yasm/vp9_temporal_filter_apply_sse2.o
obj/third_party/libvpx/libvpx_yasm/add_noise_sse2.o
obj/third_party/libvpx/libvpx_yasm/halfpix_variance_impl_sse2.o
obj/third_party/libvpx/libvpx_yasm/intrapred_sse2.o
obj/third_party/libvpx/libvpx_yasm/inv_wht_sse2.o
obj/third_party/libvpx/libvpx_yasm/sad4d_sse2.o
obj/third_party/libvpx/libvpx_yasm/sad_sse2.o
obj/third_party/libvpx/libvpx_yasm/subpel_variance_sse2.o
obj/third_party/libvpx/libvpx_yasm/subtract_sse2.o
obj/third_party/libvpx/libvpx_yasm/vpx_convolve_copy_sse2.o
obj/third_party/libvpx/libvpx_yasm/vpx_subpixel_8t_sse2.o
obj/third_party/libvpx/libvpx_yasm/vpx_subpixel_bilinear_sse2.o
obj/third_party/libwebp/libwebp_dsp_sse2/alpha_processing_sse2.obj
obj/third_party/libwebp/libwebp_dsp_sse2/argb_sse2.obj
obj/third_party/libwebp/libwebp_dsp_sse2/cost_sse2.obj
obj/third_party/libwebp/libwebp_dsp_sse2/dec_sse2.obj
obj/third_party/libwebp/libwebp_dsp_sse2/enc_sse2.obj
obj/third_party/libwebp/libwebp_dsp_sse2/filters_sse2.obj
obj/third_party/libwebp/libwebp_dsp_sse2/lossless_enc_sse2.obj
obj/third_party/libwebp/libwebp_dsp_sse2/lossless_sse2.obj
obj/third_party/libwebp/libwebp_dsp_sse2/rescaler_sse2.obj
obj/third_party/libwebp/libwebp_dsp_sse2/upsampling_sse2.obj
obj/third_party/libwebp/libwebp_dsp_sse2/yuv_sse2.obj
obj/third_party/qcms/qcms/transform-sse2.obj
obj/third_party/libvpx/libvpx_intrinsics_sse2.lib
obj/skia/skia_opts_sse3/SkBitmapProcState_opts_SSSE3.obj
obj/skia/skia_opts_sse3/SkOpts_ssse3.obj
obj/media/base/base/convert_rgb_to_yuv_ssse3.obj
obj/media/base/media_yasm/convert_rgb_to_yuv_ssse3.o
obj/third_party/libvpx/libvpx_yasm/copy_sse3.o
obj/third_party/libvpx/libvpx_yasm/subpixel_ssse3.o
obj/third_party/libvpx/libvpx_yasm/vp9_quantize_ssse3_x86_64.o
obj/third_party/libvpx/libvpx_yasm/avg_ssse3_x86_64.o
obj/third_party/libvpx/libvpx_yasm/fwd_txfm_ssse3_x86_64.o
obj/third_party/libvpx/libvpx_yasm/intrapred_ssse3.o
obj/third_party/libvpx/libvpx_yasm/inv_txfm_ssse3_x86_64.o
obj/third_party/libvpx/libvpx_yasm/quantize_ssse3_x86_64.o
obj/third_party/libvpx/libvpx_yasm/sad_sse3.o
obj/third_party/libvpx/libvpx_yasm/sad_ssse3.o
obj/third_party/libvpx/libvpx_yasm/vpx_subpixel_8t_ssse3.o
obj/third_party/libvpx/libvpx_yasm/vpx_subpixel_bilinear_ssse3.o
obj/third_party/libvpx/libvpx_intrinsics_ssse3.lib
obj/skia/skia_opts_sse41/SkOpts_sse41.obj
obj/skia/skia_opts_sse42/SkForceCPlusPlusLinking.obj
obj/third_party/libvpx/libvpx_yasm/sad_sse4.o
obj/third_party/libwebp/libwebp_dsp_sse41/alpha_processing_sse41.obj
obj/third_party/libwebp/libwebp_dsp_sse41/dec_sse41.obj
obj/third_party/libwebp/libwebp_dsp_sse41/enc_sse41.obj
obj/third_party/libwebp/libwebp_dsp_sse41/lossless_enc_sse41.obj
obj/third_party/libvpx/libvpx_intrinsics_sse4_1.lib
obj/skia/skia_opts_avx/SkOpts_avx.obj
obj/third_party/libvpx/libvpx_yasm/quantize_avx_x86_64.o
obj/third_party/libvpx/libvpx_intrinsics_avx.lib
obj/skia/skia_opts_avx2/SkForceCPlusPlusLinking.obj
obj/third_party/boringssl/boringssl_asm/rsaz-avx2.o
obj/third_party/libwebp/libwebp_dsp/enc_avx2.obj
obj/third_party/libvpx/libvpx_intrinsics_avx2.lib
To clarify, this bug is not triggered if the SSE object files are included before the AVX object files.
As Dmitry describes above, he created a build after ordering the list of files in obj/cef/libcef.ninja. The bug was not triggered when the obj files were ordered as: generic, sse, sse2, sse3, sse4, avx, avx2.
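The ordering described here (generic, sse, sse2, sse3, sse4, avx, avx2) can be sketched as a stable sort over the object file paths. This is not the script attached to the issue, which is not reproduced in this thread. The regular expression, the tier table, and the names `isa_tier` and `reorder_objects` are assumptions made for illustration, and locating the object list inside obj/cef/libcef.ninja is left out.

```python
import re

# Tier numbers follow the order reported to avoid the bug:
# generic (0), sse, sse2, sse3 (including ssse3), sse4, avx, avx2.
_TIERS = {"sse": 1, "sse2": 2, "sse3": 3, "ssse3": 3, "sse4": 4, "avx": 5, "avx2": 6}

# Assumed naming convention: the instruction set shows up in a file or
# directory name as a token such as "_sse2", "-avx2", "_sse41" or "_sse4_1".
_TOKEN = re.compile(
    r"(?<![a-z0-9])(avx2|avx|ssse3|sse4(?:_?[12])?|sse3|sse2|sse)(?![a-z0-9])",
    re.IGNORECASE,
)


def isa_tier(path):
    """Return the highest instruction-set tier named anywhere in the path."""
    tiers = [0]  # no token found means generic code
    for match in _TOKEN.finditer(path):
        name = match.group(1).lower()
        if name.startswith("sse4"):
            name = "sse4"  # fold sse41, sse42 and sse4_1 into one tier
        tiers.append(_TIERS[name])
    return max(tiers)


def reorder_objects(paths):
    """Sort paths from generic to avx2; sorted() is stable, so files within
    one tier keep their original relative order."""
    return sorted(paths, key=isa_tier)
```

Using the highest tier found in the whole path covers files such as obj/skia/skia_opts_avx2/SkForceCPlusPlusLinking.obj, where only the directory names the instruction set.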
We think this bug is not triggered in Chrome either because Chrome uses PGO, or because the Chrome ninja files just happen to include the SSE objects first. The Chrome versions that currently build with Update 3 are canary and master.
Original comment by Dmitry Azaraev (Bitbucket: dmitry-azaraev, GitHub: dmitry-azaraev).
I filed an issue on Microsoft Connect: "MSVC 19.00.24215.1 generates wrong SSE/AVX mixed code with LTCG".
Issue #1998 was marked as a duplicate of this issue.
Related Chromium issue: https://bugs.chromium.org/p/chromium/issues/detail?id=654213
Workaround added in 2840 branch revision 175be9a (bb), 2883 branch revision 3a77b24 (bb) and master branch revision f7a4102 (bb).
Original comment by Dmitry Azaraev (Bitbucket: dmitry-azaraev, GitHub: dmitry-azaraev).
In Chromium they only work around the same problem by applying some forced inlines in Skia, whereas this workaround produces a stable and still efficient result with the current compiler, without changing sources across the whole codebase and without having to inspect it. Once Chromium (mainly its third-party libs) provides C++-compliant implementations, free of program-wide ODR violations, it will be safe to disable it.
Original changes by Dmitry Azaraev (Bitbucket: dmitry-azaraev, GitHub: dmitry-azaraev).
Original report by Dmitry Azaraev (Bitbucket: dmitry-azaraev, GitHub: dmitry-azaraev).
Environment to reproduce:
cefclient starts normally and creates a renderer process, which crashes immediately after it is created. No error or fatal entries appear in the log. There is no way to catch the error except through a crash dump.
I created a dump file via WER (whoa, it worked this time), and got the following results:
Faulted code:
I'm not sure what is happening: does the CPU not support an SSE2 instruction, or is the instruction really invalid? CPUID says that it supports even SSE4.1... Also, the x86 build works on the same host, and the x64 build works on an i7-4770. So could this really be related to the instructions the CPU supports?
PS: What are CEF's default requirements for the target CPU?