flutter / flutter

`Mac mac_host_engine` flakey on clang crash #140458

Open gaaclarke opened 5 months ago

gaaclarke commented 5 months ago

The Mac mac_host_engine builder is flaking on a clang frontend crash. It has happened in 2 of the last 10 invocations.

Is this related to the GOMA migration? I don't think this is a problem we've created for ourselves; more likely it is an infra issue related to our clang version.

example

https://ci.chromium.org/ui/p/flutter/builders/prod/Mac%20mac_host_engine/8228/overview

error text

FAILED: obj/third_party/vulkan-deps/spirv-tools/src/source/opt/libspvtools_opt.value_number_table.o 
../../buildtools/mac-x64/clang/bin/clang++ -MD -MF obj/third_party/vulkan-deps/spirv-tools/src/source/opt/libspvtools_opt.value_number_table.o.d -DUSE_OPENSSL=1 -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D_FORTIFY_SOURCE=2 -D_LIBCPP_DISABLE_AVAILABILITY=1 -D_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS -D_LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS -DNDEBUG -DNVALGRIND -DDYNAMIC_ANNOTATIONS_ENABLED=0 -I../.. -Igen -I../../third_party/libcxx/include -I../../third_party/libcxxabi/include -I../../flutter/build/secondary/third_party/libcxx/config -I../../third_party/vulkan-deps/spirv-tools/src -I../../third_party/vulkan-deps/spirv-headers/src/include -I../../third_party/vulkan-deps/spirv-tools/src/include -Igen/third_party/vulkan-deps/spirv-tools/src -isysroot /Volumes/Work/s/w/ir/cache/osx_sdk/xcode_14e300c/XCode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk -mmacosx-version-min=10.14.0 -fno-strict-aliasing -fstack-protector-all -arch x86_64 -fcolor-diagnostics -Wall -Wextra -Wendif-labels -Werror -Wno-missing-field-initializers -Wno-unused-parameter -Wno-unused-but-set-parameter -Wno-unused-but-set-variable -Wno-implicit-int-float-conversion -Wno-deprecated-copy -Wno-psabi -Wno-deprecated-literal-operator -Wno-unqualified-std-cast-call -Wno-non-c-typedef-for-linkage -Wno-range-loop-construct -Wunguarded-availability -Wno-deprecated-declarations -fvisibility=hidden -Wstring-conversion -Wnewline-eof -O2 -fno-ident -fdata-sections -ffunction-sections -g2 -Wno-implicit-fallthrough -Wno-newline-eof -Wno-unreachable-code-break -Wno-unreachable-code-return -std=c++17 -fvisibility-inlines-hidden -std=c++17 -fno-rtti -nostdinc++ -nostdinc++ -fvisibility=hidden -fno-exceptions -stdlib=libc++  -c ../../third_party/vulkan-deps/spirv-tools/src/source/opt/value_number_table.cpp -o obj/third_party/vulkan-deps/spirv-tools/src/source/opt/libspvtools_opt.value_number_table.o
clang++: error: clang frontend command failed with exit code 139 (use -v to see invocation)
Fuchsia clang version 18.0.0 (https://llvm.googlesource.com/llvm-project 725656bdd885483c39f482a01ea25d67acf39c46)
Target: x86_64-apple-darwin22.6.0
Thread model: posix
InstalledDir: /Volumes/Work/s/w/ir/cache/builder/src/out/host_release/../../buildtools/mac-x64/clang/bin
clang++: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang++: note: diagnostic msg: /Volumes/Work/s/w/ir/x/t/value_number_table-bbb3c5.cpp
clang++: note: diagnostic msg: /Volumes/Work/s/w/ir/x/t/value_number_table-bbb3c5.sh
clang++: note: diagnostic msg: Crash backtrace is located in
clang++: note: diagnostic msg: /Users/chrome-bot/Library/Logs/DiagnosticReports/llvm_<YYYY-MM-DD-HHMMSS>_<hostname>.crash
clang++: note: diagnostic msg: (choose the .crash file that corresponds to your crash)
clang++: note: diagnostic msg: 

********************
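
For context on the log above: exit code 139 matches the common shell convention of 128 plus a signal number, and signal 11 is SIGSEGV on macOS and Linux, so the frontend most likely segfaulted. A minimal Python sketch of that decoding follows; the helper name `describe_exit_code` is hypothetical and not part of any Flutter or LUCI tooling.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Map a shell-style exit code to a signal name, assuming the 128 + N convention."""
    if code > 128:
        signum = code - 128
        try:
            return signal.Signals(signum).name
        except ValueError:
            # The number does not correspond to a signal known on this platform.
            return f"unknown signal {signum}"
    return f"exit status {code}"


print(describe_exit_code(139))
```

On macOS and Linux this should report SIGSEGV for 139. It only interprets the number; it says nothing about why the compiler crashed.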

gaaclarke commented 5 months ago

https://github.com/flutter/engine/pull/49248 landed yesterday. I'm not sure if that has addressed the issue though.

zanderso commented 5 months ago

Routing to team-infra since we won't be able to file an issue with the toolchain team without the crash dump and reduced repro mentioned in the error message. Assigning to @keyonghan since he was looking into this yesterday.

keyonghan commented 5 months ago

I will try adding some recipe logic to collect the crash repro files; with those we can follow up with the toolchain team.
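
As a rough illustration of what collecting the repro files could involve (this is a sketch, not the actual recipe change), the paths can be pulled out of clang's stderr by matching the `note: diagnostic msg:` lines shown in the error text above. The function name `find_crash_artifacts` and the regular expression are assumptions made for this example.

```python
import re

# Matches clang's "note: diagnostic msg: <absolute path>" lines from the error text.
DIAG_PATH = re.compile(r"note: diagnostic msg: (/\S+)$")


def find_crash_artifacts(stderr_text: str) -> list[str]:
    """Return the concrete file paths clang asks to attach to a bug report."""
    paths = []
    for line in stderr_text.splitlines():
        match = DIAG_PATH.search(line.strip())
        # Skip the templated .crash path, which contains <...> placeholders.
        if match and "<" not in match.group(1):
            paths.append(match.group(1))
    return paths


# Sample lines copied from the error text in this issue.
sample = """\
clang++: note: diagnostic msg: /Volumes/Work/s/w/ir/x/t/value_number_table-bbb3c5.cpp
clang++: note: diagnostic msg: /Volumes/Work/s/w/ir/x/t/value_number_table-bbb3c5.sh
clang++: note: diagnostic msg: Crash backtrace is located in
clang++: note: diagnostic msg: /Users/chrome-bot/Library/Logs/DiagnosticReports/llvm_<YYYY-MM-DD-HHMMSS>_<hostname>.crash
"""
print(find_crash_artifacts(sample))
```

The `.crash` file is skipped here because clang prints only a template for its name; finding the real file would need a separate lookup in the DiagnosticReports directory.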

keyonghan commented 4 months ago

https://flutter-review.googlesource.com/c/recipes/+/53421 to collect the dump files when a crash happens.

keyonghan commented 3 months ago

Unassigning myself, since the infra support is now available.

keyonghan commented 2 months ago

https://flutter-review.googlesource.com/c/recipes/+/55320 to collect clang crash dumps for RBE builds.

keyonghan commented 1 month ago

The engine tree is red on a new flake, with crash dump files attached: https://ci.chromium.org/ui/p/flutter/builders/prod/Mac%20Production%20Engine%20Drone/311038/infra

zanderso commented 1 month ago

Thanks! Would it be possible to include this file mentioned in the logs as well?

clang++: note: diagnostic msg: /Users/chrome-bot/Library/Logs/DiagnosticReports/llvm_<YYYY-MM-DD-HHMMSS>_<hostname>.crash
clang++: note: diagnostic msg: (choose the .crash file that corresponds to your crash)

Unfortunately, I think the crash might be flaky. I've downloaded the reproducer and the driver script, but it doesn't crash when I run it locally. I have it executing in a tight loop now, and hopefully it will crash at some point.
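
A tight loop of this kind can be sketched in Python as below. This is an illustration only, not the script actually used; `run_until_failure` and the attempt limit are hypothetical, and the command to run is whatever invokes the downloaded driver script.

```python
import subprocess
import sys


def run_until_failure(cmd, max_attempts):
    """Run cmd repeatedly; return the 1-based attempt of the first non-zero exit, else None."""
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # A child killed by a signal reports a negative return code, which also counts.
        if result.returncode != 0:
            return attempt
    return None


if __name__ == "__main__":
    # Hypothetical usage: python loop.py sh ./value_number_table-bbb3c5.sh
    if len(sys.argv) > 1:
        print(run_until_failure(sys.argv[1:], max_attempts=1000))
```

A loop that never fails locally does not rule out the crash; it may depend on something specific to the CI machine.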

keyonghan commented 1 month ago

Yeah, I can do that. Will update here when ready. Assigning to myself for the infra support.

keyonghan commented 1 month ago

Bumping the priority, as it flaked more than 15 times in the most recent 100 runs: https://ci.chromium.org/ui/p/flutter/builders/luci.flutter.prod/Mac%20mac_host_engine?limit=100

gaaclarke commented 1 month ago

I mentioned this in another thread, but I have a speculative fix for this. The clang crash is happening when cross-compiling on an x64 machine for arm64. I hypothesize that moving that bot to arm64 Macs, so it isn't cross-compiling, may work around the issue.

zanderso commented 3 weeks ago

All crashes have been happening only on build797-m9 and not crashing on the other x86 bots, so it seems like this is an issue with that particular bot.

gaaclarke commented 1 week ago

I just saw this show up on build804-m9 ( https://ci.chromium.org/ui/p/flutter/builders/prod/Mac%20Production%20Engine%20Drone/332573/overview )