github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License

No entity set found for seq 0 #16996

Closed flowerhack closed 2 weeks ago

flowerhack commented 1 month ago

Hi hello,

I'm a committer for the Chromium project & we've been experimenting with building CodeQL databases of Chromium.

Context

While building the Chromium CodeQL database, in addition to the previously reported "catastrophic" errors ([1], [2]), we get many thousands of errors that do not seem to cross the threshold to be logged as "catastrophic" but nonetheless cause the extractor to terminate with exit code 1, which leads to incomplete Chromium databases being created.

I've investigated these errors and have classed them into unique bug types as best as I am able.

The hope is that, if these bugs + the catastrophic errors are fixed, we will be able to have a complete build of a Chromium CodeQL database (barring, of course, the scenario where fixing these bugs serves to unmask new ones...!).

The Bug

When building the Chromium CodeQL database, we see ~23,000 errors of the following type:

Warning[extractor-c++]: In get_entity_set_for_seq: No entity set found for seq 0

Reproducing The Bug

I have created a standalone file that reproduces this bug, attached here as unicode_utilities_ii.cc.txt (please remove the .txt extension; it is only there to make the GitHub attachment uploader happy).

Reproduction steps (assumes that unicode_utilities_ii.cc is in /YOUR/ROOT/HERE; assumes Clang 19 (Chrome uses the latest upstream Clang, generally speaking); assumes Linux):

(1) codeql database init --language=cpp --source-root=/YOUR/ROOT/HERE/SOME-EMPTY-DIRECTORY /YOUR/ROOT/HERE/repro_entity_set --overwrite

(2) codeql database trace-command /YOUR/ROOT/HERE/repro_entity_set --working-dir=/YOUR/ROOT/HERE -- clang++ -DDCHECK_ALWAYS_ON=1 -DUSE_UDEV -DUSE_AURA=1 -DUSE_GLIB=1 -DUSE_OZONE=1 -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_GNU_SOURCE -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_EXTENSIVE '-DCR_CLANG_REVISION="llvmorg-19-init-10646-g084e2b53-57"' -DCOMPONENT_BUILD -DCR_LIBCXX_REVISION=852bc6746f45add53fec19f3a29280e69e358d44 -DTEMP_REBUILD_HACK_ASSERTION_HANDLER -DCR_SYSROOT_KEY=20230611T210420Z-2 -D_DEBUG -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DBLINK_IMPLEMENTATION=1 -DINSIDE_BLINK -DBLINK_PLATFORM_IMPLEMENTATION=1 -DGLIB_VERSION_MAX_ALLOWED=GLIB_VERSION_2_56 -DGLIB_VERSION_MIN_REQUIRED=GLIB_VERSION_2_56 -DSK_ENABLE_SKSL -DSK_UNTIL_CRBUG_1187654_IS_FIXED -DSK_WIN_FONTMGR_NO_SIMULATIONS -DSK_DISABLE_LEGACY_INIT_DECODERS -DSK_SLUG_DISABLE_LEGACY_DESERIALIZE -DSK_DISABLE_LEGACY_VULKAN_BACKENDSEMAPHORE -DSK_DISABLE_LEGACY_CREATE_CHARACTERIZATION -DSK_DISABLE_LEGACY_VULKAN_MUTABLE_TEXTURE_STATE -DSK_CODEC_DECODES_JPEG -DSK_ENCODE_JPEG -DSK_ENCODE_PNG -DSK_ENCODE_WEBP -DSKIA_DLL '-DSKCMS_API=__attribute__((visibility("default")))' -DSK_GANESH '-DSK_GPU_WORKAROUNDS_HEADER="gpu/config/gpu_driver_bug_workaround_autogen.h"' -DSK_GL -DSK_VULKAN=1 -DSK_GRAPHITE -DSK_DAWN -DVK_USE_PLATFORM_XCB_KHR -DVK_USE_PLATFORM_WAYLAND_KHR -DUSE_EGL -DLIBYUV_DISABLE_NEON -DLIBYUV_DISABLE_LSX -DLIBYUV_DISABLE_LASX -DWTF_USE_WEBAUDIO_PFFFT=1 -DABSL_CONSUME_DLL -DABSL_FLAGS_STRIP_NAMES=0 -DBORINGSSL_SHARED_LIBRARY -DWGPU_SHARED_LIBRARY -DU_USING_ICU_NAMESPACE=0 -DU_ENABLE_DYLOAD=0 -DUSE_CHROMIUM_ICU=1 -DU_ENABLE_TRACING=1 -DU_ENABLE_RESOURCE_TRACING=0 -DICU_UTIL_DATA_IMPL=ICU_UTIL_DATA_FILE -DCRASHPAD_ZLIB_SOURCE_EXTERNAL -DGOOGLE_PROTOBUF_NO_RTTI -DGOOGLE_PROTOBUF_NO_STATIC_INITIALIZER -DGOOGLE_PROTOBUF_INTERNAL_DONATE_STEAL_INLINE=0 -DHAVE_PTHREAD -DPROTOBUF_USE_DLLS -DUSING_V8_SHARED -DUSING_V8_SHARED_PRIVATE 
-DV8_ENABLE_CHECKS -DV8_COMPRESS_POINTERS -DV8_COMPRESS_POINTERS_IN_SHARED_CAGE -DV8_31BIT_SMIS_ON_64BIT_ARCH -DV8_ENABLE_SANDBOX -DV8_DEPRECATION_WARNINGS -DV8_USE_PERFETTO -DV8_HAVE_TARGET_OS -DV8_TARGET_OS_LINUX -DCPPGC_CAGED_HEAP -DCPPGC_YOUNG_GENERATION -DCPPGC_POINTER_COMPRESSION -DCPPGC_ENABLE_LARGER_CAGE -DCPPGC_SLIM_WRITE_BARRIER -DWEBRTC_ENABLE_SYMBOL_EXPORT -DWEBRTC_ENABLE_AVX2 -DWEBRTC_CHROMIUM_BUILD -DWEBRTC_POSIX -DWEBRTC_LINUX -DABSL_ALLOCATOR_NOTHROW=1 -DWEBRTC_USE_X11 -DWEBRTC_USE_PIPEWIRE -DWEBRTC_USE_GIO -DLOGGING_INSIDE_WEBRTC -DLEVELDB_PLATFORM_CHROMIUM=1 -DLEVELDB_SHARED_LIBRARY -DUSE_LIBJPEG_TURBO=1 -DMANGLE_JPEG_NAMES -DWEBP_EXTERN=extern -DUSING_V8_BASE_SHARED -DUSING_V8_PLATFORM_SHARED '-DFT_CONFIG_MODULES_H="freetype-custom/freetype/config/ftmodule.h"' '-DFT_CONFIG_OPTIONS_H="freetype-custom/freetype/config/ftoption.h"' -DPDFIUM_REQUIRED_MODULES -DLIBGAV1_MAX_BITDEPTH=10 -DLIBGAV1_THREADPOOL_USE_STD_MUTEX -DLIBGAV1_ENABLE_LOGGING=0 -DLIBGAV1_PUBLIC= -DCHROMIUM -DDAWN_WIRE_SHARED_LIBRARY -Wall -Wextra -Wimplicit-fallthrough -Wextra-semi -Wunreachable-code-aggressive -Wthread-safety -Wno-missing-field-initializers -Wno-unused-parameter -Wno-psabi -Wloop-analysis -Wno-unneeded-internal-declaration -Wno-cast-function-type -Wno-ignored-pragma-optimize -Wno-deprecated-builtins -Wno-bitfield-constant-conversion -Wno-deprecated-this-capture -Wno-invalid-offsetof -Wno-vla-extension -Wno-thread-safety-reference-return -Wshadow -Werror -fno-delete-null-pointer-checks -fno-ident -fno-strict-aliasing -fstack-protector -funwind-tables -fPIC -pthread -fcolor-diagnostics -fmerge-all-constants -fno-sized-deallocation -mllvm -instcombine-lower-dbg-declare=0 -mllvm -split-threshold-for-reg-with-hint=0 -ffp-contract=off -fcomplete-member-pointers -m64 -msse3 -Wno-builtin-macro-redefined -D__DATE__= -D__TIME__= -D__TIMESTAMP__= -ffile-compilation-dir=. 
-no-canonical-prefixes -ftrivial-auto-var-init=pattern -O0 -fno-omit-frame-pointer -fvisibility=hidden -Wheader-hygiene -Wstring-conversion -Wtautological-overlap-compare -DUNSAFE_BUFFERS_BUILD -Wconversion -Wno-float-conversion -Wno-sign-conversion -Wno-implicit-float-conversion -Wno-implicit-int-conversion -Wexit-time-destructors -Wglobal-constructors -gdwarf-4 -g2 -gdwarf-aranges -gsplit-dwarf -ggnu-pubnames -Wno-redundant-parens -Wno-redundant-parens -DPROTOBUF_ALLOW_DEPRECATED=1 -Wenum-compare-conditional '-Wno-c++11-narrowing-const-reference' -Wno-undefined-bool-conversion -Wno-tautological-undefined-compare '-std=c++20' -Wno-trigraphs -gsimple-template-names -fno-exceptions -fno-rtti -Wno-tautological-compare '-nostdinc++' -fvisibility-inlines-hidden -c ~/unicode_utilities_ii.cc -o ~/unicode_utilities.o

(3) codeql database finalize -j=-1 /YOUR/ROOT/HERE/repro_entity_set

At the conclusion of these steps there should be logs in build-tracer.log and logs/extractor/ indicating the failure.
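For anyone wanting to tally these warnings in a saved log, a minimal sketch follows. It is not part of CodeQL; the function names are hypothetical, and the only thing taken from this report is the warning text itself. It assumes the log is plain text with one message per line.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts the lines of `log` that contain the warning
// text quoted in this report ("No entity set found for seq").
std::size_t count_entity_set_warnings(std::istream& log) {
  const std::string needle = "No entity set found for seq";
  std::size_t count = 0;
  std::string line;
  while (std::getline(log, line)) {
    if (line.find(needle) != std::string::npos) {
      ++count;
    }
  }
  return count;
}

// Convenience overload for log text that is already held in memory.
std::size_t count_entity_set_warnings(const std::string& text) {
  std::istringstream stream(text);
  return count_entity_set_warnings(stream);
}
```

To use it on a file, open the log with std::ifstream and pass the stream to the first overload.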

In addition to (1) unicode_utilities_ii.cc.txt (the reproducer file), please find attached (2) the build-tracer.log and (3) the relevant extractor logfile (23bd9.log) from running this on my own machine, which will hopefully be useful for debugging/triage.

I do have the logs for the entire Chromium build available upon request, but as you might imagine, those files are very large and may not be as useful to you as this standalone reproducer.

A fix for this bug (or, guidance on how we might be holding it wrong!) would be extremely helpful for us here in Chromium. Please let me know if you need any more information. Thank you!

23bd9.log build-tracer.log unicode_utilities_ii.cc.txt

jketema commented 1 month ago

Hi @flowerhack,

Thanks for the report. We'll investigate.

jketema commented 1 month ago

This is due to some of clang's builtin operations not being supported by our frontend (specifically __is_trivially_relocatable, __datasizeof, __is_trivially_equality_comparable, and __is_nothrow_convertible). I've raised this with our frontend supplier.
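To illustrate the kind of code involved, here is a small sketch of a translation unit that touches each of the four builtins named above. It is not taken from the attached reproducer; the type and function names are hypothetical, and the feature checks (in particular `__has_extension(datasizeof)` for `__datasizeof`) are an assumption about how to detect these builtins, so on a compiler that lacks one the corresponding line is simply skipped.

```cpp
// Hypothetical probe file; none of these names come from the attached reproducer.
#ifndef __has_builtin
#define __has_builtin(x) 0  // no __has_builtin: treat every builtin as absent
#endif
#ifndef __has_extension
#define __has_extension(x) 0  // no __has_extension: treat every extension as absent
#endif

struct Sample {  // illustrative type only
  int value;
  char tag;
};

// Number of the four builtins that the feature checks report as available.
constexpr int supported_builtin_count() {
  int n = 0;
#if __has_builtin(__is_trivially_relocatable)
  ++n;
#endif
#if __has_extension(datasizeof)
  ++n;
#endif
#if __has_builtin(__is_trivially_equality_comparable)
  ++n;
#endif
#if __has_builtin(__is_nothrow_convertible)
  ++n;
#endif
  return n;
}

// Uses each available builtin once and returns true if the results are
// consistent with what is expected for these simple types.
constexpr bool probe_builtins() {
  bool ok = true;
#if __has_builtin(__is_trivially_relocatable)
  ok = ok && __is_trivially_relocatable(int);
#endif
#if __has_extension(datasizeof)
  // The data size excludes tail padding, so it cannot exceed sizeof.
  ok = ok && (__datasizeof(Sample) <= sizeof(Sample));
#endif
#if __has_builtin(__is_trivially_equality_comparable)
  ok = ok && __is_trivially_equality_comparable(int);
#endif
#if __has_builtin(__is_nothrow_convertible)
  ok = ok && __is_nothrow_convertible(int, long);
#endif
  return ok;
}

static_assert(probe_builtins(), "builtin results should be consistent");
```

Compiling a file like this with `-c` under a traced clang++ should be enough for the extractor to see the builtins. Whether it triggers the same warning as the attached reproducer is untested.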

jketema commented 2 weeks ago

The builtin operations will be supported from CodeQL 2.19.0 onwards, which should be released in about a month's time.