github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License

Extractor exiting with code 1 ("Warning[extractor-c++]: In index_expr_node: Unknown expr kind 30.") #16854

Closed flowerhack closed 1 week ago

flowerhack commented 3 weeks ago

Hi hello,

I'm a committer for the Chromium project & we've been experimenting with building CodeQL databases of Chromium.

Context

While building the Chromium CodeQL database, in addition to the previously-reported "catastrophic" errors ([1], [2]), we get many thousands of errors that do not seem to cross the threshold to be logged as "catastrophic" but nonetheless cause the extractor to terminate with exit code 1 & lead to incomplete Chromium databases being created.

I've investigated these errors and have classed them into nine unique bug types. I intend to report all nine (this report is bug 4 of 9), with a reproducing test case for each.

The hope is that, if these bugs + the catastrophic errors are fixed, we will be able to have a complete build of a Chromium CodeQL database (barring, of course, the scenario where fixing these bugs serves to unmask new ones...!).

The Bug

When building the Chromium CodeQL database, we see ~120,000 errors of the following type:

Warning[extractor-c++]: In index_expr_node: Unknown expr kind 30.

Unfortunately the logs don't seem to point to a specific code point from which the error arose.

I suspect this may be due to CodeQL's current lack of C++20 support. Chromium has been adding C++20 features recently, and those commits correspond roughly with when these errors started occurring in great volume during our builds. That would also explain why certain types of expressions are simply unknown.

If that is the case, I understand if C++20 support is not top-of-mind for your team, but we'd be curious to hear if adding that support is anywhere on your future roadmap.

Reproducing The Bug

I have created a standalone file which can be used to reproduce this bug, attached here as GrGlAttachment_ii.cpp.txt (please remove the .txt extension; it is only there to make the GitHub attachment uploader happy).

Reproduction steps (assumes that GrGlAttachment_ii.cpp is in /YOUR/ROOT/HERE; assumes Clang 19 (Chrome uses the latest upstream Clang, generally speaking); assumes Linux):

(1) codeql database init --language=cpp --source-root=/YOUR/ROOT/HERE/SOME-EMPTY-DIRECTORY /YOUR/ROOT/HERE/repro-bug1-db --overwrite

(2) codeql database trace-command /YOUR/ROOT/HERE/repro-bug1-db --working-dir=/YOUR/ROOT/HERE -- clang++ -DSK_CODEC_DECODES_JPEG_GAINMAPS -DSK_SHAPER_PRIMITIVE_AVAILABLE -DSKOTTIE_TRIVIAL_FONTRUN_ITER -DDCHECK_ALWAYS_ON=1 -DUSE_UDEV -DUSE_AURA=1 -DUSE_GLIB=1 -DUSE_OZONE=1 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_GNU_SOURCE -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_EXTENSIVE -DCR_CLANG_REVISION=\"llvmorg-19-init-14561-gecea8371-1\" -DCOMPONENT_BUILD -DCR_LIBCXX_REVISION=09b99fd8ab300c93ff7b8df6688cafb27bd3db28 -DCR_SYSROOT_KEY=20230611T210420Z-2 -D_DEBUG -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DSK_ENABLE_SKSL -DSK_UNTIL_CRBUG_1187654_IS_FIXED -DSK_WIN_FONTMGR_NO_SIMULATIONS -DSK_DISABLE_LEGACY_INIT_DECODERS -DSK_SLUG_DISABLE_LEGACY_DESERIALIZE -DSK_DISABLE_LEGACY_VULKAN_BACKENDSEMAPHORE -DSK_DISABLE_LEGACY_CREATE_CHARACTERIZATION -DSK_DISABLE_LEGACY_VULKAN_MUTABLE_TEXTURE_STATE -DSK_CODEC_DECODES_JPEG -DSK_ENCODE_JPEG -DSK_ENCODE_PNG -DSK_ENCODE_WEBP -DSKIA_DLL -DSKCMS_API=__attribute__\(\(visibility\(\"default\"\)\)\) -DSK_GANESH -DSK_GPU_WORKAROUNDS_HEADER=\"gpu/config/gpu_driver_bug_workaround_autogen.h\" -DSK_GL -DSK_VULKAN=1 -DSK_GRAPHITE -DSK_DAWN -DVK_USE_PLATFORM_XCB_KHR -DVK_USE_PLATFORM_WAYLAND_KHR -DIS_SKIA_IMPL=1 -DSKIA_IMPLEMENTATION=1 -DSK_FREETYPE_MINIMUM_RUNTIME_VERSION_IS_BUILD_VERSION -DSK_TYPEFACE_FACTORY_FREETYPE -DSK_FONTMGR_FREETYPE_EMPTY_AVAILABLE -DSK_GAMMA_EXPONENT=1.2 -DSK_GAMMA_CONTRAST=0.2 -DSK_DEFAULT_FONT_CACHE_LIMIT=20971520 -DGLIB_VERSION_MAX_ALLOWED=GLIB_VERSION_2_56 -DGLIB_VERSION_MIN_REQUIRED=GLIB_VERSION_2_56 -DWGPU_SHARED_LIBRARY -DABSL_CONSUME_DLL -DABSL_FLAGS_STRIP_NAMES=0 -DBORINGSSL_SHARED_LIBRARY -DWEBP_EXTERN=extern -DFT_CONFIG_MODULES_H=\"freetype-custom/freetype/config/ftmodule.h\" -DFT_CONFIG_OPTIONS_H=\"freetype-custom/freetype/config/ftoption.h\" -DPDFIUM_REQUIRED_MODULES -DUSE_LIBJPEG_TURBO=1 -DMANGLE_JPEG_NAMES -DU_USING_ICU_NAMESPACE=0 -DU_ENABLE_DYLOAD=0 -DUSE_CHROMIUM_ICU=1 \
-DU_ENABLE_TRACING=1 -DU_ENABLE_RESOURCE_TRACING=0 -DICU_UTIL_DATA_IMPL=ICU_UTIL_DATA_FILE -fno-delete-null-pointer-checks -fno-ident -fno-strict-aliasing -fstack-protector -funwind-tables -fPIC -pthread -fcolor-diagnostics -fmerge-all-constants -fno-sized-deallocation -mllvm -instcombine-lower-dbg-declare=0 -mllvm -split-threshold-for-reg-with-hint=0 -ffp-contract=off -fcomplete-member-pointers -m64 -msse3 -Wno-builtin-macro-redefined -D__DATE__= -D__TIME__= -D__TIMESTAMP__= -ffile-compilation-dir=. -no-canonical-prefixes -ftrivial-auto-var-init=pattern -O0 -fno-omit-frame-pointer -gdwarf-4 -g2 -gdwarf-aranges -gsplit-dwarf -ggnu-pubnames -fvisibility=hidden -Wheader-hygiene -Wstring-conversion -Wtautological-overlap-compare -DUNSAFE_BUFFERS_BUILD -Wno-redundant-parens -Wall -Wno-unused-variable -Wno-c++11-narrowing -Wno-unused-but-set-variable -Wno-misleading-indentation -Wno-missing-field-initializers -Wno-unused-parameter -Wno-psabi -Wloop-analysis -Wno-unneeded-internal-declaration -Wno-cast-function-type -Wno-ignored-pragma-optimize -Wno-deprecated-builtins -Wno-bitfield-constant-conversion -Wno-deprecated-this-capture -Wno-invalid-offsetof -Wno-vla-extension -Wno-thread-safety-reference-return -Werror -DPROTOBUF_ALLOW_DEPRECATED=1 -Wno-undefined-bool-conversion -Wno-tautological-undefined-compare -std=c++20 -Wno-trigraphs -gsimple-template-names -fno-exceptions -fno-rtti -nostdinc++ -fvisibility-inlines-hidden -Wenum-compare-conditional -Wno-c++11-narrowing-const-reference -Wno-missing-template-arg-list-after-template-kw -c ~/GrGlAttachment_ii.cpp -o ~/GrGLAttachment_ii.o

(3) codeql database finalize -j=-1 /YOUR/ROOT/HERE/repro-bug1-db

At the conclusion of these steps there should be logs in build-tracer.log and logs/extractor/ indicating the failure.
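For anyone scripting the reproduction, the three steps can be assembled as argument lists. This is only a sketch in Python: it uses just the subcommands and flags quoted in steps (1) to (3), the helper name `repro_commands` and its parameters are hypothetical, and the long clang++ invocation is passed in by the caller rather than repeated here.

```python
from pathlib import PurePosixPath


def repro_commands(root, compile_cmd, db_name="repro-bug1-db"):
    """Build the three CodeQL invocations from the steps above as argv lists.

    `root` stands in for /YOUR/ROOT/HERE and `compile_cmd` for the full
    clang++ command line; both are placeholders supplied by the caller.
    """
    root = PurePosixPath(root)  # the report assumes Linux paths
    db = str(root / db_name)
    # Step 1 points --source-root at an empty directory, as in the report.
    init = [
        "codeql", "database", "init", "--language=cpp",
        f"--source-root={root / 'SOME-EMPTY-DIRECTORY'}", db, "--overwrite",
    ]
    # Step 2 traces the compiler; everything after "--" is the build command.
    trace = [
        "codeql", "database", "trace-command", db,
        f"--working-dir={root}", "--", *compile_cmd,
    ]
    # Step 3 finalizes the database.
    finalize = ["codeql", "database", "finalize", "-j=-1", db]
    return [init, trace, finalize]
```

Each list can then be handed to `subprocess.run(cmd, check=True)` in order. Deriving every path from one `root` value keeps the database path identical across all three steps.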

In addition to (1) GrGlAttachment_ii.cpp.txt (the reproducer file), please find attached (2) the build-tracer.log and (3) the relevant extractor logfile (10d17.log) from running this on my own machine, which will hopefully be useful for debugging/triage.

I do have the logs for the entire Chromium build available upon request, but as you might imagine, those files are very large and may not be as useful to you as this standalone reproducer.

A fix for this bug (or, guidance on how we might be holding it wrong!) would be extremely helpful for us here in Chromium. Please let me know if you need any more information. Thank you!

10d17.log build-tracer.log GrGlAttachment_ii.cpp.txt

jketema commented 2 weeks ago

Hi,

Thanks for the report.

Warning[extractor-c++]: In index_expr_node: Unknown expr kind 30.

These are just warnings that can be safely ignored. It's effectively a symptom of us not extracting concepts. We know we have to fix this, but as concepts are not really relevant for detecting security issues this has not taken priority. If the extractor exits with exit code 1 here, it is due to one of the parse errors from earlier in the log:

"../../third_party/libc++/src/include/__atomic/atomic_ref.h", line 108: error: expression must have a constant value
        __atomic_always_lock_free(sizeof(_Tp), reinterpret_cast<void*>(-required_alignment));
                                                                       ^

[E 02:01:07 95483] Warning[extractor-c++]: In construct_text_message: "../../third_party/libc++/src/include/__atomic/atomic_ref.h", line 108: error: expression must have a constant value
        __atomic_always_lock_free(sizeof(_Tp), reinterpret_cast<void*>(-required_alignment));
                                                                       ^

"../../third_party/libc++/src/include/__type_traits/is_trivially_relocatable.h", line 29: error: type name is not allowed
  struct __libcpp_is_trivially_relocatable : integral_constant<bool, __is_trivially_relocatable(_Tp)> {};
...

Assuming you have or will report those parse errors separately, I'd like to close this issue if that's ok with you.
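To separate the parse errors from the ignorable warnings when reading a large log, something like the following may help. It is a rough sketch in Python that assumes the diagnostics keep the `"<file>", line <N>: error: <message>` shape quoted above; the names `PARSE_ERROR` and `parse_errors` are made up for illustration and are not part of CodeQL.

```python
import re

# Matches the front-end diagnostic shape quoted above:
#   "<file>", line <N>: error: <message>
PARSE_ERROR = re.compile(
    r'"(?P<file>[^"]+)", line (?P<line>\d+): error: (?P<msg>.*\S)'
)


def parse_errors(log_lines):
    """Return unique (file, line, message) parse errors in first-seen order."""
    seen = {}
    for text in log_lines:
        # search() rather than match(): the same diagnostic is also re-logged
        # behind a "Warning[extractor-c++]: In construct_text_message:" prefix.
        m = PARSE_ERROR.search(text)
        if m:
            key = (m["file"], int(m["line"]), m["msg"])
            seen.setdefault(key, None)  # dict keeps insertion order
    return list(seen)
```

Passing an open log file, for example `parse_errors(open(path, errors="replace"))`, yields one entry per distinct diagnostic, so the raw and re-logged copies of an error collapse into a single row. Lines such as "Unknown expr kind 30." do not match and are left out.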

flowerhack commented 1 week ago

Oh, that's good to know, thank you. Please feel free to close this issue.

(Also, for my edification: does "Unknown expr kind 30" refer to concepts specifically? Or do all log messages of the form "Unknown expr kind $SOME_NUMBER" correspond to concepts? I've seen a variety of errors of this sort, as well as "Unknown routine kind $NUMBER" and "Unexpected dynamic init kind $NUMBER", and it'd be nice to know whether those are all the same thing, or if they're likely different things that may be worth reporting.)

flowerhack commented 1 week ago

In particular, the specific "unknown/unexpected" things I've been seeing are:

If any of these are already known to be ignorable, or known to be safe bugs, do let me know. If these are new to you I will plan to file bugs for them. Thanks!
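A tally of this sort can be produced with a short script. The sketch below is in Python and assumes the messages follow the shapes quoted in this thread ("Unknown expr kind 30.", "Unknown routine kind $NUMBER", "Unexpected dynamic init kind $NUMBER", "Unrecognized builtin operation kind 60."); the names `KIND_MESSAGE` and `tally_kinds` are hypothetical, and other message shapes would need the pattern extended.

```python
import re
from collections import Counter

# Covers the message shapes mentioned in this thread, e.g.
#   "Unknown expr kind 30." / "Unrecognized builtin operation kind 60."
KIND_MESSAGE = re.compile(
    r"(Unknown|Unexpected|Unrecognized) ([a-z ]+?) kind (\d+)"
)


def tally_kinds(log_lines):
    """Count (what, kind number) pairs across extractor log lines."""
    counts = Counter()
    for text in log_lines:
        for _prefix, what, number in KIND_MESSAGE.findall(text):
            counts[(what, int(number))] += 1
    return counts
```

Calling `tally_kinds(open(path, errors="replace")).most_common()` lists the distinct kinds with the most frequent first, which makes it easier to see which ones are worth asking about.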

jketema commented 1 week ago

They're due to a variety of reasons.

These are all related to concepts:

These are related to compiler generated initialisations, most (if not all) in compiler generated constructors, which we don't do anything with in queries, so not problematic:

These are a couple of cases where we don't identify the routine type correctly (the routine will still end up in the database). I believe that some of these should be fixed with the latest CodeQL version.

This one is related to some synthetic attribute that our frontend generates, and we should silence it:

For the following could you open a new issue (a single one which mentions all three suffices). Just providing log output in which they occur should be enough (I know how to reproduce them).

jketema commented 1 week ago

Closing this as discussed above.

jketema commented 1 week ago

For the following could you open a new issue (a single one which mentions all three suffices). Just providing log output in which they occur should be enough (I know how to reproduce them).

  • Unrecognized builtin operation kind 60.
  • Unrecognized builtin operation kind 98.
  • Unrecognized builtin operation kind 102.

These 3 will be fixed in CodeQL 2.18.1. The public facing part of this is https://github.com/github/codeql/pull/16951.