github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com

Not able to use config file for database creation command #13524

Open HarshadDGhorpade-eaton opened 1 year ago

HarshadDGhorpade-eaton commented 1 year ago

Our build process consists of 4-5 commands, so I am trying to put them in a config file and pass that file to the command, but I get the error "Invalid property specified in the configuration file. Ignoring it and proceeding".

As per using-a-codeql-configuration-file, does codeql database create pick up the config file internally without the --codescanning-config option? Some internet sources also talk about a YAML-based config file used with the --codescanning-config option. Can you please clarify the correct way to use a config file?

I am trying it this way:

codeql database create --language=cpp --github-url=https://github.com/ --codescanning-config=../codeql-config.yml --source-root . db

where codeql-config.yml file contents are like below :

name: My CodeQL Configuration
language: cpp
build:
  - "./setup.sh <arguments>"
  - 'bash -c "command"'
  - "./setup2.sh <arguments>"
  - 'bash -c "command"'

I get the error below:

Invalid property specified in the configuration file. Ignoring it and proceeding.
A fatal error occurred: Query pack codeql/cpp-queries cannot be found. Check the spelling of the pack.

Specifying multiple commands works, but it is not maintainable because the commands are lengthy: codeql database create --command "cmd1" --command "cmd2" --command "cmd3" --command "cmd4" --language=cpp --github-url=https://github.com/ --source-root . db

aibaars commented 1 year ago

Have you tried using command: as the property name instead of build: ?

HarshadDGhorpade-eaton commented 1 year ago

Have you tried using command: as the property name instead of build: ?

Still the same error with command:.

aibaars commented 1 year ago

Could you try removing the name: property?

HarshadDGhorpade-eaton commented 1 year ago

Could you try removing the name: property?

That doesn't work either. I tried keeping only command/build, and the result is still the same.

aibaars commented 1 year ago

Sorry, I think the documentation for the codescanning config file is the following https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/customizing-code-scanning#using-a-custom-configuration-file . I don't see any mention there for a command or build property.

The documentation you were referring to before can be used to provide default values for command line arguments.

I think neither is really suitable for your use-case. The easiest is probably to put the commands in a single shell script (for example build.sh) and run codeql database create --language cpp --command ./build.sh ....

Note that CodeQL may automatically recognize build.sh as a build script, so things may even work if you leave out --command ./build.sh.
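
A minimal sketch of what such a build.sh wrapper could look like. The two steps are placeholders (assumptions), not the reporter's real commands, and the run_build function is a hypothetical helper; the only point shown is that several commands are collected in one script that stops at the first failure.

```shell
#!/usr/bin/env bash
# build.sh: hypothetical wrapper that runs all build steps in order.
# Stop at the first failing step so the failure is reported to the caller.
set -euo pipefail

# Placeholder steps (assumptions); replace with the real build commands.
build_steps=(
  "echo 'step 1: setup'"
  "echo 'step 2: compile'"
)

run_build() {
  local step
  for step in "${build_steps[@]}"; do
    echo "+ ${step}"      # log the step before running it
    bash -c "${step}"     # run the step in a child shell
  done
}

run_build
```

The script would then be passed to the command quoted above, for example codeql database create --language cpp --command ./build.sh --source-root . db.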

HarshadDGhorpade-eaton commented 1 year ago

Thanks a lot for this, it worked this way. Sadly it's not mentioned anywhere in the docs, but your quick replies solved the issue fast.

HarshadDGhorpade-eaton commented 1 year ago

While trying this out, we're facing another issue:

ERROR: ld.so: object '/mnt/work/codeql/tools/linux64/lib64trace.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS64): ignored.
......
ERROR: ld.so: object '/mnt/work/codeql/tools/linux64/lib64trace.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS64): ignored.

I don't understand what the issue is here; it is causing failures in the codeql database create step in our Azure pipeline. Is this error harmless, as written here? It is not exactly the same error, though.

I tried proceeding further with codeql database analyze, but it says "generated db needs to be finalized before running queries; please run codeql database finalize".

Do I need to add codeql database finalize?

aibaars commented 1 year ago

While trying this out, we're facing another issue:

ERROR: ld.so: object '/mnt/work/codeql/tools/linux64/lib64trace.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS64): ignored.
......
ERROR: ld.so: object '/mnt/work/codeql/tools/linux64/lib64trace.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS64): ignored.

I don't understand what the issue is here; it is causing failures in the codeql database create step in our Azure pipeline. Is this error harmless, as written here? It is not exactly the same error, though.

Those errors could be harmless if they only happen on processes the CodeQL analyser does not care about. In the linked example of a similar error, CodeQL was running inside a docker container and the error was reported on things running on the host machine.

In your case the errors suggest that CodeQL may be wrongly setting the LD_PRELOAD variable to the 64-bit library for 32-bit processes. CodeQL needs to be able to "see" all compiler processes to figure out how to analyse the source code. So if the 32-bit processes that are ignored happen to be the build scripts and compiler processes, then CodeQL won't "see" a thing, and you end up with an empty database.

I'm a little surprised by the LD_PRELOAD value, I think it normally looks like /mnt/work/codeql/tools/linux64/${LIB}trace.so, and ld.so expands ${LIB} to the suitable value depending on whether the process is 32 or 64 bits. Could you validate that the LIB placeholder is not accidentally interpreted/replaced by your azure pipeline scripts?

I tried proceeding further with codeql database analyze, but it says "generated db needs to be finalized before running queries; please run codeql database finalize".

Do I need to add codeql database finalize?

No, normally codeql database create will run codeql database finalize automatically, except when database creation failed in an earlier step. You could try running codeql database finalize but even if it doesn't fail completely, you still end up with a partial database.

HarshadDGhorpade-eaton commented 1 year ago

Could you validate that the LIB placeholder is not accidentally interpreted/replaced by your azure pipeline scripts?

I am not sure what to check or how to check it. Can you please elaborate on this?

aibaars commented 1 year ago

Could you validate that the LIB placeholder is not accidentally interpreted/replaced by your azure pipeline scripts?

I am not sure what to check or how to check it. Can you please elaborate on this?

Could you try running printenv (or another command that prints the environment such as set or export) in your build script and look for the value of LD_PRELOAD?

Does your Azure pipeline run a simple codeql database create command, or does it try to do fancier things by setting special environment variables or using features like indirect build tracing? If you're using a simple codeql database create then things should just work.

Could you also check which operating system and version is running on the Azure DevOps workers? Do they run in docker, or is some kind of virtualization or WSL in use? Perhaps running in a container may somehow confuse the code that detects whether a binary is 32-bit or 64-bit.
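
A small sketch of the check described above, meant to be pasted into the build script. The describe_preload function is a hypothetical helper, not part of CodeQL; it only reports whether the LD_PRELOAD value seen by the build still contains the literal ${LIB} placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper for build.sh: report whether an LD_PRELOAD value still
# contains the literal ${LIB} placeholder that ld.so is expected to expand.
describe_preload() {
  local value="$1"
  if [ -z "$value" ]; then
    echo "LD_PRELOAD is not set"
  elif [[ "$value" == *'${LIB}'* ]]; then
    echo "placeholder intact: $value"
  else
    echo "placeholder missing: $value"
  fi
}

# Inside the traced build this prints the value the build processes see.
describe_preload "${LD_PRELOAD:-}"
```

The value is passed as an argument rather than read inside the function so the check can be tried on sample strings without changing the real LD_PRELOAD.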

HarshadDGhorpade-eaton commented 1 year ago

Earlier I did not see the LD_PRELOAD environment variable, because I was printing the environment in a separate step after the database creation command. When I print it inside build.sh, I do find LD_PRELOAD=/mnt/work/codeql/tools/linux64/lib64trace.so

Database creation command: codeql database create --language cpp --github-url=https://github.com/ --command ./build.sh --source-root . db

OS details:
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
x86_64

We're running on Azure cloud agents; nothing runs in a container.

HarshadDGhorpade-eaton commented 1 year ago

database-create-20230622.152801.058.log

Uploading the database create log in case it helps.

aibaars commented 1 year ago

separate step so getting nothing, when printing same in build.sh, I do find LD_PRELOAD=/mnt/work/codeql/tools/linux64/lib64trace.so

Thanks! Could you attach a list of all environment variables containing any of the words PRELOAD, SEMMLE, ODASA, and CODEQL?

Could you also attach the build-tracer.log file to this issue?
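
One way to collect that list, as a sketch: the list_tracer_env function is a hypothetical helper, and it matches the four words against both variable names and values.

```shell
#!/usr/bin/env bash
# List environment variables whose name or value mentions any of the
# requested words, sorted for easier comparison between runs.
list_tracer_env() {
  # grep exits 1 when nothing matches; "|| true" keeps set -e scripts alive.
  env | grep -E 'PRELOAD|SEMMLE|ODASA|CODEQL' | sort || true
}

list_tracer_env
```

To see the values the traced build actually receives, this would need to run inside the build script rather than in a separate pipeline step.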

HarshadDGhorpade-eaton commented 1 year ago

I was trying to attach build-tracer.log, but it is more than 2 GB. The zipped version is ~180 MB, and attachments here are limited to 25 MB.

PRELOAD -->
SEMMLE_PRELOAD_libtrace=/mnt/work/codeql/tools/linux64/${LIB}_${PLATFORM}_trace.so
SEMMLE_PRELOAD_libtrace32=/mnt/work/codeql/tools/linux64/lib32trace.so
SEMMLE_PRELOAD_libtrace64=/mnt/work/codeql/tools/linux64/lib64trace.so
LD_PRELOAD=/mnt/work/codeql/tools/linux64/lib64trace.so

SEMMLE -->
SEMMLE_PRELOAD_libtrace=/mnt/work/codeql/tools/linux64/${LIB}_${PLATFORM}_trace.so
SEMMLE_PRELOAD_libtrace32=/mnt/work/codeql/tools/linux64/lib32trace.so
SEMMLE_PRELOAD_libtrace64=/mnt/work/codeql/tools/linux64/lib64trace.so
SEMMLE_EXEC=

ODASA --> nothing

CODEQL -->
CODEQL_EXTRACTOR_CPP_TRAP_DIR=/mnt/work/1/s/edge-linux-yocto/yocto_db/trap/cpp
CODEQL_TRACER_DIAGNOSTICS_DIR=/mnt/work/1/s/edge-linux-yocto/yocto_db/diagnostic/tracer
CODEQL_EXTRACTOR_CPP_LOG_DIR=/mnt/work/1/s/edge-linux-yocto/yocto_db/log
CODEQL_EXTRACTOR_CPP_SOURCE_ARCHIVE_DIR=/mnt/work/1/s/edge-linux-yocto/yocto_db/src
CODEQL_PLATFORM_DLL_EXTENSION=.so
CODEQL_EXTRACTOR_CPP_DIAGNOSTIC_DIR=/mnt/work/1/s/edge-linux-yocto/yocto_db/diagnostic/extractors/cpp
CODEQL_EXTRACTOR_CPP_WIP_DATABASE=/mnt/work/1/s/edge-linux-yocto/yocto_db
CODEQL_JAVA_HOME=/mnt/work/codeql/tools/linux64/java
CODEQL_EXTRACTOR_CPP_SCRATCH_DIR=/mnt/work/1/s/edge-linux-yocto/yocto_db/working
CODEQL_DIST=/mnt/work/codeql
CODEQL_PLATFORM=linux64
CODEQL_SCRATCH_DIR=/mnt/work/1/s/edge-linux-yocto/yocto_db/working
CODEQL_TRACER_LANGUAGES=cpp
CODEQL_TRACER_LOG=/mnt/work/1/s/edge-linux-yocto/yocto_db/log/build-tracer.log
CODEQL_EXTRACTOR_CPP_ROOT=/mnt/work/codeql/cpp
CODEQL_PARENT_ID=0000000000001561_0000000000000003
CODEQL_EXEC_ARGS_OFFSET=

aibaars commented 1 year ago

Was trying to attach build-tracer.log but its more than 2GB.. zipped version goes ~180MB.. can attached only until 25 MB

Ah indeed, the tracer log can be very large. Could you search for "interesting" fragments of the tracer log? I think the first 1000 lines are interesting, as are any blocks of text ending with a Catastrophic error message. If there are many errors then just select a sample of a few of them.

You can also create an enterprise support ticket and use the upload large files functionality.
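
A rough sketch of how such a sample could be cut from a multi-gigabyte log. The sample_tracer_log function is a hypothetical helper; the 20 lines of context before each match and the cap of 5 matches are assumptions, not values requested in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the start of a large tracer log plus a few
# "Catastrophic error" blocks into a small file that fits upload limits.
sample_tracer_log() {
  local log="$1" out="$2"
  local head_lines="${3:-1000}"   # lines kept from the start of the log
  local context="${4:-20}"        # lines kept before each match (assumption)
  local max_matches="${5:-5}"     # sample size when there are many errors
  head -n "$head_lines" "$log" > "$out"
  echo "----- sampled Catastrophic error blocks -----" >> "$out"
  # -m stops after max_matches matching lines; -B keeps preceding context.
  grep -m "$max_matches" -B "$context" 'Catastrophic error' "$log" >> "$out" || true
}
```

For example, sample_tracer_log db/log/build-tracer.log build-tracer-sample.log, where the log path follows the CODEQL_TRACER_LOG value listed earlier and the output file name is made up.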

HarshadDGhorpade-eaton commented 1 year ago

build-tracer-lines_0_1200-and-catastrophic-error.log

Adding some chunks from the original build-tracer.log. I see the same pattern repeated for "Catastrophic error", complaining about not being able to open a file.

aibaars commented 1 year ago

A team member mentioned: The tracer is trying to sniff out which type (32-bit or 64-bit) a binary is and insert the correct library for that, and only falls back to the generic LIB expansion in case it doesn't manage to do that. Maybe we're hitting a weird special case here that is confusing the detection logic? There are log messages for that, but they are not enabled at the default log level. The string is "detected as:", and the logging is enabled by setting the environment variable SEMMLE_DEBUG_TRACER to 6.

The log will be even larger. One way to reduce the log size would be to build only a smaller part of the code that still exhibits the same problem.

To make sense of the log, we'd need to correlate the detected filetype from the log for a binary with the actual filetype of the binary that's emitting those error messages, and I don't see the name of that anywhere in the issue. Do you know which process is printing the LD_PRELOAD related error messages?

aibaars commented 1 year ago

Adding some chunks from the original build-tracer.log. I see the same pattern repeated for "Catastrophic error", complaining about not being able to open a file.

Yes indeed. The good news is that CodeQL seems to be able to intercept compiler calls. The error messages are a bit unexpected, but the sampled ones all look like part of the "configure" phase of the build. Could you look for a few samples of Catastrophic error messages that mention source files from the repository you'd like to analyse?

aibaars commented 1 year ago

@HarshadDGhorpade-eaton , looking at the database create log file, I realised that the error is happening very near the end (task 6330 of 6338 ).

[2023-06-22 16:30:50] [build-stdout] NOTE: Running noexec task 6330 of 6338 (/mnt/work/3/s/edge-linux-yocto/meta-pxred/meta-bsp-stm32mp1/recipes-kernel/linux/linux-stm32mp-ipl.bb:do_build)
[2023-06-22 16:31:51] [build-stderr] ERROR: px-red-image-1.0-4r6 do_rootfs: [log_check] px-red-image: found 2 error messages in the logfile:
[2023-06-22 16:31:51] [build-stderr] [log_check] ERROR: ld.so: object '/mnt/work/codeql/tools/linux64/lib64trace.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS64): ignored.
[2023-06-22 16:31:51] [build-stdout] NOTE: recipe px-red-image-1.0-4r6: task do_rootfs: Failed

It is very likely that all steps of the build that are of interest to CodeQL (compiling and linking) had already succeeded. You could try to add || true to your build command to make it always succeed. It's not pretty, but it would make the build "succeed" after 6330 tasks. With a bit of luck the remaining 8 tasks are not interesting. It's quite likely that they are related to packaging the generated build artefacts or so. My only worry is that some of those remaining 8 tasks are related to linking binaries and library artefacts. CodeQL normally works fine if we miss out on those, however, for large and complex builds the linker information may be needed for disambiguation of (function) names. Without this you may get occasional confusing results when CodeQL mixes up functions with the same names defined in completely unrelated components.

The build seems to fail because log_check detects those ERROR messages in the log, so if there is a way to tell log_check that LD_PRELOAD related errors are "fine" then you should have a short term workaround that is a bit more reliable than adding || true ;-)
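
As a sketch, the || true idea can be written so that the real exit status is at least logged. The run_tolerant function is a hypothetical helper; like || true, it hides genuine build failures from codeql database create.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the real build, but report success even if it
# fails, so that codeql database create continues to finalization.
# This also hides genuine build failures, so the exit status is logged.
run_tolerant() {
  local status=0
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "warning: build exited with status ${status}; continuing" >&2
  fi
  return 0
}
```

Calling run_tolerant with the real build command behaves like appending || true to that command, with the warning left in the log as a trace of the failure.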

HarshadDGhorpade-eaton commented 1 year ago

You could try to add || true to your build command to make it always succeed. It's not pretty, but it would make the build "succeed" after 6330 tasks. With a bit of luck the remaining 8 tasks are not interesting. It's quite likely that they are related to packaging the generated build artefacts or so. My only worry is that some of those remaining 8 tasks are related to linking binaries and library artefacts. CodeQL normally works fine if we miss out on those, however, for large and complex builds the linker information may be needed for disambiguation of (function) names. Without this you may get occasional confusing results when CodeQL mixes up functions with the same names defined in completely unrelated components.

Yes, you're right. Noticing that it fails in the last stages, I tried proceeding further with codeql database analyze, but it says "generated db needs to be finalized before running queries; please run codeql database finalize".

Do I need to add codeql database finalize?

https://github.com/github/codeql/issues/13524#issuecomment-1602282964

aibaars commented 1 year ago

Do I need to add codeql database finalize?

That should work too in this case. I'd normally avoid carrying on after codeql database create fails, but in this case it fails so close to the end that it is probably fine. Also note that you cannot run codeql database finalize after a successful run of codeql database create.

Under the hood the codeql database create command runs codeql database init, codeql database trace-command and codeql database finalize.
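
The three steps can be sketched as separate commands. This is an illustration under assumptions: the run and create_db_stepwise helpers are hypothetical, the script only prints the commands unless DRY_RUN=0 is set, and the flags are the ones already used in this thread.

```shell
#!/usr/bin/env bash
# Sketch of the three steps that codeql database create performs.
# DRY_RUN=1 (the default here) only prints the commands.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_db_stepwise() {
  local db="$1"; shift
  run codeql database init --language=cpp --source-root=. "$db"
  # Keep going if the build fails late, as discussed above; the resulting
  # database may then be partial.
  run codeql database trace-command -- "$db" "$@" || echo "build failed" >&2
  run codeql database finalize "$db"
}

create_db_stepwise db ./build.sh
```

Running the steps separately makes it explicit that finalize happens even when the traced build returns an error, which is the situation described in this thread.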

HarshadDGhorpade-eaton commented 1 year ago

OK, the latest build got past this and now says:

Running queries.
A fatal error occurred: Query pack security-extended cannot be found. Check the spelling of the pack.

Command: codeql database analyze --format=sarif-latest --output=./temp/results-cpp.sarif db security-extended

Can't I pass the suite name security-extended here? We have a GitHub action for another project that uses it:

    - name: Initialize CodeQL
      uses: github/codeql-action/init@v2
      with:
        languages: ${{ matrix.language }}
        queries: security-extended

HarshadDGhorpade-eaton commented 1 year ago

codeql database analyze codeql/cpp-queries:codeql-suites/cpp-code-scanning.qls --format=sarifv2.1.0 --output=cpp-results.sarif --download

is this the correct way ?

aibaars commented 1 year ago

Can't I pass the suite name security-extended here? We have a GitHub action for another project that uses it:

The name of the query suite is actually cpp-security-extended. The github action internally prefixes the security-extended name with the identifier of the language.

aibaars commented 1 year ago

The cpp-code-scanning.qls file corresponds to the code-scanning query suite in the github action. It is fine to use, but if you want security-extended then you need to run codeql/cpp-queries:codeql-suites/cpp-security-extended.qls (or cpp-security-extended for short).
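
The naming described above can be sketched as a small function. The suite_ref helper is hypothetical, and applying the same pattern to languages other than cpp is an assumption based on the cpp names given in this thread.

```shell
#!/usr/bin/env bash
# Build the fully qualified suite reference from a language identifier and
# the short suite name used by the GitHub action.
suite_ref() {
  local lang="$1" suite="$2"
  echo "codeql/${lang}-queries:codeql-suites/${lang}-${suite}.qls"
}

suite_ref cpp security-extended
# prints codeql/cpp-queries:codeql-suites/cpp-security-extended.qls
```

The result can replace the bare security-extended argument in the analyze command quoted earlier, for example codeql database analyze --format=sarif-latest --output=./temp/results-cpp.sarif db "$(suite_ref cpp security-extended)".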

HarshadDGhorpade-eaton commented 1 year ago

OK, we're now able to generate the database, analyze it, and upload the results to the GitHub repo. Thanks for the apt responses from your side, much appreciated.

I have shared the logs zip (containing the tracer log and database creation logs) in a GitHub repo set up by your colleague.

We will have to find a way to get rid of this LD_PRELOAD error. For now it is OK to continue despite the error, knowing that it does not affect the data CodeQL needs, but this workaround will also let real errors go through.