v2.15.3 - Python code analysis fails

akr-amd commented 10 months ago

Hi there, I am trying to set up CodeQL analysis on a repo in our GitHub Enterprise Server. This is a monorepo with TypeScript and Python code. The directory structure is as follows:

Repo
├── engine <- Python
├── cli <- Python
└── ui <- TypeScript

While analyzing the Python code, the CodeQL action fails with the error below. Could you please help me figure out what I might be doing wrong?

2023-12-15T20:53:40.9634252Z ##[debug]Artifact debug-artifacts-python has been successfully uploaded, total size in bytes: 282039
2023-12-15T20:53:40.9635669Z Artifact has been finalized. All files have been successfully uploaded!
2023-12-15T20:53:40.9639065Z 
2023-12-15T20:53:40.9639865Z The raw size of all the files that were specified for upload is 282039 bytes
2023-12-15T20:53:40.9641501Z The size of all the files that were uploaded is 18399 bytes. This takes into account any gzip compression used to reduce the upload size, time and storage
2023-12-15T20:53:40.9642371Z 
2023-12-15T20:53:40.9643575Z Note: The size of downloaded zips can differ significantly from the reported size. For more information see: https://github.com/actions/upload-artifact#zipped-artifact-downloads 
2023-12-15T20:53:40.9644542Z 
2023-12-15T20:53:40.9646099Z ::group::CodeQL Debug Logs - python - database-trace-command-20231215.205329.488.log from file at path /__w/_temp/codeql_databases/python/log/database-trace-command-20231215.205329.488.log
2023-12-15T20:53:40.9648212Z ##[group]CodeQL Debug Logs - python - database-trace-command-20231215.205329.488.log from file at path /__w/_temp/codeql_databases/python/log/database-trace-command-20231215.205329.488.log
2023-12-15T20:53:40.9649754Z [2023-12-15 20:53:29] This is codeql database trace-command --index-traceless-dbs /__w/_temp/codeql_databases/python
2023-12-15T20:53:40.9650693Z [2023-12-15 20:53:29] Log file was started late.
2023-12-15T20:53:40.9651674Z [2023-12-15 20:53:29] Using autobuild script /__w/_tool/CodeQL/2.15.3/x64/codeql/python/tools/autobuild.sh.
2023-12-15T20:53:40.9652989Z [2023-12-15 20:53:29] [PROGRESS] database trace-command> Running command in /__w/nila/nila: [/__w/_tool/CodeQL/2.15.3/x64/codeql/python/tools/autobuild.sh]
2023-12-15T20:53:40.9654275Z [2023-12-15 20:53:29] [build-stderr] /bin/sh: 1: python2: not found
2023-12-15T20:53:40.9655462Z [2023-12-15 20:53:29] [build-stdout] No directories containing root identifiers were found. Returning working directory as root.
2023-12-15T20:53:40.9656656Z [2023-12-15 20:53:29] [build-stdout] Will try to guess Python version, as it was not specified in `lgtm.yml`
2023-12-15T20:53:40.9657791Z [2023-12-15 20:53:29] [build-stdout] Trying to guess Python version based on Trove classifiers in setup.py
2023-12-15T20:53:40.9658896Z [2023-12-15 20:53:29] [build-stdout] Did not find setup.py (expected it to be at /__w/nila/nila/setup.py)
2023-12-15T20:53:40.9659906Z [2023-12-15 20:53:29] [build-stdout] Trying to guess Python version based on travis file
2023-12-15T20:53:40.9661145Z [2023-12-15 20:53:29] [build-stdout] Did not find any travis files (expected them at either ['/__w/nila/nila/.travis.yml', '/__w/nila/nila/travis.yml'])
2023-12-15T20:53:40.9662300Z [2023-12-15 20:53:29] [build-stdout] Trying to guess Python version based on installed versions
2023-12-15T20:53:40.9663400Z [2023-12-15 20:53:29] [build-stdout] Wanted to run Python 2, but it is not available. Using Python 3 instead
2023-12-15T20:53:40.9664755Z [2023-12-15 20:53:29] [build-stdout] This script is running Python 3, but Python 2 is also available (as 'python3')
2023-12-15T20:53:40.9666131Z [2023-12-15 20:53:29] [build-stdout] Could not guess Python version, will use default: Python 3
2023-12-15T20:53:40.9667251Z [2023-12-15 20:53:29] [build-stdout] Calling python3 /__w/_tool/CodeQL/2.15.3/x64/codeql/python/tools/get_venv_lib.py
2023-12-15T20:53:40.9669253Z [2023-12-15 20:53:29] [build-stdout] Calling python3 -S /__w/_tool/CodeQL/2.15.3/x64/codeql/python/tools/python_tracer.py -v -z all -c /__w/_temp/codeql_databases/python/working/trap_cache -p /github/home/.local/lib/python3.6/site-packages --filter include:engine/engine/*.py --filter include:cli/cli/*.py --filter include:ui/packages/**/*.ts
2023-12-15T20:53:40.9670699Z [2023-12-15 20:53:30] [build-stderr] Process ForkProcess-1:
2023-12-15T20:53:40.9671590Z [2023-12-15 20:53:30] [build-stderr] Traceback (most recent call last):
2023-12-15T20:53:40.9672693Z [2023-12-15 20:53:30] [build-stderr]   File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
2023-12-15T20:53:40.9673610Z [2023-12-15 20:53:30] [build-stderr]     self.run()
2023-12-15T20:53:40.9674889Z [2023-12-15 20:53:30] [build-stderr]   File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
2023-12-15T20:53:40.9675910Z [2023-12-15 20:53:30] [build-stderr]     self._target(*self._args, **self._kwargs)
2023-12-15T20:53:40.9677133Z [2023-12-15 20:53:30] [build-stderr]   File "/__w/_tool/CodeQL/2.15.3/x64/codeql/python/tools/python3src.zip/semmle/logging.py", line 116, in _message_loop
2023-12-15T20:53:40.9678252Z [2023-12-15 20:53:30] [build-stderr]     sys.stdout.reconfigure(encoding='utf-8')
2023-12-15T20:53:40.9679360Z [2023-12-15 20:53:30] [build-stderr] AttributeError: '_io.TextIOWrapper' object has no attribute 'reconfigure'
2023-12-15T20:53:40.9680377Z [2023-12-15 20:53:30] [build-stderr] Traceback (most recent call last):
2023-12-15T20:53:40.9681694Z [2023-12-15 20:53:30] [build-stderr]   File "/__w/_tool/CodeQL/2.15.3/x64/codeql/python/tools/python_tracer.py", line 53, in <module>
2023-12-15T20:53:40.9682937Z [2023-12-15 20:53:30] [build-stderr]     semmle.populator.main(original_path)
2023-12-15T20:53:40.9684356Z [2023-12-15 20:53:30] [build-stderr]   File "/__w/_tool/CodeQL/2.15.3/x64/codeql/python/tools/python3src.zip/semmle/populator.py", line 43, in main
2023-12-15T20:53:40.9685586Z [2023-12-15 20:53:30] [build-stderr] AttributeError: '_io.TextIOWrapper' object has no attribute 'reconfigure'
2023-12-15T20:53:40.9686589Z [2023-12-15 20:53:30] [build-stderr] Traceback (most recent call last):
2023-12-15T20:53:40.9687685Z [2023-12-15 20:53:30] [build-stderr]   File "/__w/_tool/CodeQL/2.15.3/x64/codeql/python/tools/index.py", line 23, in <module>
2023-12-15T20:53:40.9688650Z [2023-12-15 20:53:30] [build-stderr]     buildtools.index.main()
2023-12-15T20:53:40.9689831Z [2023-12-15 20:53:30] [build-stderr]   File "/__w/_tool/CodeQL/2.15.3/x64/codeql/python/tools/python3src.zip/buildtools/index.py", line 222, in main
2023-12-15T20:53:40.9691161Z [2023-12-15 20:53:30] [build-stderr]   File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
2023-12-15T20:53:40.9692190Z [2023-12-15 20:53:30] [build-stderr]     raise CalledProcessError(retcode, cmd)
2023-12-15T20:53:40.9694789Z [2023-12-15 20:53:30] [build-stderr] subprocess.CalledProcessError: Command '['python3', '-S', '/__w/_tool/CodeQL/2.15.3/x64/codeql/python/tools/python_tracer.py', '-v', '-z', 'all', '-c', '/__w/_temp/codeql_databases/python/working/trap_cache', '-p', '/github/home/.local/lib/python3.6/site-packages', '--filter', 'include:engine/engine/*.py', '--filter', 'include:cli/cli/*.py', '--filter', 'include:ui/packages/**/*.ts']' returned non-zero exit status 1.
2023-12-15T20:53:40.9696854Z [2023-12-15 20:53:30] [ERROR] Spawned process exited abnormally (code 1; tried to run: [/__w/_tool/CodeQL/2.15.3/x64/codeql/python/tools/autobuild.sh])
2023-12-15T20:53:40.9698225Z [2023-12-15 20:53:30] Exception caught at top level: Exit status 1 from command: [/__w/_tool/CodeQL/2.15.3/x64/codeql/python/tools/autobuild.sh]
2023-12-15T20:53:40.9699488Z                       com.semmle.cli2.Commandline.executeAndCheckResult(Commandline.java:170)
2023-12-15T20:53:40.9700478Z                       com.semmle.cli2.Commandline.runWithoutReturn(Commandline.java:123)
2023-12-15T20:53:40.9701720Z                       com.semmle.cli2.database.DatabaseProcessCommandCommon.executeSubcommand(DatabaseProcessCommandCommon.java:226)
2023-12-15T20:53:40.9703107Z                       com.semmle.cli2.database.TraceCommandCommand.executeSubcommand(TraceCommandCommand.java:110)
2023-12-15T20:53:40.9704593Z                       com.semmle.cli2.picocli.SubcommandCommon.lambda$executeSubcommandWithMessages$5(SubcommandCommon.java:803)
2023-12-15T20:53:40.9705867Z                       com.semmle.cli2.picocli.SubcommandCommon.withCompilationMessages(SubcommandCommon.java:442)
2023-12-15T20:53:40.9707183Z                       com.semmle.cli2.picocli.SubcommandCommon.executeSubcommandWithMessages(SubcommandCommon.java:801)
2023-12-15T20:53:40.9708421Z                       com.semmle.cli2.picocli.SubcommandCommon.toplevelMain(SubcommandCommon.java:685)
2023-12-15T20:53:40.9709472Z                       com.semmle.cli2.picocli.SubcommandCommon.call(SubcommandCommon.java:666)
2023-12-15T20:53:40.9710507Z                       com.semmle.cli2.picocli.SubcommandMaker.runMain(SubcommandMaker.java:237)
2023-12-15T20:53:40.9711523Z                       com.semmle.cli2.picocli.SubcommandMaker.runMain(SubcommandMaker.java:247)
2023-12-15T20:53:40.9712412Z                       com.semmle.cli2.CodeQL.main(CodeQL.java:115)
2023-12-15T20:53:40.9713399Z ::endgroup::
2023-12-15T20:53:40.9714036Z ##[endgroup]

codeql.yml

# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
#
name: "CodeQL"

on:
  pull_request:
    branches: [ "main" ]
  push:
    branches: [ "main" ]

jobs:
  analyze:
    name: Analyze
    runs-on: [ self-hosted, Linux ]
    container:
      image: <custom image>
    permissions:
      actions: read
      contents: read
      security-events: write

    strategy:
      fail-fast: false
      matrix:
        language: [ 'javascript', 'python' ]
        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ]
        # Use only 'java' to analyze code written in Java, Kotlin or both
        # Use only 'javascript' to analyze code written in JavaScript, TypeScript or both
        # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support

    steps:
    - name: Checkout repository
      uses: actions/checkout@v3

    # Initializes the CodeQL tools for scanning.
    - name: Initialize CodeQL
      uses: github/codeql-action/init@v2
      with:
        languages: ${{ matrix.language }}
        config-file: ./.github/codeql/codeql-config.yml
        # If you wish to specify custom queries, you can do so here or in a config file.
        # By default, queries listed here will override any specified in a config file.
        # Prefix the list here with "+" to use these queries and those in the config file.

        # Details on CodeQL's query packs refer to : https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
        # queries: security-extended,security-and-quality

    # Autobuild attempts to build any compiled languages  (C/C++, C#, Go, or Java).
    # If this step fails, then you should remove it and run the build manually (see below)
    - name: Autobuild
      uses: github/codeql-action/autobuild@v2

    # ℹ️ Command-line programs to run using the OS shell.
    # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun

    #   If the Autobuild fails above, remove it and uncomment the following three lines.
    #   modify them (or add more) to build your code if your project, please refer to the EXAMPLE below for guidance.

    # - run: |
    #   echo "Run, Build Application using script"
    #   ./location_of_script_within_repo/buildscript.sh

    - name: Perform CodeQL Analysis
      uses: github/codeql-action/analyze@v2
      with:
        category: "/language:${{matrix.language}}"

codeql-config.yml

paths:
  - engine/engine/*.py
  - cli/cli/*.py
  - ui/packages/**/*.ts

aibaars commented 10 months ago

A web search for AttributeError: '_io.TextIOWrapper' object has no attribute 'reconfigure' suggests this problem can be solved by using Python version 3.7 or higher. It looks like your self-hosted runner has python 3.6 which is pretty old. Could you try upgrading the python version?
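For context, `io.TextIOWrapper.reconfigure()` was added in Python 3.7, which is why the call in the traceback raises `AttributeError` on 3.6. The sketch below only illustrates that difference; the helper name `set_utf8` is hypothetical and this is not how the CodeQL extractor handles it.

```python
import sys


def set_utf8(stream):
    """Switch a text stream to UTF-8 if the interpreter supports it.

    Returns True when reconfigure() was called, False when the method is
    missing (as on Python 3.6 and earlier).
    """
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False
    reconfigure(encoding="utf-8")
    return True


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    print("stdout reconfigured:", set_utf8(sys.stdout))
```

Running this with the same `python3` that the job uses should show whether that interpreter is new enough.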

akr-amd commented 10 months ago

Sorry, I forgot to mention that this job runs within a container using a custom image that has Python 3.11. I forgot to include the container part in my workflow file snippet (updated now).

akr-amd commented 10 months ago

Maybe running the setup-python-dependencies as part of the init is causing the issue? I see it's run by default

https://github.com/github/codeql-action/blob/511f073971a2ce589ceea100a90831c5ca4310bb/init/action.yml#L66-L69

aeisenberg commented 10 months ago

I'm not sure, but even in a container, the workflow will try to use the python version in the toolcache. Could you try explicitly setting up python 3.11? Something like this, before you run the init step.

- uses: actions/setup-python@v5
  with:
    python-version: '3.11'
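Since the autobuild log shows the extractor being started as `python3`, what matters is which `python3` comes first on `PATH` in that step. A minimal sketch for checking this from a workflow step could look like the following; the helper names and the `MIN_VERSION` constant are illustrative assumptions, not part of the CodeQL Action.

```python
import shutil
import subprocess

# Assumed minimum, based on TextIOWrapper.reconfigure() first appearing in 3.7.
MIN_VERSION = (3, 7)


def interpreter_version(executable):
    """Ask the given interpreter for its (major, minor) version."""
    out = subprocess.run(
        [executable, "-c",
         "import sys; print(sys.version_info[0], sys.version_info[1])"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    ).stdout
    major, minor = out.split()
    return int(major), int(minor)


def is_new_enough(version, minimum=MIN_VERSION):
    """Compare a (major, minor) pair against the assumed minimum."""
    return tuple(version) >= tuple(minimum)


if __name__ == "__main__":
    path = shutil.which("python3")
    if path is None:
        print("python3 is not on PATH")
    else:
        version = interpreter_version(path)
        print(path, version, "ok" if is_new_enough(version) else "too old")
```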

akr-amd commented 10 months ago

Now that error is gone, but I am presented with a different one:

 /__w/_tool/CodeQL/2.15.3/x64/codeql/codeql database finalize --finalize-dataset --threads=4 --ram=29890 /__w/_temp/codeql_databases/python
  CodeQL detected code written in Python but could not process any of it. Review our troubleshooting guide at https://gh.io/troubleshooting-code-scanning/no-source-code-seen-during-build .
  Error: Encountered a fatal error while running "/__w/_tool/CodeQL/2.15.3/x64/codeql/codeql database finalize --finalize-dataset --threads=4 --ram=29890 /__w/_temp/codeql_databases/python". Exit code was 32 and last log line was: CodeQL detected code written in Python but could not process any of it. Review our troubleshooting guide at https://gh.io/troubleshooting-code-scanning/no-source-code-seen-during-build . See the logs for more details.

aeisenberg commented 10 months ago

Hmmm...have you tried turning off setup-python-dependencies?

akr-amd commented 10 months ago

Just tried that by setting setup-python-dependencies: false. Same failure unfortunately 😞

akr-amd commented 10 months ago

I'm not sure, but even in a container, the workflow will try to use the python version in the toolcache.

@aeisenberg Also, may I ask why this is the case? The TypeScript CodeQL scanner seems happy to use the Node version installed in the container, but the Python scanner doesn't seem to want to use it.

aeisenberg commented 10 months ago

For typescript, no compilation or code execution is required during extraction. For python, we need to execute the python extractor, which is built in python.

@RasmusWL do you have any suggestions on what to do?

aibaars commented 10 months ago

@akr-amd The error message you are seeing now is caused by CodeQL not scanning any Python source files in the repository folder.

Could you try running without the ./.github/codeql/codeql-config.yml configuration file? Perhaps the paths: are resolved on the host system, so they could be misaligned when running things inside a docker container. Could you print the value of the LGTM_INDEX_FILTERS environment variable in the workflow and also run a find . -name '*.py' command.
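The comparison suggested here can be sketched in Python. This assumes `LGTM_INDEX_FILTERS` holds one `include:<glob>` or `exclude:<glob>` entry per line, and it uses a simple glob translation that is only an approximation, not CodeQL's own matcher. All function names are hypothetical.

```python
import os
import re


def parse_filters(raw):
    """Split a filter string into (kind, pattern) pairs.

    Assumes one 'include:<glob>' or 'exclude:<glob>' entry per line.
    """
    pairs = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        kind, _, pattern = line.partition(":")
        pairs.append((kind, pattern))
    return pairs


def glob_to_regex(pattern):
    """Translate a glob: '**' may span directories, '*' stays within one."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def unmatched_includes(raw_filters, files):
    """Return include patterns that match none of the given relative paths."""
    missing = []
    for kind, pattern in parse_filters(raw_filters):
        if kind != "include":
            continue
        regex = glob_to_regex(pattern)
        if not any(regex.match(f) for f in files):
            missing.append(pattern)
    return missing


def list_files(root="."):
    """Relative, '/'-separated paths of all files under root."""
    found = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.append(rel.replace(os.sep, "/"))
    return found


if __name__ == "__main__":
    raw = os.environ.get("LGTM_INDEX_FILTERS", "")
    print("filters:", parse_filters(raw))
    print("includes matching nothing:", unmatched_includes(raw, list_files(".")))
```

Any include pattern reported as matching nothing would be a candidate for the path mismatch described above.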

Another reason could be a mismatch between what CodeQL considers the "source root" and the path of the repository in the container. In that case CodeQL did scan files but would not count them because they are found in an "external" folder.

Could you also re-run the workflow with debug logging enabled? That should result in a CodeQL debug artifact containing much more detailed logs and also any source files that CodeQL has picked up.

Finally, there is no need to run Python or Typescript analysis in a docker container, so you could also try removing the container: property from the workflow. And if you really want all workflows in the self-hosted runner to run in docker containers you could try using https://github.com/actions/actions-runner-controller or a similar approach.

RasmusWL commented 10 months ago

The suggestions from @aibaars seem solid; let's see if they solve the problem :+1:

akr-amd commented 10 months ago

At the outset, thanks for being super responsive and helpful! 🙂

Could you try running without the ./.github/codeql/codeql-config.yml configuration file? Perhaps the paths: are resolved on the host system, so they could be misaligned when running things inside a docker container

Based on feedback, here's how I changed the workflow

No config file ./.github/codeql/codeql-config.yml
Not run in a container

The new workflow file is below

# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
#
name: "CodeQL"

on:
  pull_request:
    branches: [ "main" ]
  push:
    branches: [ "main" ]

jobs:
  analyze:
    name: Analyze
    runs-on: [ self-hosted, Linux ]
    permissions:
      actions: read
      contents: read
      security-events: write

    strategy:
      fail-fast: false
      matrix:
        language: [ 'javascript', 'python' ]
        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ]
        # Use only 'java' to analyze code written in Java, Kotlin or both
        # Use only 'javascript' to analyze code written in JavaScript, TypeScript or both
        # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support

    steps:
    - name: Checkout repository
      uses: actions/checkout@v3

    - uses: actions/setup-python@v4
      with:
        python-version: '3.11'

    - uses: actions/setup-node@v3
      with:
        node-version: '16.15.1'

    # Initializes the CodeQL tools for scanning.
    - name: Initialize CodeQL
      uses: github/codeql-action/init@v2
      with:
        languages: ${{ matrix.language }}
        # config-file: ./.github/codeql/codeql-config.yml
        setup-python-dependencies: false
        # If you wish to specify custom queries, you can do so here or in a config file.
        # By default, queries listed here will override any specified in a config file.
        # Prefix the list here with "+" to use these queries and those in the config file.

        # Details on CodeQL's query packs refer to : https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
        # queries: security-extended,security-and-quality

    # Autobuild attempts to build any compiled languages  (C/C++, C#, Go, or Java).
    # If this step fails, then you should remove it and run the build manually (see below)
    - name: Autobuild
      uses: github/codeql-action/autobuild@v2

    # ℹ️ Command-line programs to run using the OS shell.
    # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun

    #   If the Autobuild fails above, remove it and uncomment the following three lines.
    #   modify them (or add more) to build your code if your project, please refer to the EXAMPLE below for guidance.

    # - run: |
    #   echo "Run, Build Application using script"
    #   ./location_of_script_within_repo/buildscript.sh

    - run: |
        echo "$(which python)"
        echo "$(python -V)"
        echo $LGTM_INDEX_FILTERS
        find . -name '*.py'

    - name: Perform CodeQL Analysis
      uses: github/codeql-action/analyze@v2
      with:
        category: "/language:${{matrix.language}}"

Without providing the config file, it is able to find all the files. But that's the next problem.

The "Analyze (javascript)" job is also looking at CodeQL's js files

  [2023-12-18 10:44:27] [build-stdout] Extracting /scratch/actions-runner-1/_work/_tool/CodeQL/2.15.4/x64/codeql/javascript/tools/data/externs/lib/jquery-3.2.js
  [2023-12-18 10:44:27] [build-stdout] Extracting /scratch/actions-runner-1/_work/_tool/CodeQL/2.15.4/x64/codeql/javascript/tools/data/externs/lib/bdd.js
  [2023-12-18 10:44:27] [build-stdout] Extracting /scratch/actions-runner-1/_work/_tool/CodeQL/2.15.4/x64/codeql/javascript/tools/data/externs/lib/vows.js
  [2023-12-18 10:44:27] [build-stdout] Extracting /scratch/actions-runner-1/_work/_tool/CodeQL/2.15.4/x64/codeql/javascript/tools/data/externs/lib/should.js
  [2023-12-18 10:44:27] [build-stdout] Done extracting /scratch/actions-runner-1/_work/_tool/CodeQL/2.15.4/x64/codeql/javascript/tools/data/externs/lib/bdd.js (11 ms)
  [2023-12-18 10:44:27] [build-stdout] Extracting /scratch/actions-runner-1/_work/_tool/CodeQL/2.15.4/x64/codeql/javascript/tools/data/externs/web/w3c_dom4.js
  [2023-12-18 10:44:27] [build-stdout] Done extracting /scratch/actions-runner-1/_work/_tool/CodeQL/2.15.4/x64/codeql/javascript/tools/data/externs/web/w3c_dom4.js (7 ms)
  [2023-12-18 10:44:27] [build-stdout] Extracting /scratch/actions-runner-1/_work/_tool/CodeQL/2.15.4/x64/codeql/javascript/tools/data/externs/web/ie_event.js
  [2023-12-18 10:44:27] [build-stdout] Done extracting /scratch/actions-runner-1/_work/_tool/CodeQL/2.15.4/x64/codeql/javascript/tools/data/externs/lib/should.js (22 ms)

and the "Analyze (python)" job is also scanning the .py files from the base interpreter

  [2023-12-18 10:24:25] [build-stdout] [INFO] [4] Extracted folder /scratch/ghe-runners/1/_work/_tool/Python/3.11.2/x64/lib/python3.11/urllib in 0ms
  [2023-12-18 10:24:25] [build-stdout] [INFO] [1] Extracted file /scratch/ghe-runners/1/_work/_tool/Python/3.11.2/x64/lib/python3.11/datetime.py in 1546ms
  [2023-12-18 10:24:25] [build-stdout] [INFO] [1] Extracted file /scratch/ghe-runners/1/_work/_tool/Python/3.11.2/x64/lib/python3.11/copy.py in 191ms
  [2023-12-18 10:24:25] [build-stdout] [INFO] [1] Extracted file /scratch/ghe-runners/1/_work/_tool/Python/3.11.2/x64/lib/python3.11/json/__init__.py in 92ms
  [2023-12-18 10:24:25] [build-stdout] [INFO] [1] Extracted file /scratch/ghe-runners/1/_work/_tool/Python/3.11.2/x64/lib/python3.11/unittest/signals.py in 40ms
  [2023-12-18 10:24:25] [build-stdout] [INFO] [2] Extracted file /scratch/ghe-runners/1/_work/_tool/Python/3.11.2/x64/lib/python3.11/zipfile.py in 1537ms

Could you print the value of the LGTM_INDEX_FILTERS environment variable in the workflow

Not sure if I am doing something wrong, but the value of LGTM_INDEX_FILTERS is ""

also run a find . -name '*.py' command

Here's the output of find . -name '*.py' -- it does list all the python files from the repo

./cli/cli/__init__.py
./cli/cli/_internal/__init__.py
./cli/cli/_internal/_open_target_wiz.py
./cli/cli/_internal/_server_management.py
./cli/cli/_internal/event_handlers.py
./cli/cli/_internal/task_tracker.py
./cli/cli/_internal/tasks.py
./cli/cli/cable.py
./cli/cli/commands.py
./cli/cli/common.py
.
.
.
.
.
.
.
./engine/.vscode/pydevd/pydevd_plugins/__init__.py
./engine/.vscode/pydevd/pydevd_plugins/extensions/__init__.py
./engine/.vscode/pydevd/pydevd_plugins/extensions/pydevd_plugin_chipscopy.py
./engine/engine/__init__.py
./engine/engine/exceptions.py
./engine/engine/interceptor.py
./engine/engine/main.py
./engine/engine/protobuf/__init__.py
./engine/engine/protobuf/generate.py
./engine/engine/runtime/__init__.py
.
.
.

Finally, there is no need to run Python or Typescript analysis in a docker container, so you could also try removing the container: property from the workflow. And if you really want all workflows in the self-hosted runner to run in docker containers you could try using https://github.com/actions/actions-runner-controller or a similar approach.

May I ask why the analysis shouldn't be run in a container? I was coming at it from the angle of 'the pristine env of a container won't cause any unintended env cross-contamination issues'. The ARC seems like a neat thing, but I can't implement that myself. Our DevOps teams would have to implement it, which I don't think will happen right away.

aibaars commented 10 months ago

Without providing the config file, it is able to find all the files. But that's the next problem. The "Analyze (javascript)" job is also looking at CodeQL's js files and the "Analyze (python)" job also scanning the .py files from the base interpreter

That is actually not a problem, but expected behaviour. Those JavaScript files from CodeQL contain stubs for JS functions that are available by default in (browser) environments. The Python analysis also scans the standard libraries to figure out dataflow through standard functions. CodeQL can stop doing that, once the team has implemented QL models for the standard libraries, but for now it is still needed.

Not sure if I am doing something wrong, but the value of LGTM_INDEX_FILTERS is ""

Ah yes, sorry, I should have made clear that I was interested in the value of LGTM_INDEX_FILTERS before removing the configuration file. Internally, CodeQL interprets the paths:/paths-ignore: settings and puts them into that environment variable. I was hoping to see if there is a mismatch between the paths mentioned in LGTM_INDEX_FILTERS and the paths reported by find.

May I ask why the analysis shouldn't be run in a container? I was coming at it from the angle of 'the pristine env of a container won't cause any unintended env cross-contamination issues'

I did not mean to say that you shouldn't run in a container. Having a "pristine" env can be quite beneficial. It's just that JS and Python analysis don't change the environment and should work fine in a not so "pristine" environment. The reason why I asked to run outside a container was to reduce complexity. When running in a container there is always the risk of mixing up file paths from the host or the container. Things should have worked fine with a configuration file and a container, but they didn't ;-) To debug it helps to disable those features to see which one causes the problem, or whether it is the combination of both.

akr-amd commented 10 months ago

That is actually not a problem, but expected behaviour. Those JavaScript files from CodeQL contain stubs for JS functions that are available by default in (browser) environments. The Python analysis also scans the standard libraries to figure out dataflow through standard functions. CodeQL can stop doing that, once the team has implemented QL models for the standard libraries, but for now it is still needed.

Oh ok got it. So, seems it's safe to ignore these.

Ah yes, sorry, I should have made clear that I was interested in the value of LGTM_INDEX_FILTERS before removing the configuration file. Internally, CodeQL interprets the paths:/paths-ignore: settings and puts them into that environment variable. I was hoping to see if there is a mismatch between the paths mentioned in LGTM_INDEX_FILTERS and the paths reported by find.

Hmm, I used the config file (shown below) and re-ran the workflow, but LGTM_INDEX_FILTERS is still empty. Any other things you would like me to try?

paths:
  - 'engine/**/*.py'
  - 'cli/**/*.py'
  - 'ui/**/*.ts'

I did not mean to say that you shouldn't run in a container. Having a "pristine" env can be quite beneficial. It's just that JS and Python analysis don't change the environment and should work fine in a not so "pristine" environment. The reason why I asked to run outside a container was to reduce complexity. When running in a container there is always the risk of mixing up file paths from the host or the container. Things should have worked fine with a configuration file and a container, but they didn't ;-) To debug it helps to disable those features to see which one causes the problem, or whether it is the combination of both.

Makes sense - Eliminate possible sources of problem

RasmusWL commented 10 months ago

I could reproduce this, and narrowed down the problem to cases where paths contain globs 😬 As a workaround, can you please use this config file?

paths:
  - 'engine/'
  - 'cli/'
  - 'ui/'

(If analyzing .py files inside ui is a problem, you can use the paths-ignore property to exclude those with something like ui/**/*.py.)
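The workaround amounts to replacing each glob with the plain directory in front of its first wildcard. A small sketch of that transformation follows, with hypothetical helper names; it is only an illustration of the idea, not something the CodeQL Action provides.

```python
def directory_prefix(pattern):
    """Return the leading directory part of a glob, up to the first wildcard."""
    kept = []
    for segment in pattern.split("/"):
        if any(ch in segment for ch in "*?["):
            break
        kept.append(segment)
    else:
        # No wildcard at all: the entry is already a plain path.
        return pattern
    return "/".join(kept) + "/" if kept else ""


def glob_free_paths(patterns):
    """Deduplicate directory prefixes while keeping the original order."""
    result = []
    for pattern in patterns:
        prefix = directory_prefix(pattern)
        if prefix not in result:
            result.append(prefix)
    return result


if __name__ == "__main__":
    print(glob_free_paths(["engine/**/*.py", "cli/**/*.py", "ui/**/*.ts"]))
```

Widening a glob to its directory drops the file-extension restriction, which is where paths-ignore comes in if that restriction matters.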

akr-amd commented 9 months ago

I could reproduce this, and narrowed down the problem to cases where paths contain globs 😬 As a workaround, can you please use this config file?
paths:
  - 'engine/'
  - 'cli/'
  - 'ui/'

Sorry, I thought I replied back. The suggested config file does work @RasmusWL. Thanks!

BullHacks3 commented 9 months ago

👋 Team,

I'm also facing a similar issue while running CodeQL version 2.15.3. Note that I'm only testing a simple Python CodeQL query. For other languages (Java/JavaScript) the same command works fine.

Command /home/bakul/codeql/codeql test run UrlRedirect.ql --show-extractor-output

Output

Executing 1 tests in 1 directories.
Extracting test database in /home/bakul/codeql/python/ql/src/experimental/security/url-redirect.
[2024-01-16 00:29:46] [build-err] Process ForkProcess-1:
[2024-01-16 00:29:46] [build-err] Traceback (most recent call last):
[2024-01-16 00:29:46] [build-err]   File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
[2024-01-16 00:29:46] [build-err]     self.run()
[2024-01-16 00:29:46] [build-err]   File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run
[2024-01-16 00:29:46] [build-err]     self._target(*self._args, **self._kwargs)
[2024-01-16 00:29:46] [build-err]   File "/home/bakul/codeql/python/tools/python3src.zip/semmle/logging.py", line 116, in _message_loop
[2024-01-16 00:29:46] [build-err]     sys.stdout.reconfigure(encoding='utf-8')
[2024-01-16 00:29:46] [build-err] AttributeError: '_io.TextIOWrapper' object has no attribute 'reconfigure'
[2024-01-16 00:29:46] [build-err] Traceback (most recent call last):
[2024-01-16 00:29:46] [build-err]   File "/home/bakul/codeql/python/tools/python_tracer.py", line 53, in <module>
[2024-01-16 00:29:46] [build-err]     semmle.populator.main(original_path)
[2024-01-16 00:29:46] [build-err]   File "/home/bakul/codeql/python/tools/python3src.zip/semmle/populator.py", line 43, in main
[2024-01-16 00:29:46] [build-err] AttributeError: '_io.TextIOWrapper' object has no attribute 'reconfigure'
[2024-01-16 00:29:46] [ERROR] Spawned process exited abnormally (code 1; tried to run: [python3, /home/bakul/codeql/python/tools/python_tracer.py, --lang=3, --filter=exclude:/.testproj/**, --path, /home/bakul/codeql/python/ql/src/experimental/security/url-redirect, --verbosity, 3, --colorize])
Could not extract a dataset in /home/bakul/codeql/python/ql/src/experimental/security/url-redirect: Extraction command python3 failed with status 1.
Extraction command python3 failed with status 1.
[1/1] FAILED(EXTRACTION) /home/bakul/codeql/python/ql/src/experimental/security/url-redirect/UrlRedirect.ql
Compiling queries in /home/bakul/codeql/python/ql/src/experimental/security/url-redirect.
Completed in 4.5s (extract 1.2s comp 0ms eval 0ms).
0 tests passed; 1 tests failed:
  FAILED: /home/bakul/codeql/python/ql/src/experimental/security/url-redirect/UrlRedirect.ql

RasmusWL commented 9 months ago

Hi @BullHacks3, as already mentioned in this issue:

A web search for AttributeError: '_io.TextIOWrapper' object has no attribute 'reconfigure' suggests this problem can be solved by using Python version 3.7 or higher. It looks like your self-hosted runner has python 3.6 which is pretty old. Could you try upgrading the python version?
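One way to see which interpreter actually ran the extractor is to look at the standard-library paths in the traceback frames, such as /usr/lib64/python3.6/multiprocessing/process.py in the output above. The sketch below pulls those versions out of a log; the regular expression and the function name are assumptions made for illustration and only cover lib/pythonX.Y style paths.

```python
import re

# Matches the version in paths such as /usr/lib64/python3.6/multiprocessing/process.py
STDLIB_PATH = re.compile(r"/lib(?:64)?/python(\d+)\.(\d+)/")


def versions_in_log(text):
    """Return the sorted set of (major, minor) versions seen in stdlib paths."""
    return sorted({(int(a), int(b)) for a, b in STDLIB_PATH.findall(text)})


if __name__ == "__main__":
    sample = 'File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run'
    print(versions_in_log(sample))
```

If the versions reported this way differ from what `python -V` prints in an interactive shell, the extractor is likely picking up a different interpreter from `PATH`.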

BullHacks3 commented 9 months ago

Hey @RasmusWL, I'm already using Python 3.11. Thanks

RasmusWL commented 9 months ago

@BullHacks3 please open a new issue then.

BullHacks3 commented 9 months ago

Thanks @RasmusWL, I created a new issue for this: https://github.com/github/codeql/issues/15337.

github / codeql-action

v2.15.3 - Python code analysis fails #2042
