github / codeql-action

Actions for running CodeQL analysis
MIT License

CodeQL Analysis times out while creating zip file of results #1541

Open futureviperowner opened 1 year ago

futureviperowner commented 1 year ago

I have CodeQL analysis enabled on a project that scans C code using the cpp configuration. Normally, the scans work just fine. However, a scheduled scan failed today due to a timeout while preparing the zip file of the analysis results. The scan was run against a commit that had previously been scanned successfully, so the failure shouldn't be caused by any changes in the source being scanned.

A log file of the scan is attached.

codeql-cpp-analysis-timeout.zip

Unfortunately, there isn't much to go on since debug logs weren't enabled. Line 1656 of the log is where the job hangs before it hits the 1-hour timeout configured for the job. The previous successful scan of this commit completed in just under 10 minutes.

One thing of note is that this job executes within the context of a custom docker image configured on the workflow. However, I suspect this is fairly common when working with C code due to the platform dependencies when building and executing it.

alexet commented 1 year ago

Hi

Thanks for the report. Does it reproduce if you re-run it with debug logs enabled?

futureviperowner commented 1 year ago

I have re-run the job numerous times today with debug logs enabled, but I haven't been able to reproduce the failure. I've enabled debug logs in the repo for now and will monitor for similar failures.

futureviperowner commented 1 year ago

I ran into this again today; however, this time it happened during a Java analysis. Debug logging was enabled for this run. I did not see anything useful explaining why execution hung until the timeout was reached: the log shows that the zip file was being created, and then nothing else was logged until the eventual timeout 9 minutes later.

2023-03-27T18:20:56.2764900Z [command]/__t/CodeQL/2.12.5-20230317/x64/codeql/codeql database bundle /__w/_temp/codeql_databases/java --output=/__w/_temp/codeql_databases/java.zip --name=java
2023-03-27T18:20:57.2633395Z Creating bundle metadata for /__w/_temp/codeql_databases/java...
2023-03-27T18:20:57.4642192Z Creating zip file at /__w/_temp/codeql_databases/java.zip.
2023-03-27T18:29:55.4573827Z ##[debug]CODEQL_ACTION_VERSION='2.2.9'
2023-03-27T18:29:55.4574254Z ##[debug]CODEQL_ACTION_FEATURE_SARIF_COMBINE='true'
2023-03-27T18:29:55.4574646Z ##[debug]CODEQL_ACTION_FEATURE_WILL_UPLOAD='true'
2023-03-27T18:29:55.4575047Z ##[debug]CODEQL_ACTION_FEATURE_MULTI_LANGUAGE='false'
2023-03-27T18:29:55.4575426Z ##[debug]CODEQL_ACTION_FEATURE_SANDWICH='false'
2023-03-27T18:29:55.4575891Z ##[debug]CODEQL_UPLOAD_SARIF__LANGUAGE_JAVA__CODEQL='CODEQL_UPLOAD_SARIF__LANGUAGE_JAVA__CODEQL'
2023-03-27T18:29:55.4577417Z ##[debug]Set output db-locations = {"java":"/__w/_temp/codeql_databases/java"}
2023-03-27T18:29:55.4577883Z ##[debug]Set output sarif-id = 1da297f2-cccc-11ed-9d41-2afcb5d1b5c6
2023-03-27T18:29:55.4588334Z ##[error]The action has timed out.

1_Analyze (java)-hung.zip

aeisenberg commented 1 year ago

There's still not much to go on here. It looks like the results were uploaded successfully in this latest run, but zipping the database failed due to the timeout. Is the problem repeatable? Do you know if this is a large database? If this is not happening on all runs, how long does it normally take to zip the database?

If you're running a docker container inside a standard GitHub runner, it's likely that there aren't many resources left to actually do the work. Maybe it always takes a long time to zip, but most of the time you're just under the timeout threshold.

futureviperowner commented 1 year ago

The issue seems to occur at random. Yesterday's re-run of the job succeeded, with the CodeQL analysis step taking 1m 2s and the overall job taking 3m 9s. Is there a way to find the size of the database? All I can find in the logs is the size of the results upload (under 2 MB unzipped). The logs indicate that the database contains 24,707 lines of Java code. Timestamps from a successful run show that ZIP file creation and upload took 3 seconds:

Mon, 27 Mar 2023 21:36:08 GMT ::group::Uploading results
Mon, 27 Mar 2023 21:36:08 GMT Uploading results
Mon, 27 Mar 2023 21:36:08 GMT Processing sarif files: ["/__w/exec-wms-platform-beacon-integrator/results/java.sarif"]
Mon, 27 Mar 2023 21:36:09 GMT ##[debug]Raw upload size: 1721090 bytes
Mon, 27 Mar 2023 21:36:09 GMT ##[debug]Base64 zipped upload size: 196496 bytes
Mon, 27 Mar 2023 21:36:09 GMT ##[debug]Number of results in upload: 3
Mon, 27 Mar 2023 21:36:09 GMT Uploading results
Mon, 27 Mar 2023 21:36:09 GMT ##[debug]response status: 202
Mon, 27 Mar 2023 21:36:09 GMT Successfully uploaded results
Mon, 27 Mar 2023 21:36:09 GMT ::endgroup::
Mon, 27 Mar 2023 21:36:09 GMT /__t/CodeQL/2.12.5-20230317/x64/codeql/codeql database bundle /__w/_temp/codeql_databases/java --output=/__w/_temp/codeql_databases/java.zip --name=java
Mon, 27 Mar 2023 21:36:10 GMT Creating bundle metadata for /__w/_temp/codeql_databases/java...
Mon, 27 Mar 2023 21:36:10 GMT Creating zip file at /__w/_temp/codeql_databases/java.zip.
Mon, 27 Mar 2023 21:36:13 GMT ##[debug]Successfully uploaded database for java
Mon, 27 Mar 2023 21:36:13 GMT ::group::Waiting for processing to finish
Mon, 27 Mar 2023 21:36:13 GMT Waiting for processing to finish
Mon, 27 Mar 2023 21:36:13 GMT Analysis upload status is complete.
Mon, 27 Mar 2023 21:36:13 GMT ::endgroup::

It is running on a standard GitHub runner.

aeisenberg commented 1 year ago

If you run in debug mode, then the database will be uploaded as an artifact at the end of the analysis job (from the log messages, it looks like you're already doing that?). You can then download it later. However, 24,707 lines of Java is not large, so I'd be surprised if the database is large. The fact that you are running the analysis in a docker container does add a layer of complexity. Is this a requirement for your java code? It might help to build without docker if you can.
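If downloading the debug artifact is inconvenient, a diagnostic step along these lines could print the database size directly in the job log. This is a sketch, not part of the original workflow: the step name is hypothetical, and it assumes the default database location (`$RUNNER_TEMP/codeql_databases`, matching the `/__w/_temp/codeql_databases` paths in the logs) and that `du` is available in the container image.

```yaml
      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v2

      # Hypothetical diagnostic step: print the on-disk size of each CodeQL
      # database, assuming the default location under $RUNNER_TEMP.
      - name: Report CodeQL database size
        # always() lets this step run even if an earlier step failed.
        if: always()
        run: du -sh "$RUNNER_TEMP"/codeql_databases/*
```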

futureviperowner commented 1 year ago

Debug logging has been enabled in this repo since my initial report in hopes of catching more info around the timeout. This particular project contains both Java and C, which is why the build is being done inside of a docker container. The image is based on the deployment image with additional tooling added to support building the native code.

While I've only run into this timeout with this particular project, other members of my team have reported CodeQL timeouts in other projects that don't build inside a docker container. However, I only found out about this today, so I haven't yet been able to compare those failures to this one to see if they look like the same issue.

aeisenberg commented 1 year ago

You could try explicitly setting the timeout to a higher number. Based on the logs, however, it looks like zip time is either negligible or takes 5+ minutes, so perhaps something is getting stuck.
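For reference, `timeout-minutes` can be set at the job level or on an individual step. In this sketch the job name `analyze` is hypothetical, the 60-minute job limit matches the one mentioned at the start of the thread, and the 30-minute step limit is purely illustrative. A step-level limit bounds only the analyze step, so a hang there fails that step instead of consuming the job's entire time budget.

```yaml
jobs:
  analyze:
    runs-on: ubuntu-latest
    # Job-level limit: the whole job is cancelled after 60 minutes.
    timeout-minutes: 60
    steps:
      # ... checkout, CodeQL init, and build steps ...
      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v2
        # Step-level limit (illustrative value): only this step is bounded.
        timeout-minutes: 30
```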

futureviperowner commented 1 year ago

I'm not inclined to increase the job timeout, since there's no reasonable expectation that doing so will resolve the issue; it would just use up more GH runner minutes.

Is there any additional logging that can be enabled to debug further? It looks like the action is executing a codeql binary when it hangs, so I wasn't able to follow the code well enough to find this out on my own.

aeisenberg commented 1 year ago

Yes, the action is calling `codeql database bundle`, which is a light wrapper around a command that zips the database directory. There's no additional logging available here.

I've asked the rest of the team for ideas. It is odd that a call to write to a zip file is causing a process to hang.

aeisenberg commented 1 year ago

Some suggestions:

  1. Add upload-database: false to the analyze step. This will prevent the database from being zipped and uploaded.
  2. Can you share the workflow file? That will let us know if there is anything that we can improve there.
  3. You mentioned that you need to run the C++ build job in the container. Do you also need to do the same for Java? You will likely see better performance if you run outside the container.
  4. Are you creating a new container for each job? Is there any possibility that some state is being shared between jobs or workflow runs?
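Suggestion 1 might look like this in the workflow. It is a minimal sketch: only the `upload-database` input comes from the suggestion above; the surrounding steps and the `java` language value are illustrative, and the `@v2` tag is assumed from the action version (2.2.9) shown in the logs. Because debug mode uploads the database as an artifact separately, this input may not avoid every database zip while debug logging is on.

```yaml
    steps:
      - uses: actions/checkout@v3
      - uses: github/codeql-action/init@v2
        with:
          languages: java
      # ... build steps ...
      - uses: github/codeql-action/analyze@v2
        with:
          # Do not zip and upload the CodeQL database at the end of analysis.
          upload-database: false
```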
futureviperowner commented 1 year ago

Thanks for sticking with me on this. 😃

  1. Will skipping the database upload affect results or build checks in any way? I can't tell from the docs if this just affects the availability of a build artifact or if there's a functional impact.

  2. Yes. I've attached an anonymized version.
    codeql-scan-workflow.zip

  3. No, I don't technically need to do this. However, I'm attempting to ensure that all builds, tests, and static analysis tools are using the same environment and build artifacts so that we don't introduce (or miss) issues that might be dependent on these things.

  4. The GH runner downloads the image and prepares a fresh container each time the job runs. No state is persisted between executions.

aeisenberg commented 1 year ago

> Thanks for sticking with me on this. 😃

No worries!

> 1. Will skipping the database upload affect results or build checks in any way? I can't tell from the docs if this just affects the availability of a build artifact or if there's a functional impact.

There is no functional impact. Database upload only happens to help you with some extra analysis later.

> 2. Yes. I've attached an anonymized version.
> [codeql-scan-workflow.zip](https://github.com/github/codeql-action/files/11094210/codeql-scan-workflow.zip)

Thanks. I see it.

I have to admit that I'm stumped about this. I am not sure why writing to a zip file may occasionally hang. There are some things I can think of:

However, I think your best bet for now is to avoid uploading the database.

futureviperowner commented 1 year ago

I'll try disabling the database upload and see how it goes. There are no large binaries involved. I'm not sure what you meant by soft/hard links in this context. Are we talking filesystem links? If so, none of that is applicable to this project.