github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License

Chromium: `We have exhausted all available IDs in the disk pool` #17332

Open Manouchehri opened 2 months ago

Manouchehri commented 2 months ago

When building a large application, I got the following error:

7b012e813b8c62de4e17eb1e.trap (maglev-graph-builder.cc.adcdb985_0.trap.tar.zst), 22: com.semmle.util.exception.CatastrophicError: We have exhausted all available IDs in the disk pool /databases/v1/demo/db-cpp/default/idPool

Unsure of what it means, any ideas?

jketema commented 2 months ago

Hi @Manouchehri,

This means you've hit some internal limit of our database finalization process, due to the size of your application. Which version of CodeQL are you using? We have recently made some improvements in this area, so it would be good to know if you're benefitting from these, or not.

If your application is in fact a large monorepo, the general guidance would be to create a database per application, and not one that includes everything. I cannot tell from your question whether this is the case or not.

Manouchehri commented 2 months ago

> Which version of CodeQL are you using?

2.18.3+202408191541

> If your application is in fact a large monorepo, the general guidance would be to create a database per application

It's a single application, Chromium.

jketema commented 2 months ago

> 2.18.3+202408191541

Thanks. Note that we do not recommend using nightly builds, but that build will of course have all the relevant fixes.

> It's a single application, Chromium.

Can I assume this was just a standard build of a recently checked out copy of Chromium?

Manouchehri commented 2 months ago

> Can I assume this was just a standard build of a recently checked out copy of Chromium?

Correct. No patches or anything made to it.

jketema commented 2 months ago

Thanks. I'll need to discuss this further internally.

My hypothesis is that although we tried to limit the use of the ID pool, the fixes that we've applied for the issues filed by @flowerhack, which I know you've seen, have caused substantially more code to be extracted, pushing us over the ID pool limit.

I expect this will take some time to fix.

Manouchehri commented 2 months ago

Is there another limiting factor, like reaching the max size of a uint32?

The reason I'm asking is that if it's just a constant, I might be able to monkey patch the CodeQL binary myself in the meantime.

jketema commented 2 months ago

The limiting factor here is indeed some 32-bit integer. However, 32-bit integers are pretty fundamental to both the finalization code and the query evaluator, so patching the binary is likely going to be extremely difficult.
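As a rough illustration of why a fixed-width ID limit produces a hard failure rather than a slowdown, here is a toy allocator that hands out consecutive integer IDs until a ceiling is reached. This is only a sketch and not CodeQL's implementation: the `IdPool` and `IdPoolExhausted` names are hypothetical, and the default ceiling of `2**32` is an assumption (the thread does not say whether the real limit is signed or unsigned).

```python
class IdPoolExhausted(Exception):
    """Raised when the hypothetical pool has no IDs left to hand out."""


class IdPool:
    """Toy allocator of consecutive integer IDs (illustration only)."""

    def __init__(self, limit: int = 2**32) -> None:
        # Assumption: IDs are 0 .. limit-1; the real limit is not documented here.
        self.limit = limit
        self.next_id = 0

    def allocate(self, count: int = 1) -> range:
        """Reserve `count` consecutive IDs, or fail if the ceiling would be crossed."""
        if count < 0:
            raise ValueError("count must be non-negative")
        if self.next_id + count > self.limit:
            raise IdPoolExhausted(
                f"requested {count} IDs but only {self.limit - self.next_id} remain"
            )
        start = self.next_id
        self.next_id += count
        return range(start, self.next_id)
```

Once every ID below the ceiling has been handed out, the next request has nowhere to go. If the ceiling comes from the width of an integer type used throughout the code rather than from a single tunable constant, raising it means changing every place that stores or compares those IDs.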

jketema commented 2 months ago

Would you be able to share the maglev-graph-builder.cc.adcdb985_0.trap.tar.zst file mentioned in the error message? The file should contain a number of text files (that have a .trap extension). The only potentially sensitive information in there is the path names of files that were parsed by the tool.

Manouchehri commented 2 months ago

What folder should that be in? I oddly don't seem to have any *.tar.zst files.

jketema commented 2 months ago

It should be in a subdirectory of /databases/v1/demo/trap/cpp/tarballs, where /databases/v1/demo/ seems to be your database directory.
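A small script can search that directory tree for the tarball by name. This is a minimal sketch: the `find_trap_tarballs` helper is hypothetical, and it assumes only the `trap/cpp/tarballs` layout described above.

```python
from pathlib import Path


def find_trap_tarballs(database_dir, name="*.tar.zst"):
    """Return files matching `name` anywhere under <database_dir>/trap/cpp/tarballs."""
    tarballs_dir = Path(database_dir) / "trap" / "cpp" / "tarballs"
    if not tarballs_dir.is_dir():
        # Nothing to search if the directory does not exist.
        return []
    # rglob searches all subdirectories recursively.
    return sorted(p for p in tarballs_dir.rglob(name) if p.is_file())


if __name__ == "__main__":
    matches = find_trap_tarballs(
        "/databases/v1/demo",
        "maglev-graph-builder.cc.adcdb985_0.trap.tar.zst",
    )
    for path in matches:
        print(path)
```

An empty result means either the tarballs directory is missing or no file with that name exists under it.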

jketema commented 2 months ago

Hi @Manouchehri,

The latest nightly v2.18.4+202409122320 has a fix that addresses what seemed to be one of the major causes of the ID pool exhaustion, so you might want to see if this fixes the issue for you.

rvermeulen commented 4 weeks ago

Hi @Manouchehri,

Were you able to create a database with a newer CLI version?