github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License

Codeql database create fails when building mozilla #16001

Open mies47 opened 8 months ago

mies47 commented 8 months ago

I'm trying to build Mozilla from this repo and create a C/C++ CodeQL database.

To do that, I first run ./mach configure and, once it finishes, run the following command to create the database:

~/codeql/codeql database create mozilla4 --language=c-cpp --command="./mach build"

This builds Mozilla successfully, but database creation then fails at the stage where the trap files are imported. I have included the logs for creating the database and for importing the dataset.

A sample of the errors:

[ERROR] 14135855_0.trap.br, 1: java.io.IOException: Brotli stream decoding failed
                              org.brotli.dec.BrotliInputStream.read(BrotliInputStream.java:167)
                              com.semmle.inmemory.trap.TrapInputStream.read(TrapInputStream.java:60)
                              com.semmle.inmemory.trap.TrapScanner.fill(TrapScanner.java:451)
                              com.semmle.inmemory.trap.TrapScanner.ensureNext(TrapScanner.java:428)
                              com.semmle.inmemory.trap.TrapScanner.nextToken(TrapScanner.java:61)
                              com.semmle.inmemory.trap.TRAPReader.scanTuplesAndLabels(TRAPReader.java:493)
                              com.semmle.inmemory.trap.TRAPLinker$TrapLinkDirectiveScanner.scanTuplesAndLabels(TRAPLinker.java:311)
                              com.semmle.inmemory.trap.TRAPReader.importTuples(TRAPReader.java:414)
                              com.semmle.inmemory.trap.TRAPReader.importTuples(TRAPReader.java:400)
                              com.semmle.inmemory.trap.TRAPLinker.lambda$getTasks$2(TRAPLinker.java:215)
                              com.semmle.util.concurrent.FutureUtils.lambda$mapAsync_$8(FutureUtils.java:161)
                              java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
                              java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
                              java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                              java.base/java.lang.Thread.run(Unknown Source)

                               ... caused by:

                              org.brotli.dec.BrotliRuntimeException: Corrupted Huffman code histogram
                              org.brotli.dec.Decode.readComplexHuffmanCode(Decode.java:591)
                              org.brotli.dec.Decode.readHuffmanCode(Decode.java:613)
                              org.brotli.dec.Decode.readMetablockPartition(Decode.java:783)
                              org.brotli.dec.Decode.readMetablockHuffmanCodesAndContextMaps(Decode.java:825)
                              org.brotli.dec.Decode.decompress(Decode.java:1110)
                              org.brotli.dec.BrotliInputStream.read(BrotliInputStream.java:162)
                              com.semmle.inmemory.trap.TrapInputStream.read(TrapInputStream.java:60)
                              com.semmle.inmemory.trap.TrapScanner.fill(TrapScanner.java:451)
                              com.semmle.inmemory.trap.TrapScanner.ensureNext(TrapScanner.java:428)
                              com.semmle.inmemory.trap.TrapScanner.nextToken(TrapScanner.java:61)
                              com.semmle.inmemory.trap.TRAPReader.scanTuplesAndLabels(TRAPReader.java:493)
                              com.semmle.inmemory.trap.TRAPLinker$TrapLinkDirectiveScanner.scanTuplesAndLabels(TRAPLinker.java:311)
                              com.semmle.inmemory.trap.TRAPReader.importTuples(TRAPReader.java:414)
                              com.semmle.inmemory.trap.TRAPReader.importTuples(TRAPReader.java:400)
                              com.semmle.inmemory.trap.TRAPLinker.lambda$getTasks$2(TRAPLinker.java:215)
                              com.semmle.util.concurrent.FutureUtils.lambda$mapAsync_$8(FutureUtils.java:161)
                              java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
                              java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
                              java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
                              java.base/java.lang.Thread.run(Unknown Source)
                              at (start of line)

I can build Mozilla on its own following similar steps; the failure only shows up when I try to create a database. A database is still created in the end, but some of the files are missing from it.

Can someone please point me in the right direction on how to solve this issue?
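
To get a sense of how many trap files are affected, the import log can be tallied by file name. The following is a minimal sketch that assumes every failure is reported on a line shaped like the `[ERROR] 14135855_0.trap.br, 1: ...` line quoted above; the function name and the regular expression are illustrative and not part of CodeQL.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log excerpt in this thread:
# [ERROR] 14135855_0.trap.br, 1: java.io.IOException: Brotli stream decoding failed
ERROR_LINE = re.compile(r"\[ERROR\]\s+(?P<trap>\S+\.trap\.br),\s*\d+:\s*(?P<message>.+)")


def summarize_trap_errors(log_text):
    """Return a Counter mapping each failing trap file to its error count."""
    failures = Counter()
    for line in log_text.splitlines():
        match = ERROR_LINE.search(line)
        if match:
            failures[match.group("trap")] += 1
    return failures
```

Passing the text of the import log to `summarize_trap_errors` and taking `len()` of the result gives the number of distinct trap files that failed to decode. Stack-trace lines do not match the pattern and are skipped.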

smowton commented 8 months ago

Could you zip up and upload the database directory? It will contain the logs of what is probably a crash midway through writing a compressed trap file.

mies47 commented 8 months ago

Of course. Here's the zip of the database directory. Thanks for your help.

jketema commented 8 months ago

It seems that there are indeed around 360 files on which we crash, as @smowton already suspected. These seem to be internal errors in the C/C++ frontend that we use. You might want to retry with CodeQL 2.16.4 or later, which includes some C/C++ frontend improvements that may solve your issues. Note that this affects fewer than 10% of the source files being compiled, so you should still have a fairly complete database.

mies47 commented 8 months ago

Thanks for your suggestion. I tried again with CodeQL 2.16.5. It seems to work better, but the issue persists. I agree that most of the code base gets compiled, but I'm trying to run a query on the IPC-related files, and those are the ones that are missing. To confirm my guess that the IPC-related files are not being fully imported, I ran a query for files whose names match "%Child.cpp" or "%Parent.cpp". These are the results:

"PFetchParent.cpp"
"PBackgroundIDBCursorChild.cpp"
"PBackgroundIDBCursorParent.cpp"
"PBackgroundIDBDatabaseChild.cpp"
"PBackgroundIDBDatabaseFileChild.cpp"
"PBackgroundIDBDatabaseFileParent.cpp"
"PBackgroundIDBDatabaseParent.cpp"
"SessionStoreChild.cpp"
"SessionStoreParent.cpp"
"PSessionStoreChild.cpp"
"PSessionStoreParent.cpp"
"GPUParent.cpp"
"VRLayerParent.cpp"
"ContentChild.cpp"
"RemoteDecoderManagerChild.cpp"
"ActorsParent.cpp"
"HeapSnapshotTempFileHelperParent.cpp"
"TestShellChild.cpp"
"VsyncParent.cpp"
"VsyncMainChild.cpp"
"RemoteDecoderManagerParent.cpp"
"VsyncWorkerChild.cpp"
"RDDParent.cpp"
"RemoteDecoderChild.cpp"
"TestShellParent.cpp"
"ProxyAutoConfigParent.cpp"
"ProxyAutoConfigChild.cpp"
"RemoteDecoderParent.cpp"
"RDDChild.cpp"

That is far fewer files than there should be. Do you think there's a way I could fix this? I appreciate your help.
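
One way to quantify the gap is to compare the query results with what exists on disk. The sketch below assumes the base names returned by the query have been collected into a Python list; `ipc_sources_on_disk` and `missing_from_database` are hypothetical helper names. It compares base names only, so files that share a name in different directories are collapsed, and files on disk that are not part of the build will also show up as missing.

```python
from pathlib import Path

# The two suffixes used in the query described above.
SUFFIXES = ("Child.cpp", "Parent.cpp")


def ipc_sources_on_disk(root):
    """Collect base names of *Child.cpp / *Parent.cpp files under root."""
    return {p.name for p in Path(root).rglob("*.cpp") if p.name.endswith(SUFFIXES)}


def missing_from_database(root, names_in_database):
    """Return sorted base names found on disk but absent from the query results."""
    return sorted(ipc_sources_on_disk(root) - set(names_in_database))
```

For example, `missing_from_database(".", ["ContentChild.cpp", "GPUParent.cpp"])` run from the top of the checkout lists every matching file on disk other than those two.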

jketema commented 8 months ago

Could you share the build-tracer.log file from the 2.16.5 run, which should be located somewhere in the database directory? Thanks.

mies47 commented 8 months ago

Of course, here's the log file.

jketema commented 8 months ago

Thanks. It seems that the tooling indeed still crashes on the files in question. There are just under 400 "CodeQL C++ extractor: Backtrace:" lines in the build-tracer.log. The way to solve this is to somehow construct a small test case that reproduces the crash and does not depend on building Mozilla. Based on such a test case, we can likely produce a fix. However, given that most of the code is there, creating such a test case will have low priority on our side. If you're able and willing to create such a test case, then we might be able to do something. Otherwise, all we will do for now is track this problem internally.
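
As a starting point for narrowing down a test case, the crash sites can be located in build-tracer.log by searching for the marker text quoted above. This is a minimal sketch: the marker string comes from this thread, while the function name is hypothetical and nothing is assumed about the layout of the surrounding log lines.

```python
# Marker text as quoted in this thread.
MARKER = "CodeQL C++ extractor: Backtrace:"


def backtrace_line_numbers(log_lines):
    """Return the 1-based line numbers of every line containing the marker."""
    return [number for number, line in enumerate(log_lines, start=1) if MARKER in line]
```

The function accepts any iterable of lines, so an open file object can be passed directly, for instance one opened with `errors="replace"` in case the log contains bytes that are not valid text. The length of the returned list is the crash count, and the line numbers show where to look in the log for the compilation that preceded each crash.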