github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License

[cpp] How to extract only part of files when creating database #16237

Closed lianxv-primer closed 7 months ago

lianxv-primer commented 7 months ago

Our project involves many third-party libraries, but what I am concerned with is the part we wrote ourselves. So I want to extract only part of the files when creating the database, to reduce the time cost and the size of the database.

My database creation command looks like this: codeql database create C:\test\codeql-database --source-root "E:\test-project-code\src" --language=cpp --command="call build_win_codeql.bat" --threads=0 --verbose --overwrite --mode=clear --min-disk-free=100000

I read in the docs that we can customize the behavior of extractors by setting extractor configuration options, but the cpp extractor appears to have none: "extractor_options" : { }

How can I optimize the database creation command?

jketema commented 7 months ago

Hi @lianxv-primer,

I would strongly recommend against doing anything like this, as some security problems may depend on the ability to analyse dataflow through some of your third-party libraries. This will no longer be possible when those libraries are not present in the database.

If you really do want to do this, and your build system supports incrementally rebuilding the source code, then you could try the following (note that we in no way support this):

  1. Build your code as you do normally
  2. Delete all object files, precompiled headers, libraries, and executables that relate to your own source code (keep the ones that relate to third-party libraries)
  3. Run codeql database create such that the supplied command only rebuilds the deleted files.
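
For illustration only, steps 2 and 3 above could be sketched roughly as follows in Python. This is a minimal sketch under stated assumptions, not a supported workflow: the artifact extensions, the `out/third_party` layout, and the `build_incremental.bat` script name are all hypothetical, and the `codeql` flags are limited to ones already used in the command earlier in this thread.

```python
from pathlib import PurePath

# Extensions treated as build outputs. This set is an assumption;
# adjust it to whatever your toolchain actually produces.
ARTIFACT_SUFFIXES = {".obj", ".o", ".pch", ".lib", ".exe", ".dll"}


def is_under(path, directory):
    """Return True if `path` is located inside `directory`."""
    try:
        path.relative_to(directory)
        return True
    except ValueError:
        return False


def select_own_artifacts(build_outputs, third_party_dirs):
    """Step 2: pick the build outputs to delete.

    Keeps only artifact files that are NOT under a third-party directory,
    so third-party objects stay in place and are not rebuilt (or extracted).
    """
    third_party = [PurePath(d) for d in third_party_dirs]
    selected = []
    for name in build_outputs:
        path = PurePath(name)
        if path.suffix.lower() not in ARTIFACT_SUFFIXES:
            continue  # not a build artifact
        if any(is_under(path, d) for d in third_party):
            continue  # third-party output: keep it
        selected.append(name)
    return selected


def create_command(database, source_root, build_command):
    """Step 3: argument list for `codeql database create`.

    `build_command` is expected to rebuild only the deleted files.
    """
    return [
        "codeql", "database", "create", database,
        "--source-root", source_root,
        "--language=cpp",
        f"--command={build_command}",
        "--overwrite",
    ]


if __name__ == "__main__":
    outputs = [
        "out/app/main.obj",
        "out/third_party/zlib/inflate.obj",
        "out/app/app.exe",
    ]
    print(select_own_artifacts(outputs, ["out/third_party"]))
    # "build_incremental.bat" is a hypothetical incremental build script.
    print(create_command("codeql-database", "src", "call build_incremental.bat"))
```

The sketch only computes which files to delete and which command to run; it deliberately leaves the deletion and the process launch to the caller, so the selection can be reviewed before anything is removed.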

lianxv-primer commented 7 months ago

OK, this sounds like a workable solution if we have to do this. When I create databases, I get many errors like this:

[2024-04-17 00:06:20] Importing 96f07e7c3376ac5f473f4bab.trap (trace_log.cc.11991bd7_0.trap.tar.br) for no link target (6974499 of 7859950)
[2024-04-17 00:06:20] [ERROR] dataset import> 9fafff7e4bf067c0c66b0ca7.trap (connection.cc.a0da4be2_0.trap.tar.br) for no link target, 38: com.semmle.util.exception.CatastrophicError: ID 94380083 is already mapped to 19293495
com.semmle.inmemory.util.DiskIdStore.append(DiskIdStore.java:63)
com.semmle.inmemory.util.NonSequentialDiskPool.insert(NonSequentialDiskPool.java:105)
com.semmle.inmemory.util.NonSequentialDiskPool.insertBucket(NonSequentialDiskPool.java:47)
com.semmle.inmemory.util.DiskPool.getIdWithFreshness(DiskPool.java:171)
com.semmle.inmemory.util.NonSequentialDiskPool.getIdWithFreshness(NonSequentialDiskPool.java:14)
com.semmle.inmemory.populate.IdPool.lookupWithFreshness(IdPool.java:66)
com.semmle.inmemory.trap.SynchronizedIdAllocator.getElementIdAndFreshness(SynchronizedIdAllocator.java:23)
com.semmle.inmemory.trap.FreshIdAllocator.getElementId(FreshIdAllocator.java:22)
com.semmle.inmemory.trap.TRAPReader$MetaStringBuilder.getElementId(TRAPReader.java:1016)
com.semmle.inmemory.trap.TRAPReader.computeID(TRAPReader.java:1090)
com.semmle.inmemory.trap.TRAPReader.computeID(TRAPReader.java:1085)
com.semmle.inmemory.trap.TRAPReader.scanLabelKey(TRAPReader.java:829)
com.semmle.inmemory.trap.TRAPReader.scanLabelValue(TRAPReader.java:801)
com.semmle.inmemory.trap.TRAPReader.scanTuplesAndLabels(TRAPReader.java:505)
com.semmle.inmemory.trap.TRAPReader.importTuples(TRAPReader.java:414)
com.semmle.inmemory.trap.ImportTasksProcessor.process(ImportTasksProcessor.java:234)
com.semmle.inmemory.trap.ImportTasksProcessor.lambda$importTrap$1(ImportTasksProcessor.java:154)
com.semmle.util.concurrent.FutureUtils.lambda$mapAsync_$8(FutureUtils.java:161)
java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.base/java.lang.Thread.run(Unknown Source)
at (failed to read line: asked for -1 bytes which seems wrong!)

Is something wrong?

jketema commented 7 months ago

Does this happen even when you delete the database directory before doing the third step?

lianxv-primer commented 7 months ago

No, I rebuilt the whole project directly. The full build trace looks like this:

... ...
[2024-04-16 11:42:00] [build-stdout] MSBuild version 17.9.8+b34f75857 for .NET Framework
[2024-04-16 11:42:00] [build-stdout] Build started 4/16/2024 11:42:00 AM.
...
[2024-04-16 21:04:28] [build-stdout] 4134 Warning(s)
[2024-04-16 21:04:28] [build-stdout] 0 Error(s)
[2024-04-16 21:04:28] [build-stdout] Time Elapsed 09:22:26.82
[2024-04-16 21:04:28] Plumbing command codeql database trace-command completed.
[2024-04-16 21:04:28] [PROGRESS] database create> Finalizing database at C:\test\devops-codeql-database.
[2024-04-16 21:04:28] Running plumbing command: codeql database finalize --threads=0 --mode=clear --min-disk-free=10000 --no-db-cluster -- C:\test\devops-codeql-database
[2024-04-16 21:04:28] Using pre-finalize script C:\codeql-home\codeql\cpp\tools\pre-finalize.cmd.
[2024-04-16 21:04:28] [PROGRESS] database finalize> Running pre-finalize script C:\codeql-home\codeql\cpp\tools\pre-finalize.cmd in C:\devops\p-6ac70a2931f74eb2a2452fb8f52372e1\src.
[2024-04-16 21:04:28] Running plumbing command: codeql database trace-command --working-dir=C:\devops\p-6ac70a2931f74eb2a2452fb8f52372e1\src --no-tracing --threads=0 -- C:\test\devops-codeql-database C:\codeql-home\codeql\cpp\tools\pre-finalize.cmd
[2024-04-16 21:04:28] [PROGRESS] database trace-command> Running command in C:\devops\p-6ac70a2931f74eb2a2452fb8f52372e1\src: [C:\codeql-home\codeql\cpp\tools\pre-finalize.cmd]
[2024-04-16 21:04:28] Plumbing command codeql database trace-command completed.
[2024-04-16 21:04:28] [PROGRESS] database finalize> Running TRAP import for CodeQL database at C:\test\devops-codeql-database...
[2024-04-16 21:04:28] Running plumbing command: codeql dataset import --dbscheme=C:\codeql-home\codeql\cpp\semmlecode.cpp.dbscheme --threads=0 -- C:\test\devops-codeql-database\db-cpp C:\test\devops-codeql-database\trap\cpp
[2024-04-16 21:04:28] Clearing disk cache since the version file C:\test\devops-codeql-database\db-cpp\default\cache\version does not exist
[2024-04-16 21:04:29] Tuple pool not found. Clearing relations with cached strings
[2024-04-16 21:04:29] Trimming disk cache at C:\test\devops-codeql-database\db-cpp\default\cache in mode clear.
[2024-04-16 21:04:29] Sequence stamp origin is -6195435911585769745
[2024-04-16 21:04:29] Pausing evaluation to hard-clear memory at sequence stamp o+0
[2024-04-16 21:04:29] Unpausing evaluation
[2024-04-16 21:04:29] Pausing evaluation to quickly trim disk at sequence stamp o+1
[2024-04-16 21:04:29] Unpausing evaluation
[2024-04-16 21:04:29] Pausing evaluation to zealously trim disk at sequence stamp o+2
[2024-04-16 21:04:29] Unpausing evaluation
[2024-04-16 21:04:29] Trimming completed (12ms): Purged everything.
[2024-04-16 21:04:29] Scanning for files in C:\test\devops-codeql-database\trap\cpp
[2024-04-16 21:05:01] Found 18594 files on disk containing 7859950 TRAP files (239.02 GiB)
[2024-04-16 21:05:01] [PROGRESS] dataset import> Grouping TRAP files by link target
[2024-04-16 21:11:37] [PROGRESS] dataset import> Grouping unlinked TRAP files together
[2024-04-16 21:12:11] [PROGRESS] dataset import> Scanning TRAP files
...
[2024-04-16 21:55:32] Scanning trace_log.cc.11991bd7.trap (trace_log.cc.11991bd7_0.trap.tar.br) (6891731 of 7859950)
...
[2024-04-17 00:06:20] Importing 96f07e7c3376ac5f473f4bab.trap (trace_log.cc.11991bd7_0.trap.tar.br) for no link target (6974499 of 7859950)
[2024-04-17 00:06:20] [ERROR] dataset import> 9fafff7e4bf067c0c66b0ca7.trap (connection.cc.a0da4be2_0.trap.tar.br) for no link target, 38: com.semmle.util.exception.CatastrophicError: ID 94380083 is already mapped to 19293495
com.semmle.inmemory.util.DiskIdStore.append(DiskIdStore.java:63)
com.semmle.inmemory.util.NonSequentialDiskPool.insert(NonSequentialDiskPool.java:105)
com.semmle.inmemory.util.NonSequentialDiskPool.insertBucket(NonSequentialDiskPool.java:47)
com.semmle.inmemory.util.DiskPool.getIdWithFreshness(DiskPool.java:171)
com.semmle.inmemory.util.NonSequentialDiskPool.getIdWithFreshness(NonSequentialDiskPool.java:14)
com.semmle.inmemory.populate.IdPool.lookupWithFreshness(IdPool.java:66)
com.semmle.inmemory.trap.SynchronizedIdAllocator.getElementIdAndFreshness(SynchronizedIdAllocator.java:23)
com.semmle.inmemory.trap.FreshIdAllocator.getElementId(FreshIdAllocator.java:22)
com.semmle.inmemory.trap.TRAPReader$MetaStringBuilder.getElementId(TRAPReader.java:1016)
com.semmle.inmemory.trap.TRAPReader.computeID(TRAPReader.java:1090)
com.semmle.inmemory.trap.TRAPReader.computeID(TRAPReader.java:1085)
com.semmle.inmemory.trap.TRAPReader.scanLabelKey(TRAPReader.java:829)
com.semmle.inmemory.trap.TRAPReader.scanLabelValue(TRAPReader.java:801)
com.semmle.inmemory.trap.TRAPReader.scanTuplesAndLabels(TRAPReader.java:505)
com.semmle.inmemory.trap.TRAPReader.importTuples(TRAPReader.java:414)
com.semmle.inmemory.trap.ImportTasksProcessor.process(ImportTasksProcessor.java:234)
com.semmle.inmemory.trap.ImportTasksProcessor.lambda$importTrap$1(ImportTasksProcessor.java:154)
com.semmle.util.concurrent.FutureUtils.lambda$mapAsync_$8(FutureUtils.java:161)
java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.base/java.lang.Thread.run(Unknown Source)
at (failed to read line: asked for -1 bytes which seems wrong!)
... ...
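
Since there are many errors of this shape, a small script can help list which TRAP files are affected. The following is only a rough sketch in Python: the regular expression is an assumption based on the single `[ERROR] dataset import>` entry shown above, it only recognises the "already mapped" message, and `summarize_import_errors` is a hypothetical helper, not part of CodeQL.

```python
import re

# Matches an import error entry followed by the "already mapped" message.
# The pattern is derived from the one log entry in this thread and may
# need adjusting for other error shapes.
ERROR_RE = re.compile(
    r"\[ERROR\] dataset import> (?P<trap>\S+) \((?P<archive>[^)]+)\)"
    r".*?CatastrophicError: (?P<message>ID \d+ is already mapped to \d+)",
    re.DOTALL,
)


def summarize_import_errors(log_text):
    """Return (trap file, archive, message) for each matching error entry."""
    return [
        (m["trap"], m["archive"], m["message"])
        for m in ERROR_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "[2024-04-17 00:06:20] [ERROR] dataset import> "
        "9fafff7e4bf067c0c66b0ca7.trap (connection.cc.a0da4be2_0.trap.tar.br) "
        "for no link target, 38: com.semmle.util.exception.CatastrophicError: "
        "ID 94380083 is already mapped to 19293495"
    )
    for trap, archive, message in summarize_import_errors(sample):
        print(trap, archive, message)
```

Grouping the results by archive name would show whether the errors cluster around a few source files or are spread across the whole import.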

jketema commented 7 months ago

I don't understand the answer. Did you delete the database directory immediately before running codeql database create? If not, could you try that? I'd like to be sure the problems are not due to stale data in the database directory.

lianxv-primer commented 7 months ago

Yes, I deleted the database directory before running codeql database create, and I also use the --overwrite option.

jketema commented 7 months ago

Thanks for confirming. This means the approach I suggested apparently doesn't work in your case. Note that, as the approach is not supported, I cannot do much more for you here.

lianxv-primer commented 7 months ago

I haven't used the suggested method yet. The log above is the result of my full compilation yesterday.

jketema commented 7 months ago

Apologies. I misunderstood in that case. Could you open a new issue for that, so we can discuss that separately?

lianxv-primer commented 7 months ago

Sure! The new issue is at https://github.com/github/codeql/issues/16239

jketema commented 7 months ago

Thanks!