aquasecurity / trivy-java-db

Apache License 2.0
31 stars 15 forks source link

FP in trivy because of confusing sha1 info in trivy java db #15

Open wagde-orca opened 1 year ago

wagde-orca commented 1 year ago

I see several issues in java db that all together causes a FP in trivy and need to be handled in the trivy-java-db and in trivy...

  1. in the latest fetched trivy java db I see that the latest version of kaml is missing in the db. version 0.53.0 was released on 18.3.2022. the question why is it missing... maybe because of next item
  2. as you can see https://repo1.maven.org/maven2/com/charleskorn/kaml/kaml/0.52.0/kaml-0.52.0.jar.sha1 and https://repo1.maven.org/maven2/com/charleskorn/kaml/kaml/0.53.0/kaml-0.53.0.jar.sha1. The two versions have the same sha1... I wonder how can this be.... and this is leading to next item
  3. in case we try to scan kaml-0.53.0.jar and because the only entry in the java db for the sha1 1464f167409b1df8aa89b1630f06036b71872b7a is for 0.52.0, and because the file does not have version info, we query the db for the sha1 and get version 0.52.0 and report CVE-2023-28118 for version 0.53.0 which was fixed in 0.53.0 as stated in https://nvd.nist.gov/vuln/detail/CVE-2023-28118
knqyf263 commented 1 year ago

I wonder if the digest is wrong. Weirdly, two versions have the same digest.

prabhu commented 1 year ago

@knqyf263, I told you guys on Twitter about this a few weeks ago :) I'm recreating this project to store additional metadata about the jar, such as package names, class names, etc. a better semantic comparison could be made to match packages. It's a fairly sizeable project, so collaborating would be nice.

knqyf263 commented 1 year ago

If the digests are the same, all the information such as the class names are the same, right? Storing additional metadata doesn't help, then. But it could help if digests are wrong and we cannot trust them.

prabhu commented 1 year ago

Just tested my tool and it worked correctly.

With kaml there was no difference since the jars are identical with a placeholder manifest file, so the sbom tool could rightly say kaml-0.53.0 is really kaml-0.52.0.

With kaml-jvm there is a subtle difference which got picked up.

❯ diff kaml-jvm-0.52.0.cpg.slices.json kaml-jvm-0.53.0.cpg.slices.json
155a156,171
>       "name" : "com.charleskorn.kaml.ForbiddenAnchorOrAliasException",
>       "fields" : [
>       ],
>       "procedures" : [
>         {
>           "callName" : "<init>",
>           "paramTypes" : [
>             "com.charleskorn.kaml.ForbiddenAnchorOrAliasException",
>             "java.lang.String",
>             "com.charleskorn.kaml.YamlPath"
>           ],
>           "returnType" : "void"
>         }
>       ]
>     },
>     {
1222a1239,1243
>         },
>         {
>           "name" : "allowAnchorsAndAliases",
>           "typeFullName" : "boolean",
>           "literal" : false
1324a1346,1352
>           "callName" : "getAllowAnchorsAndAliases$kaml",
>           "paramTypes" : [
>             "com.charleskorn.kaml.YamlConfiguration"
>           ],
>           "returnType" : "boolean"
>         },
>         {
1347a1376
>             "boolean",
1368c1397,1398
<             "int"
---
>             "int",
>             "boolean"
1428a1459,1465
>           "callName" : "component13$kaml",
>           "paramTypes" : [
>             "com.charleskorn.kaml.YamlConfiguration"
>           ],
>           "returnType" : "boolean"
>         },
>         {
1471c1508,1509
<             "int"
---
>             "int",
>             "boolean"
1490a1529
>             "boolean",
3397a3437,3441
>           "name" : "$path",
>           "typeFullName" : "com.charleskorn.kaml.YamlPath",
>           "literal" : false
>         },
>         {
3429a3474
>             "com.charleskorn.kaml.YamlPath",
3450a3496,3500
>           "name" : "allowAnchorsAndAliases",
>           "typeFullName" : "boolean",
>           "literal" : false
>         },
>         {
3614a3665,3671
>           "callName" : "access$getAllowAnchorsAndAliases$p",
>           "paramTypes" : [
>             "com.charleskorn.kaml.YamlNodeReader"
>           ],
>           "returnType" : "boolean"
>         },
>         {
3626c3683,3684
<             "java.lang.String"
---
>             "java.lang.String",
>             "boolean"
3635a3694
>             "boolean",

Commands and tools used to create this intermediate representation for comparison.

# Install cpggen from https://github.com/AppThreat/cpggen
cpggen -i "pkg:maven/com.charleskorn.kaml/kaml-jvm@0.52.0" --slice
cpggen -i "pkg:maven/com.charleskorn.kaml/kaml-jvm@0.53.0" --slice
prabhu commented 1 year ago

If the digests are the same, all the information such as the class names are the same, right? Storing additional metadata doesn't help, then. But it could help if digests are wrong and we cannot trust them.

The logic used would be what is the lowest version of this package that is semantically similar to the version referred and downloaded for this repo. So even if kaml-0.52.0 and kaml-0.53.0 have same hash and content, the sbom tool would consider it as kaml-0.52.0.

Further, hashes should never be trusted. Someone could copy the source code of kaml and republish it as a different jar. Or they might perform trivial transformations such as changing the package or class names alone. So if a vulnerability exists in "kaml" it would also exist in these derivations but would never get reported or identified with traditional sbom collection techniques.

knqyf263 commented 1 year ago

Further, hashes should never be trusted.

We can discuss it, but it seems to be off-topic in this issue. The original problem is kaml-0.52.0 and kaml-0.53.0 are the same, but the advisory says CVE-2023-28118 was fixed in 0.53.0. If kaml-jvm has differences, isn't GHSA wrong? It refers to com.charleskorn.kaml:kaml, but shouldn't it be com.charleskorn.kaml:kaml-jvm? https://mvnrepository.com/artifact/com.charleskorn.kaml/kaml-jvm

prabhu commented 1 year ago

It's one of those meta-packages where referring it would download the correct child package. https://github.com/charleskorn/kaml#referencing-kaml. GHSA could help by referring to kaml-jvm in addition to kaml, and there is a precedence with log4j-api already. But in this case, the issue is not false negatives but false positives due to tools relying only on hashes.

knqyf263 commented 1 year ago

But in this case, the issue is not false negatives but false positives due to tools relying only on hashes.

Why...?