Open wagde-orca opened 1 year ago
I wonder if the digest is wrong. Weirdly, two versions have the same digest.
@knqyf263, I told you guys on Twitter about this a few weeks ago :) I'm recreating this project to store additional metadata about the jar, such as package names, class names, etc. a better semantic comparison could be made to match packages. It's a fairly sizeable project, so collaborating would be nice.
If the digests are the same, all the information such as the class names are the same, right? Storing additional metadata doesn't help, then. But it could help if digests are wrong and we cannot trust them.
Just tested my tool and it worked correctly.
With kaml there was no difference since the jars are identical with a placeholder manifest file, so the sbom tool could rightly say kaml-0.53.0 is really kaml-0.52.0.
With kaml-jvm there is a subtle difference which got picked up.
❯ diff kaml-jvm-0.52.0.cpg.slices.json kaml-jvm-0.53.0.cpg.slices.json
155a156,171
> "name" : "com.charleskorn.kaml.ForbiddenAnchorOrAliasException",
> "fields" : [
> ],
> "procedures" : [
> {
> "callName" : "<init>",
> "paramTypes" : [
> "com.charleskorn.kaml.ForbiddenAnchorOrAliasException",
> "java.lang.String",
> "com.charleskorn.kaml.YamlPath"
> ],
> "returnType" : "void"
> }
> ]
> },
> {
1222a1239,1243
> },
> {
> "name" : "allowAnchorsAndAliases",
> "typeFullName" : "boolean",
> "literal" : false
1324a1346,1352
> "callName" : "getAllowAnchorsAndAliases$kaml",
> "paramTypes" : [
> "com.charleskorn.kaml.YamlConfiguration"
> ],
> "returnType" : "boolean"
> },
> {
1347a1376
> "boolean",
1368c1397,1398
< "int"
---
> "int",
> "boolean"
1428a1459,1465
> "callName" : "component13$kaml",
> "paramTypes" : [
> "com.charleskorn.kaml.YamlConfiguration"
> ],
> "returnType" : "boolean"
> },
> {
1471c1508,1509
< "int"
---
> "int",
> "boolean"
1490a1529
> "boolean",
3397a3437,3441
> "name" : "$path",
> "typeFullName" : "com.charleskorn.kaml.YamlPath",
> "literal" : false
> },
> {
3429a3474
> "com.charleskorn.kaml.YamlPath",
3450a3496,3500
> "name" : "allowAnchorsAndAliases",
> "typeFullName" : "boolean",
> "literal" : false
> },
> {
3614a3665,3671
> "callName" : "access$getAllowAnchorsAndAliases$p",
> "paramTypes" : [
> "com.charleskorn.kaml.YamlNodeReader"
> ],
> "returnType" : "boolean"
> },
> {
3626c3683,3684
< "java.lang.String"
---
> "java.lang.String",
> "boolean"
3635a3694
> "boolean",
Commands and tools used to create this intermediate representation for comparison.
# Install cpggen from https://github.com/AppThreat/cpggen
cpggen -i "pkg:maven/com.charleskorn.kaml/kaml-jvm@0.52.0" --slice
cpggen -i "pkg:maven/com.charleskorn.kaml/kaml-jvm@0.53.0" --slice
If the digests are the same, all the information such as the class names are the same, right? Storing additional metadata doesn't help, then. But it could help if digests are wrong and we cannot trust them.
The logic used would be what is the lowest version of this package that is semantically similar to the version referred and downloaded for this repo. So even if kaml-0.52.0 and kaml-0.53.0 have same hash and content, the sbom tool would consider it as kaml-0.52.0.
Further, hashes should never be trusted. Someone could copy the source code of kaml and republish it as a different jar. Or they might perform trivial transformations such as changing the package or class names alone. So if a vulnerability exists in "kaml" it would also exist in these derivations but would never get reported or identified with traditional sbom collection techniques.
Further, hashes should never be trusted.
We can discuss it, but it seems to be off-topic in this issue. The original problem is kaml-0.52.0 and kaml-0.53.0 are the same, but the advisory says CVE-2023-28118 was fixed in 0.53.0. If kaml-jvm has differences, isn't GHSA wrong? It refers to com.charleskorn.kaml:kaml, but shouldn't it be com.charleskorn.kaml:kaml-jvm? https://mvnrepository.com/artifact/com.charleskorn.kaml/kaml-jvm
It's one of those meta-packages where referring it would download the correct child package. https://github.com/charleskorn/kaml#referencing-kaml. GHSA could help by referring to kaml-jvm in addition to kaml, and there is a precedence with log4j-api
already. But in this case, the issue is not false negatives but false positives due to tools relying only on hashes.
But in this case, the issue is not false negatives but false positives due to tools relying only on hashes.
Why...?
I see several issues in java db that all together causes a FP in trivy and need to be handled in the trivy-java-db and in trivy...