prabhu opened this issue 4 months ago
@prabhu,
Maven Central mostly relies on MD5 and SHA1 hashes. These are the only two algorithms required by OSSRH (cf. OSSRH Requirements), and in practice they are the only ones published to Maven Central. This is why the CycloneDX Maven plugin computes the hashes itself.
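As an aside, computing those two digests locally needs only the JDK's `MessageDigest`; a minimal sketch (class name is illustrative, and the well-known test vector `"abc"` stands in for a jar's bytes):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class HashSketch {
    // Hex-encode a digest of raw bytes. Hashing operates on binary
    // content only, so no character encoding is involved for file input.
    static String digestHex(String algorithm, byte[] data) throws Exception {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(data)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(digestHex("MD5", data));
        System.out.println(digestHex("SHA-1", data));
    }
}
```

The same sketch applies unchanged to jar files read with `Files.readAllBytes`, since the digest is a function of the bytes alone.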
If you observe wrong results in a multi-module Maven build, these are most probably due to #410, not concurrency problems.
A long-standing problem in multi-module Maven builds is that it is really hard to execute some task at the end of the build, so usually the aggregate SBOM is generated based on the previous snapshots.
I recommend against using the Maven Central/external hash. Several Java frameworks modify the .class files. To accurately identify these modified jars, it is advisable to use locally computed hashes instead.
@VinodAnandan, in that case the purl must be updated to reflect the fact that such jars might be different from the ones published in Maven Central.
I have tried to handle this case using evidence by comparing the generated hash with the one in the maven cache. Could you kindly test with my PR and let me know how it looks?
https://github.com/CycloneDX/cyclonedx-maven-plugin/pull/494
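To sketch the comparison idea in isolation (file and directory names here are invented for the demo): Maven stores a companion `.sha1` checksum file next to each cached artifact, so a locally computed digest can be checked against it.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

public class CacheHashCheck {
    // Hex-encode a SHA-1 digest of a file's bytes.
    static String sha1Hex(Path file) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-1").digest(Files.readAllBytes(file));
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Simulate a cached artifact and its companion .sha1 file,
        // the way Maven lays them out under ~/.m2/repository.
        Path dir = Files.createTempDirectory("m2-demo");
        Path jar = dir.resolve("lib-1.0.jar");
        Files.write(jar, "abc".getBytes());
        Path sha1File = dir.resolve("lib-1.0.jar.sha1");
        Files.writeString(sha1File, "a9993e364706816aba3e25717850c26c9cd0d89d");

        String computed = sha1Hex(jar);
        String cached = Files.readString(sha1File).trim().split("\\s+")[0];
        // A mismatch would suggest the jar was modified after download.
        System.out.println(computed.equals(cached) ? "match" : "modified");
    }
}
```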
@prabhu Do you mean to capture that information similar to repository_url=localhost/.m2/ ? I believe the default repository for Maven is expected to be https://repo.maven.apache.org/maven2 . However, in many enterprise setups the default repository will be an internal repository (Nexus or JFrog) used by the build tool. @stevespringett, @pombredanne please correct me if I misunderstood. Perhaps the tool should compare the local JAR hashes with those at the actual download location, and if they differ, update the PURL repository URL to repository_url=localhost/.m2/ ?
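For illustration only (the coordinates are hypothetical), the two cases discussed above might be written as purl qualifiers along these lines:

```
pkg:maven/com.example/demo-lib@1.0.0?repository_url=repo.maven.apache.org/maven2
pkg:maven/com.example/demo-lib@1.0.0?repository_url=localhost/.m2
```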
@prabhu @stevespringett What are your thoughts on capturing modified change information in the pedigree part of the CycloneDX (https://cyclonedx.org/use-cases/#pedigree)? In a recent discussion with the Quarkus team (@aloubyansky), we (@nscuro and I) also discussed Quarkus modifying their JARs. I'm not sure if they have found a solution to capture these changes yet.
Yes, we have all the info, of course. It's a matter of properly manifesting it. I'll share some examples soon.
> A long-standing problem in multi-module Maven builds is that it is really hard to execute some task at the end of the build, so usually the aggregate SBOM is generated based on the previous snapshots.

This can be done by implementing https://maven.apache.org/ref/3.5.0/apidocs/org/apache/maven/AbstractMavenLifecycleParticipant.html and configuring the plugin as containing extensions.
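A rough, non-runnable sketch of that idea (it needs Maven core on the classpath and the plugin declared with `<extensions>true</extensions>` in the POM; the class name is made up):

```java
import javax.inject.Named;
import javax.inject.Singleton;
import org.apache.maven.AbstractMavenLifecycleParticipant;
import org.apache.maven.MavenExecutionException;
import org.apache.maven.execution.MavenSession;

// Sketch only: a lifecycle participant whose afterSessionEnd hook runs
// once, after every module in the reactor has been built.
@Named("aggregate-at-end")
@Singleton
public class AggregateAtEndParticipant extends AbstractMavenLifecycleParticipant {
    @Override
    public void afterSessionEnd(MavenSession session) throws MavenExecutionException {
        // An aggregate SBOM could be assembled here, at the true end of the build.
    }
}
```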
There is no encoding specified, so I am not sure whether this affects the data that gets read.
A hash is always a hash of binary content: there is never any encoding involved in hashing.
I noticed that the hashes are calculated from the actual file instead of using existing information from the Maven cache.
https://github.com/CycloneDX/cyclonedx-maven-plugin/blob/51bde9e135ded7248e629ec6ebc609dc5d397a20/src/main/java/org/cyclonedx/maven/DefaultModelConverter.java#L164
Further, I believe there might be a few problems with the hashing logic used.
The parallel method is used, which is an intermediate operation as per the documentation. Is it possible that incorrect data might get read if multiple build and file copy operations also happen in parallel?
I am not very knowledgeable in Java, so please correct me if I am wrong.
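On the `parallel()` point: it is indeed an intermediate operation, meaning it only marks the stream as parallel; nothing executes until a terminal operation runs. A small self-contained illustration:

```java
import java.util.stream.IntStream;

public class ParallelDemo {
    public static void main(String[] args) {
        // parallel() is intermediate: it flags the stream as parallel but
        // runs nothing. Execution happens only at the terminal op (sum()).
        IntStream s = IntStream.rangeClosed(1, 100).parallel();
        System.out.println(s.isParallel()); // true
        System.out.println(s.sum());        // 5050
    }
}
```

Whether that parallelism can race with concurrent file writes is a separate question about the terminal operation's inputs, not about `parallel()` itself.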