aboutcode-org / scancode-toolkit

ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://aboutcode.org/scancode/

pkg:jar is returned instead of pkg:maven #3962

Open chinyeungli opened 1 month ago

chinyeungli commented 1 month ago

For the following JAR files:

converter-moshi-2.9.0.jar
retrofit-2.9.0.jar
unit-api-2.0.jar
jai_core-1.1.3.jar
jstl-1.2.jar
aspectjweaver-1.9.7.jar
postgresql-42.2.25.jar
spring-boot-2.5.14.jar
spring-boot-actuator-2.5.14.jar
spring-boot-actuator-autoconfigure-2.5.14.jar
spring-boot-autoconfigure-2.5.14.jar

The purls returned by the SCIO scans are:

pkg:jar/retrofit2.converter.moshi
pkg:jar/retrofit2
pkg:jar/javax.measure/Units%20of%20Measurement%20API@2.0
pkg:jar/javax.media.jai@1.1.3
pkg:jar/org.apache/Sun%20Java%20System%20Application%20Server@1.2
pkg:jar/org.aspectj.weaver
pkg:jar/org.postgresql/PostgreSQL%20JDBC%20Driver@42.2.25
pkg:jar/spring.boot@2.5.14
pkg:jar/spring.boot.actuator@2.5.14
pkg:jar/spring.boot.actuator.autoconfigure@2.5.14
pkg:jar/spring.boot.autoconfigure@2.5.14

However, appropriate Maven purls exist for these packages (this is what I found on the web):

pkg:maven/com.squareup.retrofit2/converter-moshi@2.9.0
pkg:maven/com.squareup.retrofit2/retrofit@2.9.0
pkg:maven/javax.measure/unit-api@2.1.2
pkg:maven/javax.media/jai-core@1.1.3
pkg:maven/javax.servlet/jstl@1.2
pkg:maven/org.aspectj/aspectjweaver@1.9.7
pkg:maven/org.postgresql/postgresql@42.2.25
pkg:maven/org.springframework.boot/spring-boot@2.5.14
pkg:maven/org.springframework.boot/spring-boot-actuator@2.5.14
pkg:maven/org.springframework.boot/spring-boot-actuator-autoconfigure@2.5.14
pkg:maven/org.springframework.boot/spring-boot-autoconfigure@2.5.14

Why aren't the maven purls returned?

tdruez commented 3 weeks ago

Moving to the scancode-toolkit repo for discussion as the purl values are generated there.

extractcode converter-moshi-2.9.0.jar
scancode --json-pp - --package converter-moshi-2.9.0.jar-extract
"packages": [
    {
      "type": "jar",
      "namespace": null,
      "name": "retrofit2.converter.moshi",
      "version": null,
      "qualifiers": {},
      "subpath": null,
      "primary_language": null,
      "description": null,
      "release_date": null,
      "parties": [],
      "keywords": [],
      "homepage_url": null,
      "download_url": null,
      "size": null,
      "sha1": null,
      "md5": null,
      "sha256": null,
      "sha512": null,
      "bug_tracking_url": null,
      "code_view_url": null,
      "vcs_url": null,
      "copyright": null,
      "holder": null,
      "declared_license_expression": null,
      "declared_license_expression_spdx": null,
      "license_detections": [],
      "other_license_expression": null,
      "other_license_expression_spdx": null,
      "other_license_detections": [],
      "extracted_license_statement": null,
      "notice_text": null,
      "source_packages": [],
      "is_private": false,
      "is_virtual": false,
      "extra_data": {},
      "repository_homepage_url": null,
      "repository_download_url": null,
      "api_data_url": null,
      "package_uid": "pkg:jar/retrofit2.converter.moshi?uuid=8b522553-e548-4552-9b36-fcd234529882",
      "datafile_paths": [
        "converter-moshi-2.9.0.jar-extract/META-INF/MANIFEST.MF"
      ],
      "datasource_ids": [
        "java_jar_manifest"
      ],
      "purl": "pkg:jar/retrofit2.converter.moshi"
    }
  ],

Note the resulting "purl": "pkg:jar/retrofit2.converter.moshi"

AyanSinhaMahapatra commented 2 weeks ago
pkg:jar/retrofit2.converter.moshi
pkg:jar/retrofit2
pkg:jar/javax.measure/Units%20of%20Measurement%20API@2.0
pkg:jar/javax.media.jai@1.1.3
pkg:jar/org.apache/Sun%20Java%20System%20Application%20Server@1.2
pkg:jar/org.aspectj.weaver
pkg:jar/org.postgresql/PostgreSQL%20JDBC%20Driver@42.2.25
pkg:jar/spring.boot@2.5.14
pkg:jar/spring.boot.actuator@2.5.14
pkg:jar/spring.boot.actuator.autoconfigure@2.5.14
pkg:jar/spring.boot.autoconfigure@2.5.14

MANIFEST.MF files are used in many different Java contexts, such as Gradle, OSGi and others. The function get_normalized_java_manifest_data at https://github.com/aboutcode-org/scancode-toolkit/blob/d23d1200c44b15e6f817a5a933c05f5238540e2a/src/packagedcode/jar_manifest.py#L198 shows how we assign a different namespace depending on the case, because all of these cases share the same MANIFEST.MF file.

For example, take pkg:jar/retrofit2.converter.moshi, scanned from https://repo1.maven.org/maven2/com/squareup/retrofit2/converter-moshi/2.9.0/converter-moshi-2.9.0.jar. The entire content of converter-moshi-2.9.0.jar-extract/META-INF/MANIFEST.MF is:

Manifest-Version: 1.0
Automatic-Module-Name: retrofit2.converter.moshi

This manifest has no reference to Maven or to its Maven namespace, so there is no way to connect it to https://mvnrepository.com/artifact/com.squareup.retrofit2/converter-moshi/2.9.0 and its com.squareup.retrofit2 namespace.
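
To make the point concrete, here is a minimal sketch that parses the two-line manifest quoted above and checks whether the Maven coordinates appear anywhere in it. This is not the scancode-toolkit implementation: parse_manifest_main_section is a hypothetical helper written only for this illustration, and it handles just the main section of a manifest.

```python
# Hypothetical helper for illustration only; not scancode-toolkit code.
def parse_manifest_main_section(text):
    """Return the headers of the main section of a MANIFEST.MF as a dict."""
    headers = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            if headers:
                break  # a blank line ends the main section
            continue
        if line.startswith(" ") and last_key is not None:
            # Continuation line: append to the previous value, minus the leading space.
            headers[last_key] += line[1:]
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line, skip it
        last_key = key.strip()
        headers[last_key] = value[1:] if value.startswith(" ") else value
    return headers


# The full manifest content quoted in this comment.
MANIFEST = (
    "Manifest-Version: 1.0\n"
    "Automatic-Module-Name: retrofit2.converter.moshi\n"
)

headers = parse_manifest_main_section(MANIFEST)

# The Maven group id and artifact id we would need for a pkg:maven purl.
wanted = ("com.squareup.retrofit2", "converter-moshi")
has_maven_coordinates = any(
    word in value for value in headers.values() for word in wanted
)

print(headers)
print("Maven coordinates present:", has_maven_coordinates)
```

The only usable name in the manifest is the Automatic-Module-Name value, which is where the pkg:jar/retrofit2.converter.moshi purl comes from.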

We could script something extra to automate looking up the package name in Maven, fetching the download URLs, and comparing them on the side in order to transform these purls into Maven purls. Otherwise, this is out of scope for the scanner: the information needed to create valid Maven purls is simply not in the manifests, so we cannot create these Maven purls from the manifests alone.
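
As a rough sketch of what such a side script could look like, the snippet below builds the Maven Central download URL for a candidate set of coordinates (following the URL layout of the repo1.maven.org link above) and emits a Maven purl only when the checksum of the downloaded JAR matches the scanned one. All function names are hypothetical, the candidate coordinates are assumed to come from some external lookup that is not shown, the download step itself is left out, and the purl is built by plain string formatting with no percent-encoding.

```python
import hashlib

# Base URL taken from the repo1.maven.org link quoted in this thread.
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def maven_download_url(group_id, artifact_id, version, base=MAVEN_CENTRAL):
    """Build the JAR download URL for the given Maven coordinates."""
    group_path = group_id.replace(".", "/")
    return f"{base}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"


def maven_purl(group_id, artifact_id, version):
    """Simplified purl builder: assumes no character needs percent-encoding."""
    return f"pkg:maven/{group_id}/{artifact_id}@{version}"


def sha1_hex(data):
    """Return the SHA-1 hex digest of a bytes object."""
    return hashlib.sha1(data).hexdigest()


def confirm_candidate(local_jar, remote_jar, group_id, artifact_id, version):
    """Return a Maven purl only if both JARs (as bytes) have the same SHA-1."""
    if sha1_hex(local_jar) == sha1_hex(remote_jar):
        return maven_purl(group_id, artifact_id, version)
    return None


# Candidate coordinates, assumed to come from a lookup outside this sketch.
candidate = ("com.squareup.retrofit2", "converter-moshi", "2.9.0")
print(maven_download_url(*candidate))
print(maven_purl(*candidate))
```

Comparing checksums rather than names keeps a name collision from producing a wrong Maven purl, at the cost of one download per candidate.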