aboutcode-org / dejacode

Automate open source license compliance and ensure software supply chain integrity
https://dejacode.readthedocs.io
GNU Affero General Public License v3.0

BUG: Add Package constructs incorrect download URL from a GitHub purl #141

Closed mjherzog closed 4 months ago

mjherzog commented 4 months ago

Add Package for: pkg:github/StonyShi/reactor-netty-jersey@ac525d91ff1724395640531df08e3e4eabef207d returns Error: Could not download content: https://github.com/stonyshi/reactor-netty-jersey/archive/refs/tags/ac525d91ff1724395640531df08e3e4eabef207d.tar.gz

Expected DejaCode to use: https://github.com/StonyShi/reactor-netty-jersey/archive/ac525d91ff1724395640531df08e3e4eabef207d.tar.gz which is a valid download URL.

DejaCode version: v5.1.0-8-g9ba5da0 (DjC Enterprise)

tdruez commented 4 months ago

The download URL is generated using the PackageURL library.

The issue here is that each attribute of the purl is "normalized" to a lower case value. See https://github.com/package-url/packageurl-python/blob/main/src/packageurl/__init__.py#L133

By the time the download URL is reconstructed from the purl, the original case has already been lost:

>>> from packageurl.contrib import purl2url
>>> from packageurl import PackageURL

>>> purl = "pkg:github/StonyShi/reactor-netty-jersey@ac525d91ff1724395640531df08e3e4eabef207d"

>>> PackageURL.from_string(purl)
PackageURL(type='github', namespace='stonyshi', name='reactor-netty-jersey', version='ac525d91ff1724395640531df08e3e4eabef207d', qualifiers={}, subpath=None)

>>> purl2url.get_download_url(purl)
'https://github.com/stonyshi/reactor-netty-jersey/archive/refs/tags/ac525d91ff1724395640531df08e3e4eabef207d.tar.gz'

@pombredanne Any inputs on this behavior?

stefan6419846 commented 4 months ago

The lowercase value does not seem to be the issue here. AFAIK GitHub treats usernames in URLs case-insensitively. The actual difference between the two URLs is the wrongly inserted refs/tags in the generated one, even though the version is a commit and not a tag.
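To make the comparison concrete, here is a minimal sketch using only string operations on the two URLs quoted in this issue. The helper name `differs_only_by_refs_tags` is made up for illustration and is not part of packageurl-python or DejaCode; it only checks the text of the URLs and does not verify anything against GitHub.

```python
# The URL DejaCode tried to download (taken from the error message above).
generated = (
    "https://github.com/stonyshi/reactor-netty-jersey/archive/refs/tags/"
    "ac525d91ff1724395640531df08e3e4eabef207d.tar.gz"
)

# The URL reported as valid in the issue description.
expected = (
    "https://github.com/StonyShi/reactor-netty-jersey/archive/"
    "ac525d91ff1724395640531df08e3e4eabef207d.tar.gz"
)


def differs_only_by_refs_tags(generated_url: str, expected_url: str) -> bool:
    """Hypothetical helper: True if the URLs match once case is ignored
    and the "refs/tags/" path segment is dropped from the generated URL."""
    stripped = generated_url.lower().replace("/archive/refs/tags/", "/archive/")
    return stripped == expected_url.lower()


print(differs_only_by_refs_tags(generated, expected))
```

Ignoring case, the only remaining difference is the refs/tags segment, which matches the observation above.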

pombredanne commented 4 months ago

@tdruez this is a bug in the Package URL library then. When you use the GitHub UI, the URLs generated for a tag and for a commit are different, but they do not have to be: https://github.com/nexB/scancode.io/archive/v34.7.0.tar.gz and https://github.com/nexB/scancode.io/archive/2ed734ee6339ce7dae8a00a9c9283f06987baf25.tar.gz work equally well, and this form is generally reusable for any "commitish" reference.
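As a rough illustration of that single URL form, the sketch below builds an archive URL from a namespace, a repository name, and any "commitish" value. The function `github_archive_url` is a hypothetical helper written for this thread, not the actual packageurl-python implementation, and it performs no network access or validation.

```python
def github_archive_url(namespace: str, name: str, commitish: str) -> str:
    """Hypothetical helper: build a GitHub archive URL that does not
    distinguish between tags and commits (no "refs/tags/" segment)."""
    return f"https://github.com/{namespace}/{name}/archive/{commitish}.tar.gz"


# A tag and a commit, both taken from the examples in this comment.
tag_url = github_archive_url("nexB", "scancode.io", "v34.7.0")
commit_url = github_archive_url(
    "nexB", "scancode.io", "2ed734ee6339ce7dae8a00a9c9283f06987baf25"
)

print(tag_url)
print(commit_url)
```

The same template covers both cases, so the caller does not need to know whether the purl version is a tag or a commit SHA.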

tdruez commented 4 months ago

@stefan6419846 Thanks for noticing this, the refs/tags is the culprit here.

@pombredanne Thanks for the clarification. I guess we simply have to remove the refs/tags from https://github.com/package-url/packageurl-python/commit/2b7e68a862c8fb811bf97e51e88a263cb7134473#diff-2f9af27ff6ddefbfe1bd9790a393d43c19fd1f4c5325be411898a260517ca164R43 for maximum compatibility.

tdruez commented 4 months ago

Fixed in the packageurl-python library, see https://github.com/package-url/packageurl-python/issues/157. Upgraded the library in DejaCode. Merged and deployed.