aboutcode-org / purldb

Tools to create and expose a database of purls (Package URLs). This project is sponsored by the NLnet project https://nlnet.nl/project/vulnerabilitydatabase/ and by nexB for https://www.aboutcode.org/
Chat is at https://gitter.im/aboutcode-org/discuss
https://purldb.readthedocs.io/

Create PURL services CLI tool and library #247

Closed: pombredanne closed this issue 7 months ago

pombredanne commented 11 months ago

To best support using various PURL-based services, I would like to have a command-line client tool and a library (a client API) that expose these services for integration elsewhere.

johnmhoran commented 9 months ago

@pombredanne Here's a manual mockup. Is this what you have in mind for the restructured meta output? And what about the outputs for the other 3 current commands?

This needs to be a list to handle multiple input PURLs. (For now, I left the nested purl where it is in the output we get from fetchcode/package.py info() -- do you want me to try to move that field to the top from code inside the purlcli.py meta command?)

[
    {
        "headers": [
            {
                "purl": "pkg:pypi/fetchcode"
            }
        ],
        "packages": [
            {
                "metadata": [
                    {
                        "type": "pypi",
                        "namespace": null,
                        "name": "fetchcode",
                        "version": null,
                        "qualifiers": {},
                        "subpath": null,
                        "primary_language": null,
                        "description": null,
                        "release_date": null,
                        "parties": [],
                        "keywords": [],
                        "homepage_url": "https://github.com/nexB/fetchcode",
                        "download_url": null,
                        "api_url": "https://pypi.org/pypi/fetchcode/json",
                        "size": null,
                        "sha1": null,
                        "md5": null,
                        "sha256": null,
                        "sha512": null,
                        "bug_tracking_url": null,
                        "code_view_url": null,
                        "vcs_url": null,
                        "copyright": null,
                        "license_expression": null,
                        "declared_license": "Apache-2.0",
                        "notice_text": null,
                        "root_path": null,
                        "dependencies": [],
                        "contains_source_code": null,
                        "source_packages": [],
                        "purl": "pkg:pypi/fetchcode",
                        "repository_homepage_url": null,
                        "repository_download_url": null,
                        "api_data_url": null
                    }
                    // {additional dictionaries -- 1 for each version of the package}
                ]
            }
        ]
    }
]
pombredanne commented 9 months ago

So the "metadata" should be called "packages", and in general we want the same output structure as scancode-toolkit. The input purls should be in a "header". We need a bit more design on the URLs, but the URLs should be a single mapping (like a subset of the package fields), not a list of mappings.

johnmhoran commented 9 months ago

@pombredanne I am now lost re the structure and content of the meta output. This is what fetchcode info() produces. Do we want something different? If so, what?

Re: the "metadata" should be called "packages" -- my mockup is taken directly from SCTK output (or is that output dated, and has the structure changed again?). "headers" and "packages" are the two top-level fields for a scan, and so would be the same for EACH input PURL, right?

Re: in general we want the same output structure as scancode-toolkit -- that's what this mockup is. What do you mean?

johnmhoran commented 9 months ago

Re: The input purls should be in a "header". -- they already are in this mockup.

pombredanne commented 9 months ago

@johnmhoran sorry for the confusion, we crossed paths... In https://github.com/nexB/purldb/issues/247#issuecomment-1906630287 I replied to https://github.com/nexB/purldb/issues/247#issuecomment-1906429880 and to your latest mockup https://github.com/nexB/purldb/issues/247#issuecomment-1906606993

johnmhoran commented 9 months ago

Unfortunately that does not clarify or respond to my comments/questions.

johnmhoran commented 9 months ago

I am not going to attempt to change any current data structure or content until we can clarify what we want. I will try now to access the packageurl-python purl2url code, run pip install -e . there as @JonoYang explained yesterday (at least that's my understanding), make some edits there, and see if I can access that new purl2url code from my current purldb branch where my PURL CLI code lives.

pombredanne commented 9 months ago

@johnmhoran re: https://github.com/nexB/purldb/issues/247#issuecomment-1906606993

Do not use nesting in a list and do not further nest in metadata. Instead use something like this for the URLs, which is the same structure as ScanCode TK.

The same applies to the metadata (there are just more fields with metadata). For URLs, if there is an option to validate that the URLs exist, you could return nothing for URLs that do not exist and return error messages instead.

Adopting the same structure means that tools that know scancode format will support this too.

{
  "headers": [
    {
      "tool_name": "purlcli",
      "tool_version": "v32.0.8-156-g7b867f3bec",
      "options": {
        "command": "meta",
        "--purl": [
          "pkg:pypi/scancode-toolkit@2.0.0",
          "pkg:pypi/scancode-toolkit@3.0.0"
        ],
        "--output": "foo.json"
      },
      "errors": [],
      "warnings": []
    }
  ],
  "packages": [
    {
      "purl": "pkg:pypi/scancode-toolkit@32.0.8",
      "homepage_url": "https://github.com/nexB/scancode-toolkit",
      "download_url": null,
      "bug_tracking_url": null,
      "code_view_url": null,
      "vcs_url": null,
      "repository_homepage_url": null,
      "repository_download_url": null,
      "api_data_url": null
    },
    {
      "purl": "pkg:pypi/scancode-toolkit@2.0.8",
      "homepage_url": "https://github.com/nexB/scancode-toolkit",
      "download_url": null,
      "bug_tracking_url": null,
      "code_view_url": null,
      "vcs_url": null,
      "repository_homepage_url": null,
      "repository_download_url": null,
      "api_data_url": null
    }
  ]
}
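A minimal sketch of assembling output in the shape of the mockup above: one header mapping (tool name and version, options, errors, warnings) plus a flat list of package mappings, with no nested "metadata" list. The function name `build_output` and its parameters are hypothetical, not part of purlcli; the default `tool_version` is just the placeholder value seen later in this thread.

```python
import json


def build_output(command, packages, options=None, errors=None, warnings=None,
                 tool_name="purlcli", tool_version="0.0.1"):
    """Return a ScanCode-style result: a "headers" list and a flat "packages" list."""
    header = {
        "tool_name": tool_name,
        "tool_version": tool_version,
        # The command name comes first, followed by the CLI options as given.
        "options": {"command": command, **(options or {})},
        "errors": list(errors or []),
        "warnings": list(warnings or []),
    }
    # Each package is one flat mapping: no nested "metadata" list inside it.
    return {"headers": [header], "packages": list(packages)}


if __name__ == "__main__":
    result = build_output(
        "urls",
        packages=[{
            "purl": "pkg:pypi/scancode-toolkit@32.0.8",
            "homepage_url": "https://github.com/nexB/scancode-toolkit",
        }],
        options={"--purl": ["pkg:pypi/scancode-toolkit@32.0.8"], "--output": "-"},
    )
    print(json.dumps(result, indent=2))
```

Keeping the top level to exactly "headers" and "packages" is what lets tools that already read ScanCode output consume this too.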
johnmhoran commented 9 months ago

@pombredanne Are you referring now to the new urls command, or to the meta command, which is what we've been discussing most recently? And are you saying you want this structure for all 4 current command outputs, or just meta? or just urls -- the content you list is URLs but not the meta content, as you can see from my examples above of meta content.

pombredanne commented 9 months ago

Are you referring now to the new urls command, or to the meta command, which is what we've been discussing most recently? And are you saying you want this structure for all 4 current command outputs, or just meta? or just urls -- the content you list is URLs but not the meta content, as you can see from my examples above of meta content.

Yes, this structure is for meta, urls, and versions. validate is special, but it should still adopt a similar output.

johnmhoran commented 9 months ago

@JonoYang I ran pip install -e . in my new packageurl-python repo branch, added a simple print function at the bottom of the purl2url.py file, and tried to call it from my purldb purlcli.py urls command -- no dice. Not sure how I might have strayed from your guidance of yesterday.

(venv) Tue Jan 23, 2024 10:58 AM  /home/jmh/dev/nexb/purldb jmh (247-purl-cli-add-urls)
$ python -m purldb_toolkit.purlcli urls --purl pkg:pypi/fetchcode@0.1.0 --output -
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/jmh/dev/nexb/purldb/purldb-toolkit/src/purldb_toolkit/purlcli.py", line 402, in <module>
    purlcli()
  File "/home/jmh/dev/nexb/purldb/venv/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/jmh/dev/nexb/purldb/venv/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/jmh/dev/nexb/purldb/venv/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/jmh/dev/nexb/purldb/venv/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/jmh/dev/nexb/purldb/venv/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/jmh/dev/nexb/purldb/purldb-toolkit/src/purldb_toolkit/purlcli.py", line 256, in get_urls
    purl_urls = get_url_details(purls)
  File "/home/jmh/dev/nexb/purldb/purldb-toolkit/src/purldb_toolkit/purlcli.py", line 332, in get_url_details
    test_print = purl2url.print_hello(purl)
AttributeError: module 'packageurl.contrib.purl2url' has no attribute 'print_hello'

(venv) Tue Jan 23, 2024 10:58 AM  /home/jmh/dev/nexb/purldb jmh (247-purl-cli-add-urls)
$
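The AttributeError says the `packageurl.contrib.purl2url` module that purlcli actually imported has no `print_hello`. One plausible reading (an assumption, not confirmed in the thread) is that the purldb venv is still loading its own installed copy rather than the edited checkout. A stdlib-only sketch for checking which file a module would be loaded from; run it with the purldb venv's `python`. The helper name `locate_module` is hypothetical.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a module would be imported from, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # A parent package (e.g. "packageurl") is not installed in this interpreter.
        return None
    return spec.origin if spec else None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("purl2url:", locate_module("packageurl.contrib.purl2url"))
```

If the printed path points into `venv/lib/python3.8/site-packages/packageurl/...`, edits made in the separate packageurl-python checkout are not what purlcli sees.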
johnmhoran commented 9 months ago

@JonoYang In case I misinterpreted your guidance, I also tried to run pip install -e . in the purldb venv packageurl directory, but got a different error.

(venv) Tue Jan 23, 2024 11:24 AM  /home/jmh/dev/nexb/purldb/venv/lib/python3.8/site-packages jmh (247-purl-cli-add-urls)
$ cd packageurl

(venv) Tue Jan 23, 2024 11:24 AM  /home/jmh/dev/nexb/purldb/venv/lib/python3.8/site-packages/packageurl jmh (247-purl-cli-add-urls)
$ ll
total 48
drwxr-xr-x   4 jmh jmh  4096 2023-12-18 17:48:55.241376800 -0800 ./
drwxr-xr-x 255 jmh jmh 16384 2024-01-18 12:12:22.245371200 -0800 ../
-rw-r--r--   1 jmh jmh 17351 2023-12-18 17:48:55.231376800 -0800 __init__.py
drwxr-xr-x   2 jmh jmh  4096 2023-12-18 17:50:43.351376800 -0800 __pycache__/
drwxr-xr-x   5 jmh jmh  4096 2023-12-18 17:48:55.241376800 -0800 contrib/
-rw-r--r--   1 jmh jmh     0 2023-12-18 17:48:55.231376800 -0800 py.typed

(venv) Tue Jan 23, 2024 11:24 AM  /home/jmh/dev/nexb/purldb/venv/lib/python3.8/site-packages/packageurl jmh (247-purl-cli-add-urls)
$ pip install -e .
Obtaining file:///home/jmh/dev/nexb/purldb/venv/lib/python3.8/site-packages/packageurl
ERROR: file:///home/jmh/dev/nexb/purldb/venv/lib/python3.8/site-packages/packageurl does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.

[notice] A new release of pip available: 22.2.2 -> 23.3.2
[notice] To update, run: pip install --upgrade pip

(venv) Tue Jan 23, 2024 11:28 AM  /home/jmh/dev/nexb/purldb/venv/lib/python3.8/site-packages/packageurl jmh (247-purl-cli-add-urls)
$

My purldb repo venv does not contain a packageurl-python directory.


But I can access purl2url with this:

from packageurl.contrib import purl2url

Do I need to install packageurl-python in that venv, and then from that new venv directory run pip install -e .?
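The pip error above is expected: `site-packages/packageurl` is an installed copy, not a project root, so it has no `setup.py` or `pyproject.toml`. An editable install generally has to point at the packageurl-python git checkout and be run by the same interpreter as the purldb venv. A minimal sketch that builds that command; the helper name and the checkout location are hypothetical.

```python
import sys
from pathlib import Path


def editable_install_cmd(checkout_dir):
    """Build a pip command that installs a source checkout in editable mode
    into the environment of the interpreter running this code (sys.executable)."""
    checkout = Path(checkout_dir)
    # pip -e needs a project root; an installed copy under site-packages has neither file.
    if not ((checkout / "setup.py").exists() or (checkout / "pyproject.toml").exists()):
        raise ValueError(f"{checkout} is not a Python project (no setup.py or pyproject.toml)")
    return [sys.executable, "-m", "pip", "install", "-e", str(checkout)]
```

For example, with the purldb venv activated, `subprocess.run(editable_install_cmd("path/to/packageurl-python"), check=True)` would install the checkout (placeholder path) into that venv. Using `sys.executable -m pip` avoids accidentally running a `pip` that belongs to another environment.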

johnmhoran commented 9 months ago

Since I'm unable to access my test function in the local packageurl-python repo from my work in the local purldb repo, I will put the purl2url work aside; I will need assistance figuring this out. Meanwhile, I will turn to restructuring the output of the 4 existing commands per @pombredanne's significant restructuring requests of this morning.

JonoYang commented 9 months ago

@johnmhoran Let's have a session regarding the venv stuff.

johnmhoran commented 9 months ago

@pombredanne @JonoYang I'm making progress on restructuring the JSON output, starting with the meta command. When the PURLs are identified in the command itself with one or more --purl flags, the JSON now lists them. When they are submitted instead in a file, our JSON currently looks like this excerpt:

{
    "headers": [
        {
            "tool_name": "purlcli",
            "tool_version": "___",
            "options": {
                "command": "meta",
                "--purl": [],
                "--file": "/mnt/c/nexb/purldb-testing/2024-current-01-testing/txt-input/2024-01-23-purl-meta-input-01.txt",
                "--output": "/mnt/c/nexb/purldb-testing/2024-current-01-testing/json-output/2024-01-24-meta--output-01.json"
            },
            "errors": [],
            "warnings": [
                "There was an error with your 'pkg:pypi/matchcode' query.  Make sure that 'pkg:pypi/matchcode' actually exists in the relevant repository."
            ]
        }
    ],
    "packages": [
. . .

In this case, do we want the JSON to include a list of the PURLs contained in the input file? If so, this could be reported in the options section, perhaps something like this immediately below the --file key-value pair:

            "options": {
                "command": "meta",
                "--purl": [],
                "--file": "/mnt/c/nexb/purldb-testing/2024-current-01-testing/txt-input/2024-01-23-purl-meta-input-01.txt",
                "--file_purls": [
                    "pkg:pypi/fetchcode",
                    "pkg:pypi/matchcode",
                    "pkg:pypi/minecode"
                ],
                "--output": "/mnt/c/nexb/purldb-testing/2024-current-01-testing/json-output/2024-01-24-meta--output-01.json"
            },

However, that doesn't look correct since --file_purls is not actually an option. What do you think?

JonoYang commented 9 months ago

@johnmhoran Maybe something like this, where we have a list of all purls passed into the command, outside of options:

{
    "headers": [
        {
            "tool_name": "purlcli",
            "tool_version": "___",
            "options": {
                "command": "meta",
                "--purl": [],
                "--file": "/mnt/c/nexb/purldb-testing/2024-current-01-testing/txt-input/2024-01-23-purl-meta-input-01.txt",
                "--output": "/mnt/c/nexb/purldb-testing/2024-current-01-testing/json-output/2024-01-24-meta--output-01.json"
            },
            "purls": [
                "pkg:pypi/fetchcode",
                "pkg:pypi/matchcode",
                "pkg:pypi/minecode"
            ],
            "errors": [],
            "warnings": [
                "There was an error with your 'pkg:pypi/matchcode' query.  Make sure that 'pkg:pypi/matchcode' actually exists in the relevant repository."
            ]
        }
    ],
    ...
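A minimal sketch of filling that header-level "purls" list from both input routes, so it is populated whether the PURLs came from repeated --purl flags, a --file, or both. It assumes the input file holds one PURL per line (the thread shows only a .txt path); the name `collect_purls` is hypothetical.

```python
def collect_purls(purl_options, file_path=None):
    """Merge PURLs given via --purl flags and/or a --file (one PURL per line)
    into one list, preserving first-seen order."""
    purls = list(purl_options or [])
    if file_path:
        with open(file_path, encoding="utf-8") as f:
            purls.extend(line.strip() for line in f)
    # dict.fromkeys drops duplicates while keeping first-seen order;
    # the filter then drops blank lines.
    return [p for p in dict.fromkeys(purls) if p]
```

The header would then carry `"purls": collect_purls(purl_options, file_path)` next to "options", which accepts the small redundancy with --purl noted in the next comment.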
johnmhoran commented 9 months ago

Thanks @JonoYang. That would definitely work. There'd be some redundant data in that case, since I think we'd also want to populate that list when there is no file and the PURLs are identified with the --purl flag(s), and thus are also listed in the --purl value under options. In any event, thanks for a good solution.

johnmhoran commented 9 months ago

@pombredanne @JonoYang I just pushed a commit (and a second fixer-upper) and opened a new PR -- https://github.com/nexB/purldb/pull/281.

johnmhoran commented 9 months ago

@JonoYang Does this give me a list of the current PURL types supported in the API validate endpoint?

from packagedb.package_managers import VERSION_API_CLASSES_BY_PACKAGE_TYPE

. . .

    for k, v in VERSION_API_CLASSES_BY_PACKAGE_TYPE.items():
        print(f"{k} = {v}")

The output:

gem = <class 'packagedb.package_managers.RubyVersionAPI'>
hex = <class 'packagedb.package_managers.HexVersionAPI'>
cargo = <class 'packagedb.package_managers.CratesVersionAPI'>
composer = <class 'packagedb.package_managers.ComposerVersionAPI'>
pypi = <class 'packagedb.package_managers.PypiVersionAPI'>
nuget = <class 'packagedb.package_managers.NugetVersionAPI'>
deb = <class 'packagedb.package_managers.DebianVersionAPI'>
maven = <class 'packagedb.package_managers.MavenVersionAPI'>
npm = <class 'packagedb.package_managers.NpmVersionAPI'>
golang = <class 'packagedb.package_managers.GoproxyVersionAPI'>
johnmhoran commented 9 months ago

Trying to dig into the details a bit further, this command

python -m purldb_toolkit.purlcli validate --purl pkg:cargo/rand --purl pkg:composer/uuid --purl pkg:deb/2ping --purl pkg:gem/small_wonder --purl pkg:golang/github.com/golang/glog --purl pkg:hex/zzz --purl pkg:maven/com.google.appengine/appengine-tools-sdk --purl pkg:npm/abbrev --purl pkg:nuget/log4net --purl pkg:pypi/dejacode --purl pkg:rubygems/bundler-sass --purl pkg:ubuntu/zzz --purl pkg:jmh/zzz --output -

gives me the following output and states that each PURL is valid but that check_existence is not supported for pkg:rubygems/bundler-sass, pkg:ubuntu/zzz or pkg:jmh/zzz.

I expected no support for pkg:jmh but was not sure what to expect for pkg:rubygems or pkg:ubuntu (guessing the latter is handled under pkg:deb).
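The three PURLs reported as unsupported are exactly those whose type is not a key in the VERSION_API_CLASSES_BY_PACKAGE_TYPE output above. A stdlib sketch of that check; the type set is copied from that output, and the helper names are hypothetical.

```python
# Keys copied from the VERSION_API_CLASSES_BY_PACKAGE_TYPE output shown above.
SUPPORTED_TYPES = {"gem", "hex", "cargo", "composer", "pypi",
                   "nuget", "deb", "maven", "npm", "golang"}


def purl_type(purl):
    """Return the (lowercased) type component of a PURL string such as 'pkg:pypi/dejacode'."""
    scheme, sep, remainder = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError(f"not a PURL: {purl!r}")
    # Tolerate slashes after "pkg:", then take everything up to the next "/".
    return remainder.lstrip("/").split("/", 1)[0].lower()


def unsupported_purls(purls, supported=SUPPORTED_TYPES):
    """Return the PURLs whose type has no version API class."""
    return [p for p in purls if purl_type(p) not in supported]
```

Applied to the 13 PURLs in the command above, it returns `pkg:rubygems/bundler-sass`, `pkg:ubuntu/zzz` and `pkg:jmh/zzz`.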

(venv) Mon Feb 05, 2024 01:10 PM  /home/jmh/dev/nexb/purldb jmh (247-purl-cli-add-urls)
$ python -m purldb_toolkit.purlcli validate --purl pkg:cargo/rand --purl pkg:composer/uuid --purl pkg:deb/2ping --purl pkg:gem/small_wonder --purl pkg:golang/github.com/golang/glog --purl pkg:hex/zzz --purl pkg:maven/com.google.appengine/appengine-tools-sdk --purl pkg:npm/abbrev --purl pkg:nuget/log4net --purl pkg:pypi/dejacode --purl pkg:rubygems/bundler-sass --purl pkg:ubuntu/zzz --purl pkg:jmh/zzz --output -

VERSION_API_CLASSES_BY_PACKAGE_TYPE = {'composer': <class 'packagedb.package_managers.ComposerVersionAPI'>, 'pypi': <class 'packagedb.package_managers.PypiVersionAPI'>, 'nuget': <class 'packagedb.package_managers.NugetVersionAPI'>, 'deb': <class 'packagedb.package_managers.DebianVersionAPI'>, 'maven': <class 'packagedb.package_managers.MavenVersionAPI'>, 'npm': <class 'packagedb.package_managers.NpmVersionAPI'>, 'golang': <class 'packagedb.package_managers.GoproxyVersionAPI'>, 'gem': <class 'packagedb.package_managers.RubyVersionAPI'>, 'hex': <class 'packagedb.package_managers.HexVersionAPI'>, 'cargo': <class 'packagedb.package_managers.CratesVersionAPI'>}

composer = <class 'packagedb.package_managers.ComposerVersionAPI'>
pypi = <class 'packagedb.package_managers.PypiVersionAPI'>
nuget = <class 'packagedb.package_managers.NugetVersionAPI'>
deb = <class 'packagedb.package_managers.DebianVersionAPI'>
maven = <class 'packagedb.package_managers.MavenVersionAPI'>
npm = <class 'packagedb.package_managers.NpmVersionAPI'>
golang = <class 'packagedb.package_managers.GoproxyVersionAPI'>
gem = <class 'packagedb.package_managers.RubyVersionAPI'>
hex = <class 'packagedb.package_managers.HexVersionAPI'>
cargo = <class 'packagedb.package_managers.CratesVersionAPI'>

[
    {
        "valid": true,
        "exists": true,
        "message": "The provided Package URL is valid, and the package exists in the upstream repo.",
        "purl": "pkg:cargo/rand"
    },
    {
        "valid": true,
        "exists": true,
        "message": "The provided Package URL is valid, and the package exists in the upstream repo.",
        "purl": "pkg:composer/uuid"
    },
    {
        "valid": true,
        "exists": true,
        "message": "The provided Package URL is valid, and the package exists in the upstream repo.",
        "purl": "pkg:deb/2ping"
    },
    {
        "valid": true,
        "exists": true,
        "message": "The provided Package URL is valid, and the package exists in the upstream repo.",
        "purl": "pkg:gem/small_wonder"
    },
    {
        "valid": true,
        "exists": true,
        "message": "The provided Package URL is valid, and the package exists in the upstream repo.",
        "purl": "pkg:golang/github.com/golang/glog"
    },
    {
        "valid": true,
        "exists": false,
        "message": "The provided PackageURL is valid, but does not exist in the upstream repo.",
        "purl": "pkg:hex/zzz"
    },
    {
        "valid": true,
        "exists": true,
        "message": "The provided Package URL is valid, and the package exists in the upstream repo.",
        "purl": "pkg:maven/com.google.appengine/appengine-tools-sdk"
    },
    {
        "valid": true,
        "exists": true,
        "message": "The provided Package URL is valid, and the package exists in the upstream repo.",
        "purl": "pkg:npm/abbrev"
    },
    {
        "valid": true,
        "exists": true,
        "message": "The provided Package URL is valid, and the package exists in the upstream repo.",
        "purl": "pkg:nuget/log4net"
    },
    {
        "valid": true,
        "exists": true,
        "message": "The provided Package URL is valid, and the package exists in the upstream repo.",
        "purl": "pkg:pypi/dejacode"
    },
    {
        "valid": true,
        "exists": null,
        "message": "The provided PackageURL is valid, but `check_existence` is not supported for this package type.",
        "purl": "pkg:rubygems/bundler-sass"
    },
    {
        "valid": true,
        "exists": null,
        "message": "The provided PackageURL is valid, but `check_existence` is not supported for this package type.",
        "purl": "pkg:ubuntu/zzz"
    },
    {
        "valid": true,
        "exists": null,
        "message": "The provided PackageURL is valid, but `check_existence` is not supported for this package type.",
        "purl": "pkg:jmh/zzz"
    }
]
(venv) Mon Feb 05, 2024 01:19 PM  /home/jmh/dev/nexb/purldb jmh (247-purl-cli-add-urls)
$
johnmhoran commented 9 months ago

The meta command relies on fetchcode/package.py, which includes this router -- @router.route("pkg:rubygems/.*") -- suggesting that pkg:rubygems is supported.

The urls command relies on packageurl-python/src/packageurl/contrib/purl2url.py, which includes both @repo_router.route("pkg:(gem|rubygems)/.*") and @download_router.route("pkg:(gem|rubygems)/.*"), suggesting that pkg:rubygems (and pkg:gem) is supported.

I suppose the explanation is that these packages are related and perhaps use only the gems type, but if that's documented or explained somewhere, I have yet to find it.
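To illustrate why the two route patterns behave differently, here is a toy regex router. It is not the fetchcode or packageurl router implementation, only a sketch of the same decorator idea; the class and handler names are hypothetical, and whether the real routers use `re.match`, `re.fullmatch` or something else is not shown in this thread.

```python
import re


class MiniRouter:
    """Toy regex router for illustration only."""

    def __init__(self):
        self.routes = []

    def route(self, pattern):
        compiled = re.compile(pattern)

        def decorator(func):
            # Register the handler under its pattern and return it unchanged.
            self.routes.append((compiled, func))
            return func

        return decorator

    def resolve(self, purl):
        # The first pattern that matches at the start of the PURL wins.
        for compiled, func in self.routes:
            if compiled.match(purl):
                return func(purl)
        raise LookupError(f"no route for {purl!r}")


meta_router = MiniRouter()  # mimics fetchcode's rubygems-only pattern
urls_router = MiniRouter()  # mimics purl2url's gem|rubygems pattern


@meta_router.route("pkg:rubygems/.*")
def meta_rubygems(purl):
    return "rubygems handler"


@urls_router.route("pkg:(gem|rubygems)/.*")
def urls_rubygems(purl):
    return "gem/rubygems handler"
```

With these patterns, `meta_router.resolve("pkg:gem/bundler-sass")` raises LookupError while `urls_router` accepts both spellings, which matches the behavior reported for the meta and urls commands.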

JonoYang commented 9 months ago

Does this give me a list of the current PURL types supported in the API validate endpoint?

I think it should show you all the package types that it can look up, but @keshav-space can tell you more.

...check_existence is not supported for pkg:rubygems/bundler-sass, pkg:ubuntu/zzz or pkg:jmh/zzz.

My guess is that on the purldb side, we didn't associate the rubygems type with packagedb.package_managers.RubyVersionAPI in VERSION_API_CLASSES_BY_PACKAGE_TYPE, which is why it says check_existence isn't supported.

I suppose the explanation is that these packages are related and perhaps use only the gems type, but if that's documented or explained somewhere, I have yet to find it.

The package-url repo has the specs for purl and the package types: https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst

johnmhoran commented 9 months ago

Thanks @JonoYang. I've visited the spec page many times, but it does not explain the references to both pkg:gem and pkg:rubygems, since it lists only the former. A bit confusing, or incomplete.

johnmhoran commented 9 months ago

And https://github.com/package-url/packageurl-python includes a handful of references to pkg:rubygems:

>>> from packageurl.contrib import purl2url

>>> purl2url.get_repo_url("pkg:rubygems/bundler@2.3.23")
'https://rubygems.org/gems/bundler/versions/2.3.23'

>>> purl2url.get_download_url("pkg:rubygems/bundler@2.3.23")
'https://rubygems.org/downloads/bundler-2.3.23.gem'

>>> purl2url.get_inferred_urls("pkg:rubygems/bundler@2.3.23")
['https://rubygems.org/gems/bundler/versions/2.3.23', 'https://rubygems.org/downloads/bundler-2.3.23.gem']
keshav-space commented 9 months ago

Does this give me a list of the current PURL types supported in the API validate endpoint?

I think it should show you all the package types that it can look up, but @keshav-space can tell you more.

@johnmhoran for RubyGems we use gem as the type; see https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#gem

GET /api/validate/?purl=pkg:gem/bundler-sass&check_existence=true

HTTP 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept

{
    "valid": true,
    "exists": true,
    "message": "The provided Package URL is valid, and the package exists in the upstream repo.",
    "purl": "pkg:gem/bundler-sass"
}
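A minimal stdlib client sketch for the validate endpoint shown above. Only the `/api/validate/` path and the `purl` and `check_existence` query parameters come from the example; the base URL is a placeholder for whichever PurlDB instance you run, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_validate_url(base_url, purl, check_existence=True):
    """Build a GET URL for /api/validate/; urlencode percent-encodes ':' and '/' in the PURL."""
    params = {"purl": purl}
    if check_existence:
        params["check_existence"] = "true"
    return f"{base_url.rstrip('/')}/api/validate/?{urlencode(params)}"


def validate(base_url, purl, check_existence=True, timeout=30):
    """Call the endpoint and return the decoded JSON body (valid, exists, message, purl)."""
    with urlopen(build_validate_url(base_url, purl, check_existence), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `validate("http://localhost:8001", "pkg:gem/bundler-sass")` would return the mapping shown above if a PurlDB instance is listening at that (placeholder) address.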

For Ubuntu packages we use deb as the type and ubuntu as the namespace; more here: https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#deb

GET /api/validate/?purl=pkg:deb/ubuntu/curl&check_existence=true

HTTP 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept

{
    "valid": true,
    "exists": true,
    "message": "The provided Package URL is valid, and the package exists in the upstream repo.",
    "purl": "pkg:deb/ubuntu/curl"
}
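If purlcli wanted to accept the non-spec spellings used in these examples, one option (not something decided in this thread) is to rewrite them to the spec form before calling the endpoint. A sketch with a hypothetical alias table built only from the two cases above: rubygems becomes gem, and ubuntu becomes type deb with namespace ubuntu.

```python
# Hypothetical alias table based on the two examples above; not an official mapping.
TYPE_ALIASES = {
    "rubygems": ("gem", None),
    "ubuntu": ("deb", "ubuntu"),
}


def normalize_purl_type(purl):
    """Rewrite an aliased PURL type to its spec form; leave other PURLs unchanged."""
    scheme, sep, rest = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError(f"not a PURL: {purl!r}")
    ptype, _, remainder = rest.lstrip("/").partition("/")
    alias = TYPE_ALIASES.get(ptype.lower())
    if alias is None:
        return purl
    new_type, namespace = alias
    if namespace:
        # e.g. pkg:ubuntu/curl -> pkg:deb/ubuntu/curl
        remainder = f"{namespace}/{remainder}"
    return f"pkg:{new_type}/{remainder}"
```

A rewritten PURL would probably deserve a warning in the output headers, so users can see that their input was changed.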
johnmhoran commented 9 months ago

Thank you @keshav-space. In my validate command (which accesses the validate endpoint), I noticed that both pkg:deb/2ping and pkg:deb/debian/2ping succeed, whereas in the versions command (which calls versions() in fetchcode/package_versions.py), pkg:deb/2ping returns None (I catch that with an if and print a warning) while pkg:deb/debian/2ping returns a list of versions.

(venv) Tue Feb 06, 2024 07:49 AM  /home/jmh/dev/nexb/purldb jmh (247-purl-cli-add-urls)
$ python -m purldb_toolkit.purlcli validate --purl pkg:deb/2ping --purl pkg:deb/debian/2ping --output -

[
    {
        "valid": true,
        "exists": true,
        "message": "The provided Package URL is valid, and the package exists in the upstream repo.",
        "purl": "pkg:deb/2ping"
    },
    {
        "valid": true,
        "exists": true,
        "message": "The provided Package URL is valid, and the package exists in the upstream repo.",
        "purl": "pkg:deb/debian/2ping"
    }
]
(venv) Tue Feb 06, 2024 07:50 AM  /home/jmh/dev/nexb/purldb jmh (247-purl-cli-add-urls)
$ python -m purldb_toolkit.purlcli versions --purl pkg:deb/2ping --purl pkg:deb/debian/2ping --output -
There was an error with your 'pkg:deb/2ping' query.  Make sure that 'pkg:deb/2ping' actually exists in the relevant repository.
[
    {
        "purl": "pkg:deb/debian/2ping",
        "versions": [
            {
                "purl": "pkg:deb/debian/2ping@4.5-1.2",
                "version": "4.5-1.2",
                "release_date": "None"
            },
            {
                "purl": "pkg:deb/debian/2ping@4.5-1.1",
                "version": "4.5-1.1",
                "release_date": "None"
            },
            {
                "purl": "pkg:deb/debian/2ping@4.5-1",
                "version": "4.5-1",
                "release_date": "None"
            },
            {
                "purl": "pkg:deb/debian/2ping@4.3-1",
                "version": "4.3-1",
                "release_date": "None"
            },
            {
                "purl": "pkg:deb/debian/2ping@3.2.1-1+deb9u1",
                "version": "3.2.1-1+deb9u1",
                "release_date": "None"
            },
            {
                "purl": "pkg:deb/debian/2ping@2.1.1-1",
                "version": "2.1.1-1",
                "release_date": "None"
            },
            {
                "purl": "pkg:deb/debian/2ping@2.0-1",
                "version": "2.0-1",
                "release_date": "None"
            }
        ]
    }
]
(venv) Tue Feb 06, 2024 07:50 AM  /home/jmh/dev/nexb/purldb jmh (247-purl-cli-add-urls)
$
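In the versions output above, "release_date" is the string "None" rather than JSON null, which presumably comes from passing the value through `str()`. A sketch of a serializer that keeps missing dates as null, assuming the release date arrives as either a `datetime` or `None` (an assumption about fetchcode's objects, not verified here); `version_entry` is a hypothetical name.

```python
import json
from datetime import datetime


def version_entry(purl, version, release_date):
    """Build one entry of the versions output."""
    return {
        "purl": purl,
        "version": version,
        # Emit JSON null when no date is known, instead of the string "None" that str(None) gives.
        "release_date": release_date.isoformat() if release_date is not None else None,
    }


if __name__ == "__main__":
    print(json.dumps(version_entry("pkg:deb/debian/2ping@4.5-1", "4.5-1", None)))
    print(json.dumps(version_entry("pkg:deb/debian/2ping@4.5-1", "4.5-1", datetime(2024, 1, 1))))
```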
johnmhoran commented 9 months ago

@keshav-space Re pkg:gem/ vs. pkg:rubygems/, I've noticed that my meta command (which calls info() in fetchcode/package.py) supports rubygems but not gem.

(venv) Tue Feb 06, 2024 08:11 AM  /home/jmh/dev/nexb/purldb jmh (247-purl-cli-add-urls)
$ python -m purldb_toolkit.purlcli meta --purl pkg:gem/bundler-sass --purl pkg:rubygems/bundler-sass --output -

The provided PackageURL 'pkg:gem/bundler-sass' is valid, but `meta` is not supported for this package type.
{
    "headers": [
        {
            "tool_name": "purlcli",
            "tool_version": "0.0.1",
            "options": {
                "command": "meta",
                "--purl": [
                    "pkg:gem/bundler-sass",
                    "pkg:rubygems/bundler-sass"
                ],
                "--file": null,
                "--output": "<stdout>"
            },
            "purls": [
                "pkg:gem/bundler-sass",
                "pkg:rubygems/bundler-sass"
            ],
            "errors": [],
            "warnings": [
                "The provided PackageURL 'pkg:gem/bundler-sass' is valid, but `meta` is not supported for this package type."
            ]
        }
    ],
    "packages": [
        {
            "purl": "pkg:rubygems/bundler-sass",
            "type": "rubygems",
            "namespace": null,
            "name": "bundler-sass",
            "version": null,
            "qualifiers": {},
            "subpath": null,
            "primary_language": null,
            "description": null,
            "release_date": null,
            "parties": [],
            "keywords": [],
            "homepage_url": "http://github.com/vogelbek/bundler-sass",
            "download_url": "https://rubygems.org/gems/bundler-sass-0.1.2.gem",
            "api_url": "https://rubygems.org/api/v1/gems/bundler-sass.json",
            "size": null,
            "sha1": null,
            "md5": null,
            "sha256": null,
            "sha512": null,
            "bug_tracking_url": null,
            "code_view_url": null,
            "vcs_url": null,
            "copyright": null,
            "license_expression": null,
            "declared_license": [
                "MIT"
            ],
            "notice_text": null,
            "root_path": null,
            "dependencies": [],
            "contains_source_code": null,
            "source_packages": [],
            "repository_homepage_url": null,
            "repository_download_url": null,
            "api_data_url": null
        }
    ]
}
(venv) Tue Feb 06, 2024 08:13 AM  /home/jmh/dev/nexb/purldb jmh (247-purl-cli-add-urls)
$
johnmhoran commented 9 months ago

@keshav-space Similarly, my nascent urls command (which calls packageurl/contrib/purl2url.py) also supports pkg:rubygems/ but not pkg:gem/. (The error message is still under development, as you can see from the output.)

(venv) Tue Feb 06, 2024 08:18 AM  /home/jmh/dev/nexb/purldb jmh (247-purl-cli-add-urls)
$ python -m purldb_toolkit.purlcli urls --purl pkg:gem/bundler-sass --purl pkg:rubygems/bundler-sass --output -

From construct_headers():
valid_but_not_supported
From get_urls_details():
valid_but_not_supported
{
    "headers": [
        {
            "tool_name": "purlcli",
            "tool_version": "0.0.1",
            "options": {
                "command": "urls",
                "--purl": [
                    "pkg:gem/bundler-sass",
                    "pkg:rubygems/bundler-sass"
                ],
                "--file": null,
                "--output": "<stdout>"
            },
            "purls": [
                "pkg:gem/bundler-sass",
                "pkg:rubygems/bundler-sass"
            ],
            "errors": [],
            "warnings": [
                "valid_but_not_supported"
            ]
        }
    ],
    "packages": [
        {
            "purl": "pkg:rubygems/bundler-sass",
            "download_url": {
                "url": null
            },
            "inferred_urls": [
                {
                    "url": "https://rubygems.org/gems/bundler-sass"
                }
            ],
            "repo_download_url": {
                "url": null
            },
            "repo_download_url_by_package_type": {
                "url": null
            },
            "repo_url": {
                "url": "https://rubygems.org/gems/bundler-sass"
            },
            "url": {
                "url": "https://rubygems.org/gems/bundler-sass"
            }
        }
    ]
}
(venv) Tue Feb 06, 2024 08:18 AM  /home/jmh/dev/nexb/purldb jmh (247-purl-cli-add-urls)
$
keshav-space commented 9 months ago

This is weird. Shouldn't we go by the spec? https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#gem

johnmhoran commented 9 months ago

Yes, exactly my point (and the source of my confusion re gem vs rubygems as types ;-).

johnmhoran commented 9 months ago

I have a correction re the urls command, which calls packageurl-python/src/packageurl/contrib/purl2url.py -- this appears to support both pkg:gem/ and pkg:rubygems/, using rubygems.org URLs in its output.

If I understand the code correctly, purl2url.py has two primary sections that govern the generation of two categories of URLs: repository URLs (repo_router) and download URLs (download_router).

Each section registers its URL builders with a decorator whose argument is a regex naming the package type(s) it covers, e.g., @repo_router.route("pkg:cargo/.*")

And each section includes both gem and rubygems in one of its decorators: @repo_router.route("pkg:(gem|rubygems)/.*") and @download_router.route("pkg:(gem|rubygems)/.*").

See https://github.com/package-url/packageurl-python/blob/main/src/packageurl/contrib/purl2url.py#L173-L186 and https://github.com/package-url/packageurl-python/blob/main/src/packageurl/contrib/purl2url.py#L294-L305
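A minimal, self-contained sketch of the decorator-based routing described above. MiniRouter and build_rubygems_repo_url are hypothetical stand-ins written for illustration, not the packageurl-python classes; the sketch only shows how one type regex such as pkg:(gem|rubygems)/.* lets both type spellings reach the same URL builder.

```python
import re
from typing import Callable, Optional

Handler = Callable[[str], Optional[str]]


class MiniRouter:
    """Hypothetical regex router (not packageurl-python's own router)."""

    def __init__(self) -> None:
        self.routes = []  # list of (compiled regex, handler) pairs

    def route(self, pattern: str) -> Callable[[Handler], Handler]:
        compiled = re.compile(pattern)

        def decorator(func: Handler) -> Handler:
            # Register the handler under its PURL-type regex.
            self.routes.append((compiled, func))
            return func

        return decorator

    def process(self, purl: str) -> Optional[str]:
        # The first registered regex that matches the PURL picks the handler.
        for pattern, func in self.routes:
            if pattern.match(purl):
                return func(purl)
        return None  # no handler registered for this package type


repo_router = MiniRouter()


@repo_router.route("pkg:(gem|rubygems)/.*")
def build_rubygems_repo_url(purl: str) -> Optional[str]:
    # Keep the text after "pkg:<type>/" up to any '@', '?' or '#'.
    # Simplified: version, qualifiers and subpath are ignored here.
    name = re.split(r"[@?#]", purl.split("/", 1)[1])[0]
    return f"https://rubygems.org/gems/{name}"


print(repo_router.process("pkg:gem/bundler-sass"))       # https://rubygems.org/gems/bundler-sass
print(repo_router.process("pkg:rubygems/bundler-sass"))  # https://rubygems.org/gems/bundler-sass
print(repo_router.process("pkg:cargo/rand@0.7.2"))       # None
```

The alternation in the regex is what makes the two type names equivalent here; a type with no matching route simply falls through to None.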

johnmhoran commented 9 months ago

@JonoYang It looks like the validate endpoint check_existence step includes a version if included (@) in the input PURL as part of its validation process, but that the check_existence step does not handle PURL qualifiers (?) or subpaths (#) and instead just ignores them. Is that an accurate statement?

I think it's accurate, and on that basis I plan to strip identifiable qualifiers and subpaths from incoming PURLs before processing, and to note this both in the warnings and in the help section. (I'm including the check_existence step as a default in our validate CLI command, with no additional option/flag needed to invoke it.)

For illustration, here is what the validate endpoint returned in a recent CLI test:

(venv) Wed Feb 07, 2024 05:36 PM  /home/jmh/dev/nexb/purldb jmh (247-purl-cli-add-urls)
$ python -m purldb_toolkit.purlcli validate --purl pkg:pypi/dejacode --purl pkg:pypi/dejacode@5.0.0 --purl pkg:pypi/dejacode@5.0.0?os=windows --purl pkg:pypi/dejacode@5.0.0os=windows --purl pkg:pypi/dejacode@5.0.0?how_is_the_weather=rainy --purl pkg:pypi/dejacode@5.0.0#how/are/you --purl pkg:pypi/dejacode@10.0.0  --purl pkg:cargo/rand@0.7.2 --purl pkg:nginx/nginx@0.8.9?os=windows --output -
[
    {
        "valid": true,
        "exists": true,
        "message": "The provided Package URL is valid, and the package exists in the upstream repo.",
        "purl": "pkg:pypi/dejacode"
    },
    {
        "valid": true,
        "exists": true,
        "message": "The provided Package URL is valid, and the package exists in the upstream repo.",
        "purl": "pkg:pypi/dejacode@5.0.0"
    },
    {
        "valid": true,
        "exists": true,
        "message": "The provided Package URL is valid, and the package exists in the upstream repo.",
        "purl": "pkg:pypi/dejacode@5.0.0?os=windows"
    },
    {
        "valid": true,
        "exists": false,
        "message": "The provided PackageURL is valid, but does not exist in the upstream repo.",
        "purl": "pkg:pypi/dejacode@5.0.0os=windows"
    },
    {
        "valid": true,
        "exists": true,
        "message": "The provided Package URL is valid, and the package exists in the upstream repo.",
        "purl": "pkg:pypi/dejacode@5.0.0?how_is_the_weather=rainy"
    },
    {
        "valid": true,
        "exists": true,
        "message": "The provided Package URL is valid, and the package exists in the upstream repo.",
        "purl": "pkg:pypi/dejacode@5.0.0#how/are/you"
    },
    {
        "valid": true,
        "exists": false,
        "message": "The provided PackageURL is valid, but does not exist in the upstream repo.",
        "purl": "pkg:pypi/dejacode@10.0.0"
    },
    {
        "valid": true,
        "exists": true,
        "message": "The provided Package URL is valid, and the package exists in the upstream repo.",
        "purl": "pkg:cargo/rand@0.7.2"
    },
    {
        "valid": true,
        "exists": null,
        "message": "The provided PackageURL is valid, but `check_existence` is not supported for this package type.",
        "purl": "pkg:nginx/nginx@0.8.9?os=windows"
    }
]
(venv) Wed Feb 07, 2024 05:48 PM  /home/jmh/dev/nexb/purldb jmh (247-purl-cli-add-urls)
$
JonoYang commented 9 months ago

@johnmhoran

It looks like the validate endpoint check_existence step includes a version if included (@) in the input PURL as part of its validation process, but that the check_existence step does not handle PURL qualifiers (?) or subpaths (#) and instead just ignores them. Is that an accurate statement?

Yes. It looks like we use the type, namespace, and name of the package, then collect all the available versions and then check to see if the version specified in the purl is in the list of available versions. (https://github.com/nexB/purldb/blob/main/packagedb/api.py#L810)

I think it's accurate and on that basis plan to strip identifiable qualifiers and subpaths from incoming PURLs before processing,

I think that's fair for now. Thinking a bit more, it would be good for us to eventually handle qualifiers in the validate endpoint, especially for maven purls, where it would be useful to see which specific artifacts are available on maven.
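The existence check described above (match on type, namespace and name, gather the available versions, then look for the requested version) can be sketched as follows. check_existence, VersionFetcher and fake_fetch_versions are hypothetical illustrations, not the packagedb/api.py code, and the version store is illustrative data so the sketch runs offline. Qualifiers and subpath never reach the function, which is why they are effectively ignored.

```python
from typing import Callable, List, Optional

# Hypothetical lookup: returns the upstream versions of a package,
# or None when the package type is not supported.
VersionFetcher = Callable[[str, Optional[str], str], Optional[List[str]]]


def check_existence(
    purl_type: str,
    namespace: Optional[str],
    name: str,
    version: Optional[str],
    fetch_versions: VersionFetcher,
) -> Optional[bool]:
    versions = fetch_versions(purl_type, namespace, name)
    if versions is None:
        return None  # type not supported, like "exists": null
    if version is None:
        # Assumption: an unversioned PURL exists if any version is found.
        return bool(versions)
    return version in versions


# Offline stand-in for the upstream repositories (illustrative data only).
UPSTREAM = {
    ("pypi", None, "dejacode"): ["5.0.0"],
    ("cargo", None, "rand"): ["0.7.2"],
}


def fake_fetch_versions(purl_type, namespace, name):
    if purl_type not in {"pypi", "cargo"}:
        return None
    return UPSTREAM.get((purl_type, namespace, name), [])


print(check_existence("pypi", None, "dejacode", "5.0.0", fake_fetch_versions))   # True
print(check_existence("pypi", None, "dejacode", "10.0.0", fake_fetch_versions))  # False
print(check_existence("nginx", None, "nginx", "0.8.9", fake_fetch_versions))     # None
```

Handling maven qualifiers later would mean passing them into the lookup and matching on artifacts rather than versions alone.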

johnmhoran commented 9 months ago

Thanks @JonoYang. And I agree re eventually handling qualifiers.

johnmhoran commented 9 months ago

@JonoYang This can wait till next week but in case you have an immediate thought: I'm working on the meta command and want to remove all version, qualifier and subpath data because it's not needed (the underlying function ignores it). In the headers section, I currently keep all such original input PURLs for the record, but in the packages section, the code processes only the "stripped" PURLs.

Currently, for each input PURL I handle it on its own, so a command like this

python -m purldb_toolkit.purlcli meta --purl pkg:pypi/fetchcode@5.0.0 --purl pkg:pypi/dejacode@5.0.0 --purl pkg:pypi/dejacode@5.0.0?os=windows --purl pkg:pypi/dejacode@5.0.0os=windows --purl pkg:pypi/dejacode@5.0.0?how_is_the_weather=rainy --purl pkg:pypi/dejacode@5.0.0#how/are/you --purl pkg:pypi/dejacode?os=windows --purl pkg:cargo/banquo#some/short/path --purl pkg:nginx/nginx@0.8.9?os=windows --output -

will produce 6 sets of dejacode metadata output, because there are 6 dejacode inputs whose trailing substrings get removed. An alternative is to keep track of the "stripped" PURLs and skip any repeat, so that only the first occurrence is processed and these 6 dejacode inputs produce output as if there were just 1 dejacode input.

The latter approach seems cleaner, though it uses some memory. With that approach, I would summarize this process in the --help and add a single warning in headers (rather than 1 for each "stripped" PURL) noting that stripping has taken place, without listing which input PURLs were affected.
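The deduplicating alternative could look roughly like this: reduce each input to type/namespace/name by cutting at the first '@', '?' or '#', keep the first occurrence of each result, and emit one summary warning. strip_to_name and process_unique are hypothetical names, and plain string splitting is a simplification; a real implementation would parse each PURL with packageurl-python. The extra memory is one dict entry per distinct stripped PURL.

```python
import re
from typing import List, Tuple


def strip_to_name(purl: str) -> str:
    # Cut at the first version ('@'), qualifier ('?') or subpath ('#') separator.
    return re.split(r"[@?#]", purl, maxsplit=1)[0]


def process_unique(input_purls: List[str]) -> Tuple[List[str], List[str]]:
    stripped = [strip_to_name(purl) for purl in input_purls]
    # dict keys keep insertion order, so the first occurrence of each PURL wins.
    processed = list(dict.fromkeys(stripped))
    warnings = []
    if processed != input_purls:
        # A single summary warning, not one per affected input.
        warnings.append("input PURLs were stripped and deduplicated before processing")
    return processed, warnings


inputs = [
    "pkg:pypi/fetchcode@5.0.0",
    "pkg:pypi/dejacode@5.0.0",
    "pkg:pypi/dejacode@5.0.0?os=windows",
    "pkg:pypi/dejacode@5.0.0os=windows",
    "pkg:pypi/dejacode@5.0.0?how_is_the_weather=rainy",
    "pkg:pypi/dejacode@5.0.0#how/are/you",
    "pkg:pypi/dejacode?os=windows",
    "pkg:cargo/banquo#some/short/path",
    "pkg:nginx/nginx@0.8.9?os=windows",
]
processed, warnings = process_unique(inputs)
print(processed)
# ['pkg:pypi/fetchcode', 'pkg:pypi/dejacode', 'pkg:cargo/banquo', 'pkg:nginx/nginx']
```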

When you have the chance, please let me know what you think.

johnmhoran commented 9 months ago

@JonoYang @pombredanne Here's an example of the headers section of the meta output when we strip the @, ? and # separators (and the characters that follow) from the input PURLs.

{
    "headers": [
        [
            {
                "tool_name": "purlcli",
                "tool_version": "0.0.1",
                "options": {
                    "command": "meta",
                    "--purl": [
                        "pkg:pypi/fetchcode@5.0.0",
                        "pkg:pypi/dejacode@5.0.0",
                        "pkg:pypi/dejacode@5.0.0?os=windows",
                        "pkg:pypi/dejacode@5.0.0os=windows",
                        "pkg:pypi/dejacode@5.0.0?how_is_the_weather=rainy",
                        "pkg:pypi/dejacode@5.0.0#how/are/you",
                        "pkg:pypi/dejacode?os=windows",
                        "pkg:cargo/banquo#some/short/path",
                        "pkg:nginx/nginx@0.8.9?os=windows"
                    ],
                    "--file": null,
                    "--output": "<stdout>"
                },
                "purls": [
                    "pkg:pypi/fetchcode@5.0.0",
                    "pkg:pypi/dejacode@5.0.0",
                    "pkg:pypi/dejacode@5.0.0?os=windows",
                    "pkg:pypi/dejacode@5.0.0os=windows",
                    "pkg:pypi/dejacode@5.0.0?how_is_the_weather=rainy",
                    "pkg:pypi/dejacode@5.0.0#how/are/you",
                    "pkg:pypi/dejacode?os=windows",
                    "pkg:cargo/banquo#some/short/path",
                    "pkg:nginx/nginx@0.8.9?os=windows"
                ],
                "processed_purls": [
                    "pkg:pypi/fetchcode",
                    "pkg:pypi/dejacode",
                    "pkg:cargo/banquo",
                    "pkg:nginx/nginx"
                ],
                "errors": [],
                "warnings": [
                    "One or more input PURLs have been stripped to enable proper processing.  The final set of processed PURLs is listed in the 'processed_purls' field above.",
                    "The provided PackageURL 'pkg:nginx/nginx' is valid, but `meta` does not support this package type."
                ]
            }
        ]
    ],
    "packages": [
        {
            "purl": "pkg:pypi/fetchcode",
            "type": "pypi",
            "namespace": null,
            "name": "fetchcode",

. . .
pombredanne commented 9 months ago

@johnmhoran re:

Here's an example of the headers section of the meta output when we strip the @, ? and # separators (and the characters that follow) from the input PURLs.

Some questions:

  1. I assume you do not mean stripping in the literal string sense, but rather parsing the PURL with the library code. If not, you should use the library.

  2. This is not stripping in all cases, but normalizing.

  3. Why would you ever remove the version? If I asked for one, do not remove it.

  4. I should have the option to skip any such PURL normalization. I am wondering whether this normalization SHOULD NOT be done by default, as it is surprising.

  5. If you issue warnings, it would be best to return short, concise, self-standing messages.

    • In "The final set of processed PURLs is listed in the 'processed_purls' field above." the "field above" has no practical meaning as there is no such concept of fields and above in a JSON document. Instead just return one warning for each normalized PURL. For instance:

    • input PURL: "pkg:pypi/dejacode@5.0.0?os=windows" normalized to "pkg:pypi/dejacode@5.0.0"

    • input PURL: "pkg:cargo/banquo#some/short/path" normalized to "pkg:cargo/banquo"

    • Instead of "The provided PackageURL 'pkg:nginx/nginx' is valid, but `meta` does not support this package type.", maybe use a shorter, telegraphic style: 'pkg:nginx/nginx' not supported with "meta" command

  6. Do not deduplicate by default: the input may be weird, but that's not for this tool to fix. Instead, add a --unique option to only return unique PURLs. Especially the default use of normalization and deduping feels unnatural and surprising. We want no surprises :)

  7. meta should be renamed metadata IMHO, as this may be more obvious. Or maybe details?
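A sketch of the behavior suggested in points 3, 5 and 6: keep the version, drop only qualifiers and subpath, emit one short, self-standing warning per normalized PURL, and deduplicate only when asked. normalize, purl_type and normalize_purls are hypothetical names, and the string splitting is a simplification of parsing with the PURL library (point 1).

```python
from typing import Iterable, List, Set, Tuple


def normalize(purl: str) -> str:
    # Keep type, namespace, name and version; drop '#subpath' then '?qualifiers'.
    return purl.split("#", 1)[0].split("?", 1)[0]


def purl_type(purl: str) -> str:
    # "pkg:cargo/banquo" -> "cargo" (simplified; assumes a "pkg:" prefix).
    return purl[len("pkg:"):].split("/", 1)[0]


def normalize_purls(
    purls: Iterable[str],
    supported_types: Set[str],
    command: str,
    unique: bool = False,
) -> Tuple[List[str], List[str]]:
    processed: List[str] = []
    warnings: List[str] = []
    for purl in purls:
        if purl_type(purl) not in supported_types:
            warnings.append(f"'{purl}' not supported with \"{command}\" command")
            continue
        normalized = normalize(purl)
        if normalized != purl:
            warnings.append(f'input PURL: "{purl}" normalized to "{normalized}"')
        if unique and normalized in processed:
            continue  # --unique: skip repeats (off by default)
        processed.append(normalized)
    return processed, warnings


processed, warnings = normalize_purls(
    [
        "pkg:pypi/dejacode@5.0.0?os=windows",
        "pkg:cargo/banquo#some/short/path",
        "pkg:nginx/nginx",
    ],
    supported_types={"pypi", "cargo"},
    command="meta",
)
for warning in warnings:
    print(warning)
# input PURL: "pkg:pypi/dejacode@5.0.0?os=windows" normalized to "pkg:pypi/dejacode@5.0.0"
# input PURL: "pkg:cargo/banquo#some/short/path" normalized to "pkg:cargo/banquo"
# 'pkg:nginx/nginx' not supported with "meta" command
```

Each warning names the PURL it refers to, so it stays meaningful without any reference to another part of the output.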

johnmhoran commented 9 months ago

Thanks @pombredanne. Rather than replying to your replies, it's best for us to find a time when I can demonstrate what the output looks like without trying to strip or normalize (or whatever term you prefer). We have not yet done that, and that's not ideal. Meanwhile, on Monday I'll back out the changes I made trying to bring some sanity to the meta output. (The other 3 commands have similar issues.)

We're relying on the operation of the underlying tools/functions: meta, for example, uses the info() function in fetchcode/package.py. If a user submits the various dejacode-based examples in my sample command above, the output is the same for all of them, except that each output entry is labeled with the full input PURL string, including whatever follows the name. info() does not pay any attention to the version, qualifiers or subpath in the incoming PURL. If that's what you want, we'll do that, but I doubt you'll be happy with it when you see it: without my changes, the command above outputs 6 copies of identical info() data for the dejacode PURL, each labeled with its own input PURL and trailing string.

BTW, I use meta because that is the term you suggested in your earlier comment. Lots of moving targets in what passes for our "design" documentation.

johnmhoran commented 9 months ago

@pombredanne Just reread your comments -- when I resume tomorrow, I'll restructure metadata (fka meta) to provide the output as it existed pre-normalization and convert the normalization code to a --unique option as you suggested.

Re fields, IBM and Oracle don't share your view that there is no such concept in a JSON document. ;-)

pombredanne commented 9 months ago

re:

Re fields, IBM and Oracle don't share your view that there is no such concept in a JSON document. ;-)

My point was mostly about using "above" in a warning or message to reference something located elsewhere. These messages should be self-contained/self-standing, and any reference to something else should use its identifier (like an actual PURL).

Using "field" as a name is perfectly fine!

As for the version, it should be honored. If the version is ignored in fetchcode, this is a missing feature or a bug.

Looking at https://github.com/nexB/fetchcode/blob/master/src/fetchcode/package.py I see some things that raise some questions....

Why do we yield either one package without version in https://github.com/nexB/fetchcode/blob/d0a3fa9bb56dc3a77f7d3d7bd5b8d0e40c7a8612/src/fetchcode/package.py#L132 or possibly yield the same versions of a package twice?

Why do we have duplicated code in https://github.com/nexB/fetchcode/blob/d0a3fa9bb56dc3a77f7d3d7bd5b8d0e40c7a8612/src/fetchcode/package.py#L112 and https://github.com/nexB/fetchcode/blob/d0a3fa9bb56dc3a77f7d3d7bd5b8d0e40c7a8612/src/fetchcode/package_versions.py ?

@keshav-space @TG1999 ping :)

keshav-space commented 9 months ago

Looking at https://github.com/nexB/fetchcode/blob/master/src/fetchcode/package.py I see some things that raise some questions....

Why do we yield either one package without version in https://github.com/nexB/fetchcode/blob/d0a3fa9bb56dc3a77f7d3d7bd5b8d0e40c7a8612/src/fetchcode/package.py#L132 or possibly yield the same versions of a package twice?

Why do we have duplicated code in https://github.com/nexB/fetchcode/blob/d0a3fa9bb56dc3a77f7d3d7bd5b8d0e40c7a8612/src/fetchcode/package.py#L112 and https://github.com/nexB/fetchcode/blob/d0a3fa9bb56dc3a77f7d3d7bd5b8d0e40c7a8612/src/fetchcode/package_versions.py ?

Few observations:

Yes @pombredanne, since we're yielding metadata for all the versions of the package in package.info, we should reuse the cargo, npm and pypi code in package_versions.versions. https://github.com/nexB/fetchcode/issues/101

pombredanne commented 9 months ago

Thanks @keshav-space so this is likely material to draft issues @ fetchcode?

keshav-space commented 9 months ago

Thanks @keshav-space so this is likely material to draft issues @ fetchcode?

Yes, I will create issues for these on the fetchcode side.

johnmhoran commented 9 months ago

I've just pushed an update adding a --unique option to the metadata (formerly meta) command. Default is to not normalize; --unique will result in normalization. metadata tests have been updated as well.
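A stdlib-only sketch of the opt-in flag as described here: by default the input PURLs pass through untouched (duplicates included), and --unique strips version, qualifiers and subpath and then deduplicates. argparse is used only to keep the example self-contained; the actual purlcli may use a different CLI framework, and build_parser and select_purls are hypothetical names.

```python
import argparse
import re
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="purlcli-sketch")
    commands = parser.add_subparsers(dest="command", required=True)
    metadata = commands.add_parser("metadata", help="fetch package metadata")
    metadata.add_argument("--purl", action="append", default=[], help="input PURL (repeatable)")
    metadata.add_argument(
        "--unique",
        action="store_true",
        help="strip version, qualifiers and subpath, then drop duplicate PURLs",
    )
    return parser


def select_purls(purls: List[str], unique: bool) -> List[str]:
    if not unique:
        return list(purls)  # default: no normalization, no deduplication
    # Cut each PURL at its first '@', '?' or '#', then keep first occurrences.
    stripped = (re.split(r"[@?#]", purl, maxsplit=1)[0] for purl in purls)
    return list(dict.fromkeys(stripped))


args = build_parser().parse_args(
    ["metadata", "--purl", "pkg:pypi/dejacode@5.0.0",
     "--purl", "pkg:pypi/dejacode@5.0.0?os=windows", "--unique"]
)
print(select_purls(args.purl, args.unique))  # ['pkg:pypi/dejacode']
```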

This addresses all of @pombredanne 's very helpful recent comments, though I have a few equally helpful comments from @JonoYang to address and will do so promptly. Next will be updating the urls command (including adding a test suite), to be followed by doing the same for the validate and versions commands and then creating several additional commands awaiting my attention.

johnmhoran commented 9 months ago

@pombredanne @JonoYang I'm back on the urls command now. I'm about to add the --unique option as I did with metadata, but urls has different issues; e.g., as I think I've reported before, it returns data for versions and other separators that do not actually exist:

This is default behavior atm; I expect --unique would fix that by normalizing (removing all separators and their strings) and deduping the resulting PURLs. But do we want to permit these sorts of default examples without any further vetting or warning?

johnmhoran commented 9 months ago

In metadata, I handle non-existent versions like this:

            "warnings": [
                "'pkg:pypi/fetchcode@5.0.0' could not be fetched",
                "'pkg:pypi/dejacode@5.0.0os=windows' could not be fetched",

Non-existent qualifiers and subpaths are not handled (atm) by metadata or the underlying info() function in fetchcode/package.py.

johnmhoran commented 9 months ago

As I've noted in the past, the output data for the 4 current commands has all sorts of oddities that I don't think we'd want to permit but currently do.

johnmhoran commented 9 months ago

BTW, I plan to handle non-existent versions in urls the same way I do in metadata (via an existing check_existence() function that uses the validate endpoint).

pombredanne commented 9 months ago

As I've noted in the past, the output data for the 4 current commands has all sorts of oddities that I don't think we'd want to permit but currently do.

Can you be specific?

Note that the important thing is to handle correctly the main, common use cases. Oddities for corner cases are OK.

Please focus first on the common case: a plain PURL with or without a version. Qualifiers and subpaths are oddities and should be tended to later (or never).

johnmhoran commented 9 months ago

Thanks @pombredanne. Here's an example of the current approach, using metadata (which calls info() in fetchcode's package.py). Say we have these input PURLs:

purls = [ "pkg:pypi/dejacode", "pkg:pypi/dejacode@5.0.0", "pkg:pypi/dejacode@5.0.0?os=windows", "pkg:pypi/dejacode@10.0.0", "pkg:gem/bundler-sass", "pypi/dejacode", ]

  1. We know for a fact that:

"pkg:pypi/dejacode" is supported by info() (and thus metadata) and exists in pypi.org

"pkg:pypi/dejacode@5.0.0" is supported by info() (and thus metadata) and exists in pypi.org and atm is the only version in pypi.org

"pkg:pypi/dejacode@5.0.0?os=windows" does not exist in pypi.org

"pkg:pypi/dejacode@10.0.0" does not exist in pypi.org

"pkg:gem/bundler-sass" is not supported by info() (although rubygems is)

"pypi/dejacode" is not a valid PURL

  2. If we rely solely on the data returned by info():

"pkg:pypi/dejacode" returns 2 OrderedDict objects: one with no version and some URL data (a sort of "generic" URL report) and a second for the sole actual version, 5.0.0, with whatever URL data we gather/generate, e.g., a download_url.

"pkg:pypi/dejacode@5.0.0" returns exactly the same as above except the version value for the first of the two objects is shown as '5.0.0'.

"pkg:pypi/dejacode@5.0.0?os=windows" returns exactly the same as above except the version value for the first of the two objects is shown as '5.0.0' and qualifiers is shown as {'os': 'windows'}.

"pkg:pypi/dejacode@10.0.0" returns exactly the same as above except the version value for the first of the two objects is shown as '10.0.0'.

"pkg:gem/bundler-sass" is None

"pypi/dejacode" is None

  3. The default is now to not normalize (normalizing would/could remove version, qualifiers and/or subpath data). BUT the data returned by info() does not accurately and clearly describe the input PURLs.

So, as part of the default behavior, metadata (and urls atm, and maybe others going forward) also queries the validate endpoint, and translates the results to both printed warnings and warnings added to the JSON headers warnings list.

"pkg:pypi/dejacode" preserves the info() return.

"pkg:pypi/dejacode@5.0.0" preserves the info() return.

"pkg:pypi/dejacode@5.0.0?os=windows" preserves the info() return.

"pkg:pypi/dejacode@10.0.0" prints and adds the warning "'pkg:pypi/dejacode@10.0.0' could not be fetched" (but this relies on validate and thus should say 'does not exist in the upstream repo')

"pkg:gem/bundler-sass" prints and adds the warning "'pkg:gem/bundler-sass' not supported with metadata command"

"pypi/dejacode" prints and adds the warning "'pypi/dejacode' not valid"

  4. The --unique flag will also remove the version, qualifiers and subpath data and dedupe the resulting PURLs.
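The decision logic in point 3 can be sketched as a small mapping from a validate-style result to an optional warning. The keys valid and exists mirror the validate output shown earlier in this thread; warning_for and its supported parameter are hypothetical, the sample results are illustrative inputs, and the messages follow the wording proposed above (including 'does not exist in the upstream repo' for missing versions).

```python
from typing import Mapping, Optional


def warning_for(
    purl: str,
    result: Mapping[str, Optional[bool]],
    supported: bool,
    command: str = "metadata",
) -> Optional[str]:
    """Return one self-standing warning for a PURL, or None to keep the info() output."""
    if not result.get("valid"):
        return f"'{purl}' not valid"
    if not supported:
        return f"'{purl}' not supported with {command} command"
    if result.get("exists") is False:
        return f"'{purl}' does not exist in the upstream repo"
    return None  # valid, supported, and found (or existence unknown)


cases = [
    ("pkg:pypi/dejacode@5.0.0?os=windows", {"valid": True, "exists": True}, True),
    ("pkg:pypi/dejacode@10.0.0", {"valid": True, "exists": False}, True),
    ("pkg:gem/bundler-sass", {"valid": True, "exists": None}, False),
    ("pypi/dejacode", {"valid": False, "exists": None}, False),
]
for purl, result, supported in cases:
    print(warning_for(purl, result, supported))
```

The checks run from most to least fundamental (valid, then supported, then exists), so each PURL gets at most one warning.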