Closed pombredanne closed 7 months ago
@pombredanne @AyanSinhaMahapatra
I've looked at the SCTK fetch_thirdparty.py example, but I have to admit that I don't understand what a complete command for that utility would look like or how it might apply to the current issue. Examples of how to run the fetch_thirdparty example would be helpful for me to explore how that works. (I've looked but found no documentation/examples for that utility.)
In addition, the description above of the current issue seems rather vague. What does it mean to create a client API tool to access PURL services? Examples of PURL services we want to handle, and some descriptions of user input and output, would be particularly helpful.
The only exposure I've had so far with the PurlDB is the experimentation I've done since last Friday evening with the new validate endpoint.
.txt or .xlsx file -- the 1+ PURLs that he or she wants to vet with the new validate endpoint? Maybe with options of terminal output and a .xlsx workbook?
@pombredanne Now that we've (initially) addressed the validate endpoint with our new CLI, what additional "services" do you want me to focus on, and how can I identify them and begin to understand how users use those services?
@pombredanne As noted last week, I'm blocked for now from additional CLI work until we can add the missing details to your initial description of this issue, i.e., ID the additional services, commands and use cases we want to include.
So the next steps after validate (and after adding tests to validate) would be to use the latest and new fetchcode as a library to add two new sub commands:
After this I would like to see these:
scan_package pipeline, then return the scan results. Either wait for the scan to complete or poll until completion. Later implement the same with PurlDB which has code in the "priority queue" to handle this.
@pombredanne Re the first bullet above -- a versions subcommand based on fetchcode -- are you looking for this sort of output, or perhaps just a list of versions as strings? (This is an excerpt from the pkg:pypi/scancode-toolkit output.)
purl_versions = [
    [
        PackageVersion(
            value="2.0.0",
            release_date=datetime.datetime(
                2017, 6, 23, 8, 35, 20, 322426, tzinfo=tzutc()
            ),
        ),
        PackageVersion(
            value="2.0.0rc3",
            release_date=datetime.datetime(
                2017, 6, 16, 16, 24, 2, 443222, tzinfo=tzutc()
            ),
        ),
        . . .
Compare just the versions as strings, e.g.,
results_values = ['2.0.0', '2.0.0rc3', '2.0.1', '2.1.0', . . . '32.0.5rc3', '32.0.6', '32.0.7', '32.0.8']
@pombredanne Do we want to get version data for both one PURL and for multiple PURLs, depending on the user's need? (Just as we do with validating either a single PURL or a list of PURLs.)
Also: What should the output look like: a list of string versions, or JSON (and if so, what would it look like)?
Always start with a single PURL. Expanding to a list is easy.
The output could be either:
{"purl": "... input purl", "versions": ["1.1", "2.3" , ....]}
{"purl": "... input purl", "versions": [{"purl": "pkg:...@1.1.2", "version": "1.1.2"}, {"purl": "pkg:...@1.1.3", "version": "1.1.3"}]}
This will account for multiple PURLs in both cases.
Eventually the output will need to account for the input in some header instead, much like in a ScanCode scan, but this is for the future and nothing urgent for now.
Manage objects internally, and deal with simple/plain serialized Python data at the end only. Adding the release date of each version works too BTW, just make sure you use an ISO timestamp like it is done in our other APIs.
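As a rough illustration of the second output shape with ISO release dates, here is a minimal sketch. The PackageVersion dataclass is a hypothetical stand-in for whatever objects the version lookup returns, and serialize_versions is an illustrative name, not an existing function in purldb or fetchcode.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PackageVersion:
    """Hypothetical stand-in for the objects returned by the version lookup."""
    value: str
    release_date: Optional[datetime] = None


def serialize_versions(purl, package_versions):
    """Turn version objects into plain data only at the very end."""
    # Assumption: anything after the first "@" is the version and can be dropped.
    base_purl = purl.split("@")[0]
    versions = []
    for pv in package_versions:
        versions.append({
            "purl": f"{base_purl}@{pv.value}",
            "version": pv.value,
            # ISO 8601 timestamp, or None when no date is known.
            "release_date": pv.release_date.isoformat() if pv.release_date else None,
        })
    return {"purl": purl, "versions": versions}


example = serialize_versions(
    "pkg:pypi/scancode-toolkit",
    [PackageVersion("2.0.0", datetime(2017, 6, 23, 8, 35, 20, 322426, tzinfo=timezone.utc))],
)
print(json.dumps(example, indent=2))
```

Keeping the objects until the last step means a list of input PURLs can be handled by calling the same function once per PURL and collecting the results in a list.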
Thanks @pombredanne . 👍
I'm working on the versions command (see above comments).
Some queries for a PURL that does not exist in the relevant repo (e.g., pkg:pypi/ogdendunes) return an error message like Error while fetching 'https://pypi.org/pypi/ogdendunes/json': 404 and result in an empty list (from my code). That error message seems to be generated by this fetchcode package_versions.py function.
Other queries for a PURL with a similar but no exact name match (e.g., pkg:pypi/foobar -- yes, there are a number of PyPI packages with foobar in the name) result in an empty list (from my code) but no error message from fetchcode.
My CLI code detects the empty list and displays a message in the terminal (There was an error with your '{purl}' query. Make sure that '{purl}' actually exists in the relevant repository.) -- but I'd like to prevent the fetchcode 404 error message from also being displayed in the terminal as is currently the case.
Is there some way to do this?
More info:
The fetchcode error is displayed in the terminal each time one of these two variables is defined in the code (they produce an empty list):
results = list(versions(purl))
results = list(router.process(purl))
These, otoh, do not invoke a fetchcode error displayed in the terminal, and each produces a generator object.
test01 = versions(purl)
test02 = router.process(purl)
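The difference between the two pairs above is consistent with how Python generators work: creating a generator runs none of its body, so any fetch (and any error message) only happens once the generator is consumed, for example by list(). A minimal sketch, where fake_versions is a hypothetical stand-in and not fetchcode's code:

```python
events = []  # records side effects so they can be inspected


def fake_versions(purl):
    """Hypothetical lazy lookup: the body runs only when iterated."""
    events.append(f"fetching {purl}")  # stands in for the network call and its log output
    yield from ()  # no versions found


lazy = fake_versions("pkg:pypi/ogdendunes")  # creates the generator, runs nothing
events_before = list(events)

results = list(lazy)  # consuming the generator runs the body
events_after = list(events)

print(events_before, results, events_after)
```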
Actually, I should be able to use 'validate' and display a message to the user for each PURL for which 'validate' returns "exists": false ....
@johnmhoran I would not worry too much about the CLI output for now, as long as the JSON is correct. If fetchcode displays an error message, then that's an issue there, not here ... @TG1999 @keshav-space
@pombredanne so shall we remove https://github.com/nexB/fetchcode/blob/d0a3fa9bb56dc3a77f7d3d7bd5b8d0e40c7a8612/src/fetchcode/package_versions.py#L523 the logger and raise errors instead?
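Until that is settled upstream, a caller could try raising the threshold of the library's logger. This is only a sketch under assumptions: it presumes the message goes through the standard logging module and that the logger lives under the name "fetchcode" (for example "fetchcode.package_versions"). Neither is confirmed in this thread, and silence_logger is an illustrative helper.

```python
import logging


def silence_logger(name, level=logging.CRITICAL):
    """Raise a logger's threshold and return the previous level so it can be restored."""
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)
    return previous


# Assumption: the library's loggers are children of a logger named "fetchcode".
previous_level = silence_logger("fetchcode")

# A child logger with no level of its own inherits the parent's threshold,
# so ERROR-level messages are no longer emitted.
child = logging.getLogger("fetchcode.package_versions")
print(child.isEnabledFor(logging.ERROR))
```

If the library sets an explicit level on its own logger, or prints directly instead of logging, this has no effect, which is one argument for raising errors in the library as suggested above.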
@pombredanne @JonoYang I'm close to being ready to commit and push my latest purlcli.py and test_purlcli.py. All 42 tests pass (3 test classes, 1 for each current command/service, e.g., class TestPURLCLI_validate(object), and each is parametrized, thus my use of object as argument per my research -- TestCase and FileBasedTesting seem to be incompatible with @pytest.mark.parametrize()).
I ran make test, expecting just 1 failure as in the past, but this time, 2 failed.
FAILED minecode/tests/test_maven.py::MavenEnd2EndTest::test_visit_and_map_with_index - AssertionError: Lists differ: [{'ur[31 chars]ven2/cnuernber/dtype-next/0.4.2/dtype-next-0.4[49087 chars]one}] != [{'ur[31 chars]ven2/.index/nexus-maven-repository-index.532.g[49087 chars]one}]
FAILED minecode/tests/test_ls.py::ParseDirectoryListingTest::test_parse_listing_from_lslr - AssertionError: Lists differ: [{'pa[1527 chars] '2023-01', 'target': None}, {'path': 'dists/e[974 chars]one}] != [{'pa[1527 chars] '2024-01', 'target': None}, {'path': 'dists/e[974 chars]one}]
No idea why, no reason to think this results from my work, but who knows? test_visit_and_map_with_index has failed with make test since I first cloned the repo. test_parse_listing_from_lslr is a new failure.
Unless you suggest otherwise, I'm going to vet my code and tests for a final cleanup, commit and push. ;-)
Just committed and pushed.
@johnmhoran I wouldn't worry about the test_parse_listing_from_lslr failure for now. This test fails every so often due to changes in file dates when the test is run. I will make a PR to revisit this test or remove it.
Great -- thank you @JonoYang . 👍
No need to extend object with your class. This is the default.
Thanks @pombredanne -- I wondered about that. I got the idea from /nexb/purldb/etc/scripts/test_utils_pip_compatibility_tags.py.
@johnmhoran
test_utils_pip_compatibility_tags.py
This https://github.com/nexB/purldb/blob/main/etc/scripts/test_utils_pip_compatibility_tags.py is old code from old pip that was designed originally for Python 2.6.... in general the etc/script code (or code that is vendored like this https://github.com/nexB/purldb/blob/main/etc/scripts/test_utils_pip_compatibility_tags.py#L3 ) may not be the best example to follow.
Thanks @pombredanne . That was not evident, and there were only a few parametrize examples in purldb. The other example you gave me did not use test classes, which imho are needed to allow the tests for a particular command/service to be run on their own if the user wishes.
I have a few questions re @pombredanne’s description of the next command/service I'm adding -- urls. Looking at part of the description above (https://github.com/nexB/purldb/issues/247#issuecomment-1875899523),
urls: given a PURL, return a list of [{URL type: URL}, ...] as in [{"homepage_url": "https:example.com"}, {"vcs_url": "...."}] and various download URLs. Use the packageurl library for this (purl2url) and this will need updating as needed, and use as well scancode-toolkit packagedcode or code in dejacode.
2 questions:
1. Where would I find the relevant scancode-toolkit packagedcode and the code in dejacode?
2. Does the reference to need updating mean that I'll adapt the SCTK/DJC code to this PURL CLI by updating purl2url in the packageurl-python repo? If so, just to be clear, that would mean each set of purl2url updates would need to be committed, pushed, and the PR opened, finished and merged before I could then use that update in the PURL CLI tool. Is that correct?
I've already included the URLs currently handled by purl2url (though I think I need to change the urls value to a list of dictionaries rather than a single dict). An example:
[
    {
        "purl": "pkg:rubygems/bundler@2.3.23",
        "urls": {
            "repo_url": "https://rubygems.org/gems/bundler/versions/2.3.23",
            "download_url": "https://rubygems.org/downloads/bundler-2.3.23.gem",
            "inferred_urls": [
                "https://rubygems.org/gems/bundler/versions/2.3.23",
                "https://rubygems.org/downloads/bundler-2.3.23.gem"
            ],
            "repo_download_url": null,
            "repo_download_url_by_package_type": null,
            "url": "https://rubygems.org/gems/bundler/versions/2.3.23"
        }
    }
]
The description also refers to using scancode-toolkit packagedcode or code in dejacode and includes as an example {"vcs_url": "...."}, which I take as a reference to a scan output, e.g.,
"homepage_url": null,
"download_url": null,
"size": null,
"sha1": null,
"md5": null,
"sha256": null,
"sha512": null,
"bug_tracking_url": null,
"code_view_url": null,
"vcs_url": null,
"urls" is now an alphabetized list of the initial set of purl2url URLs (indent reduced from 4 to 2 -- is there a preference/best practice?).
[
  {
    "purl": "pkg:rubygems/bundler@2.3.23",
    "urls": [
      {
        "download_url": "https://rubygems.org/downloads/bundler-2.3.23.gem"
      },
      {
        "inferred_urls": [
          "https://rubygems.org/gems/bundler/versions/2.3.23",
          "https://rubygems.org/downloads/bundler-2.3.23.gem"
        ]
      },
      {
        "repo_download_url": null
      },
      {
        "repo_download_url_by_package_type": null
      },
      {
        "repo_url": "https://rubygems.org/gems/bundler/versions/2.3.23"
      },
      {
        "url": "https://rubygems.org/gems/bundler/versions/2.3.23"
      }
    ]
  }
]
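The move from a single dict to an alphabetized list of one-key dicts can be sketched as below. The helper name urls_as_sorted_list is illustrative and not taken from purlcli.py; the input data is the rubygems example above.

```python
import json


def urls_as_sorted_list(urls):
    """Convert {"url_type": value, ...} into [{"url_type": value}, ...], sorted by key."""
    return [{key: urls[key]} for key in sorted(urls)]


urls = {
    "repo_url": "https://rubygems.org/gems/bundler/versions/2.3.23",
    "download_url": "https://rubygems.org/downloads/bundler-2.3.23.gem",
    "inferred_urls": [
        "https://rubygems.org/gems/bundler/versions/2.3.23",
        "https://rubygems.org/downloads/bundler-2.3.23.gem",
    ],
    "repo_download_url": None,
    "repo_download_url_by_package_type": None,
    "url": "https://rubygems.org/gems/bundler/versions/2.3.23",
}

record = [{"purl": "pkg:rubygems/bundler@2.3.23", "urls": urls_as_sorted_list(urls)}]
print(json.dumps(record, indent=2))  # indent=2 matches the reduced indent
```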
Re using scancode-toolkit packagedcode, this seems to be an example of a utils.py function I could call from the new urls Click command code itself to retrieve the vcs_url: def normalize_vcs_url(repo_url, vcs_tool=None).
Looking at a few of the other numerous SCTK output URLs, they seem to be distributed across various files, often handling different types in separate files. Hopefully there's a quasi-centralized way to ID the code for all relevant URLs, and maybe call all as needed from the urls Click functions? And maybe the relevant DejaCode code is also readily accessible? ;-)
@pombredanne @JonoYang When you have time, please take a look at my questions from last Friday evening (and several comments that follow) re next steps with the urls command.
Meanwhile, since I'm not sure how/where to get the ScanCode-Toolkit or DejaCode URL-related code referred to in the urls command description, I've taken another look at our nascent purlcli meta command, which I see has a long list of various URLs and which I can access from within the new urls command.
One observation: the results from the meta command, which calls fetchcode.package.info(),
(1) seem to always include a full list of dictionary objects, one for each version (whether or not the PURL passed to the command includes a version) and
(2) if the PURL passed to the command includes a version, the list will contain two dictionaries with the nested purl (inside the metadata field) equal to the queried PURL -- BUT the first of these will have "download_url": null while the second will have an actual download_url when available. For example, a meta command query for "pkg:pypi/scancode-toolkit@2.0.0" will include "download_url": "https://files.pythonhosted.org/packages/41/31/ec6c58f3fa60181803265410b4ddb3abae1214c946e36969fa0ce9fab014/scancode_toolkit-2.0.0-py2-none-any.whl", in the second but not first matching dictionary.
For use in the urls command it looks like I need to use the PURL w/o any version info so that when the urls command is run on a PURL with a version, I can find the correct returned dictionary that has an actual download_url (when available) and not merely a null value. Splitting on @ seems to do the trick.
Here's an example with meta run on pkg:pypi/scancode-toolkit@2.0.0 -- these are the first 2 dictionaries, each with purl (nested inside metadata) set to the PURL and version, but only the second with a download_url and an actual value rather than null:
(venv) Mon Jan 22, 2024 01:40 PM /home/jmh/dev/nexb/purldb jmh (247-purl-cli-add-urls)
$ python -m purldb_toolkit.purlcli meta --purl pkg:pypi/scancode-toolkit@2.0.0 --output -
[
  {
    "purl": "pkg:pypi/scancode-toolkit@2.0.0",
    "metadata": [
      {
        "type": "pypi",
        "namespace": null,
        "name": "scancode-toolkit",
        "version": "2.0.0",
        "qualifiers": {},
        "subpath": null,
        "primary_language": null,
        "description": null,
        "release_date": null,
        "parties": [],
        "keywords": [],
        "homepage_url": "https://github.com/nexB/scancode-toolkit",
        "download_url": null,
        "api_url": "https://pypi.org/pypi/scancode-toolkit/json",
        "size": null,
        "sha1": null,
        "md5": null,
        "sha256": null,
        "sha512": null,
        "bug_tracking_url": null,
        "code_view_url": null,
        "vcs_url": null,
        "copyright": null,
        "license_expression": null,
        "declared_license": "Apache-2.0 AND CC-BY-4.0 AND LicenseRef-scancode-other-permissive AND LicenseRef-scancode-other-copyleft",
        "notice_text": null,
        "root_path": null,
        "dependencies": [],
        "contains_source_code": null,
        "source_packages": [],
        "purl": "pkg:pypi/scancode-toolkit@2.0.0",
        "repository_homepage_url": null,
        "repository_download_url": null,
        "api_data_url": null
      },
      {
        "type": "pypi",
        "namespace": null,
        "name": "scancode-toolkit",
        "version": "2.0.0",
        "qualifiers": {},
        "subpath": null,
        "primary_language": null,
        "description": null,
        "release_date": null,
        "parties": [],
        "keywords": [],
        "homepage_url": "https://github.com/nexB/scancode-toolkit",
        "download_url": "https://files.pythonhosted.org/packages/41/31/ec6c58f3fa60181803265410b4ddb3abae1214c946e36969fa0ce9fab014/scancode_toolkit-2.0.0-py2-none-any.whl",
        "api_url": "https://pypi.org/pypi/scancode-toolkit/json",
        "size": null,
        "sha1": null,
        "md5": null,
        "sha256": null,
        "sha512": null,
        "bug_tracking_url": null,
        "code_view_url": null,
        "vcs_url": null,
        "copyright": null,
        "license_expression": null,
        "declared_license": "Apache-2.0 AND CC-BY-4.0 AND LicenseRef-scancode-other-permissive AND LicenseRef-scancode-other-copyleft",
        "notice_text": null,
        "root_path": null,
        "dependencies": [],
        "contains_source_code": null,
        "source_packages": [],
        "purl": "pkg:pypi/scancode-toolkit@2.0.0",
        "repository_homepage_url": null,
        "repository_download_url": null,
        "api_data_url": null
      },
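The selection described in this comment (query without the version, then pick the returned dictionary whose purl matches and that carries a real download_url) could look roughly like this. Both helper names are hypothetical, and the sample entries are trimmed copies of the two dictionaries above.

```python
def strip_version(purl):
    """Drop everything from the first "@" onward.

    Assumption: the PURL has no qualifiers or subpath that need to be kept.
    """
    return purl.split("@")[0]


def pick_entry(entries, purl):
    """Return the entry for `purl`, preferring one with a non-null download_url."""
    matches = [entry for entry in entries if entry.get("purl") == purl]
    for entry in matches:
        if entry.get("download_url"):
            return entry
    return matches[0] if matches else None


entries = [
    {"purl": "pkg:pypi/scancode-toolkit@2.0.0", "download_url": None},
    {
        "purl": "pkg:pypi/scancode-toolkit@2.0.0",
        "download_url": "https://files.pythonhosted.org/packages/41/31/ec6c58f3fa60181803265410b4ddb3abae1214c946e36969fa0ce9fab014/scancode_toolkit-2.0.0-py2-none-any.whl",
    },
]

chosen = pick_entry(entries, "pkg:pypi/scancode-toolkit@2.0.0")
print(strip_version("pkg:pypi/scancode-toolkit@2.0.0"), chosen["download_url"])
```

Falling back to the first match keeps the behaviour defined when no entry has a download_url.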
@johnmhoran
where would I find the relevant scancode-toolkit packagedcode and the code in dejacode?
There are functions that generate download urls for packages based on type, namespace, name, version, etc in scancode-toolkit/packagedcode. For example, https://github.com/nexB/scancode-toolkit/blob/develop/src/packagedcode/maven.py#L1135
does the reference to need updating mean that I'll adapt the SCTK/DJC code to this PURL CLI by updating purl2url in the packageurl-python repo?
I'm not sure what @pombredanne intends, but my guess is that we should have new functions that handle more packages in https://github.com/package-url/packageurl-python/blob/main/src/packageurl/contrib/purl2url.py . For example, we do not handle maven purls in purl2url, so something like build_maven_repo_url and build_maven_download_url would be needed.
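To make the idea concrete, here is a minimal sketch of what two such functions might look like. The function names come from the comment above, but the signatures, the default base URL, and the "jar" default are assumptions for illustration. This is not the scancode-toolkit implementation and not an existing purl2url API. It relies only on the standard Maven repository layout, where dots in the group id become path segments.

```python
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"  # assumed default repository


def build_maven_repo_url(namespace, name, version, base_url=MAVEN_CENTRAL):
    """Directory URL for one artifact version: group/artifact/version/."""
    group_path = namespace.replace(".", "/")
    return f"{base_url}/{group_path}/{name}/{version}/"


def build_maven_download_url(namespace, name, version, extension="jar",
                             classifier=None, base_url=MAVEN_CENTRAL):
    """File URL: artifact-version[-classifier].extension inside the version directory."""
    suffix = f"-{classifier}" if classifier else ""
    directory = build_maven_repo_url(namespace, name, version, base_url)
    return f"{directory}{name}-{version}{suffix}.{extension}"


# For pkg:maven/org.apache.commons/commons-lang3@3.12.0
print(build_maven_download_url("org.apache.commons", "commons-lang3", "3.12.0"))
```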
If so, just to be clear, that would mean each set of purl2url updates would need to be committed, pushed, and the PR opened, finished and merged before I could then use that update in the PURL CLI tool. Is that correct?
Yes, though you could do an editable install of your dev purl2url into your purldb-toolkit venv to try out your new functions without creating a release.
"urls" is now an alphabetized list of the initial set of purl2url URLs (indent reduced from 4 to 2 -- is there a preference/best practice?).
I don't have a preference, but an indent of 2 might be better for display in a shell
@JonoYang @johnmhoran this makes sense. @johnmhoran an extra step could be to check if the URLs do exist using a "head" request ... we may have example in various places.
re: https://github.com/nexB/purldb/issues/247#issuecomment-1904873463 @johnmhoran why nesting the results under a metadata attribute that also contains the purl? IMHO instead just report a list of mappings directly, and we could move the purl up as the 1st attribute so may be:
[
  {
    "purl": "pkg:pypi/scancode-toolkit@2.0.0",
    "type": "pypi",
    "namespace": null,
    "name": "scancode-toolkit",
    "version": "2.0.0",
    "qualifiers": {},
    "subpath": null,
    "primary_language": null,
    ....
  },
  {
    "purl": "pkg:pypi/scancode-toolkit@2.0.0",
    "type": "pypi",
    "namespace": null,
    "name": "scancode-toolkit",
    "version": "2.0.0",
    "qualifiers": {},
    "subpath": null,
    ......
  },
Thanks @JonoYang and @pombredanne . Does this mean that for the meta command it's sufficient to keep using fetchcode.package.info(), with its own download_url and other URLs, but for the urls command, I should be exclusively using -- and there beefing up -- purl2url?
Re a head request, I've begun a little exploring and see, for example, that https://www.nexb.com returns <Response [301]> while https://nexb.com returns <Response [200]>. The status code definitions are detailed and voluminous -- how does one determine the relationship between the response code and our own response (JSON field/terminal message)?
>>> import requests
>>> x = requests.head('https://www.nexb.com')
>>> print(f'x = {x}')
x = <Response [301]>
>>> print(f'x.headers = {x.headers}')
x.headers = {'Server': 'nginx', 'Date': 'Sat, 20 Jan 2024 04:55:12 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Connection': 'keep-alive', 'Expires': 'Sat, 20 Jan 2024 05:55:12 GMT', 'Cache-Control': 'max-age=3600, public, max-age=86400', 'X-Redirect-By': 'WordPress', 'Location': 'https://nexb.com/', 'X-Cache-Status': 'MISS', 'Strict-Transport-Security': 'max-age=31536000;', 'X-XSS-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Content-Security-Policy': "default-src * 'unsafe-inline' 'unsafe-eval' data: blob:;"}
>>> z = requests.head('https://nexb.com')
>>> print(f'z = {z}')
z = <Response [200]>
>>> print(f'z.headers = {z.headers}')
z.headers = {'Server': 'nginx', 'Date': 'Sat, 20 Jan 2024 04:58:59 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'Last-Modified': 'Thu, 18 Jan 2024 20:12:55 GMT', 'X-Cache-Status': 'HIT', 'Strict-Transport-Security': 'max-age=31536000;', 'X-XSS-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Content-Security-Policy': "default-src * 'unsafe-inline' 'unsafe-eval' data: blob:;", 'Cache-Control': 'public, max-age=86400', 'Content-Encoding': 'gzip'}
>>>
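One possible convention, offered only as a sketch: follow redirects and treat a final 2xx status as "the URL exists", everything else as "not confirmed". The 301 above is expected because requests.head() does not follow redirects unless allow_redirects=True is passed. The version below uses only the standard library, whose urlopen follows redirects by default; status_means_exists and url_exists are hypothetical helper names.

```python
import urllib.error
import urllib.request


def status_means_exists(status_code):
    """Assumed rule: any 2xx final status counts as an existing URL."""
    return 200 <= status_code < 300


def url_exists(url, timeout=10):
    """Send a HEAD request and map the final status to True/False."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return status_means_exists(response.status)
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses arrive as exceptions that carry the status code.
        return status_means_exists(error.code)
    except urllib.error.URLError:
        # DNS failure, refused connection, timeout, etc.
        return False


print(status_means_exists(200), status_means_exists(301), status_means_exists(404))
```

Some servers reject HEAD requests (for example with 405), so a False result here means "not confirmed" rather than "definitely missing".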
@pombredanne Just saw your question re the structure of my meta command data. Thank you. I'll make the change.
There are currently URL fields in the data returned by the meta and urls commands. On which URL fields do we want to run a head request? And do we want to sync the URL values that appear in the data returned by two or more commands?
@pombredanne @JonoYang A related question -- do we want to rely solely on the fetchcode.package.info() function (and the other package.py functions) for the data meta returns, and solely on the packageurl.contrib.purl2url code for the data urls returns?
If I understand the accumulating design details (always welcome -- keep them coming!) I should start adding my code in purl2url, PURL type and URL type by PURL type and URL type?
The next question of course is -- what URL fields? I think we need a comprehensive list of URLs from the meta output, the SCTK output, and whatever other output/code I can find -- I don't think we have such a list, or of what is reported where. Once we have the list, you and others can decide what URL fields to add to purl2url.
And -- what list of PURL types, with what priorities, for purl2url? Happy to make the list myself but might we have one already as part of our ongoing work? This PURL CLI project seems that it could use a bit more organization than we have atm.... ;-)
@johnmhoran
do we want to rely solely on the fetchcode.package.info() function (and the other package.py functions) for the data meta returns, and solely on the packageurl.contrib.purl2url code for the data urls returns?
Not sure, I would start by keeping them separate for now. @pombredanne do you have a suggestion?
If I understand the accumulating design details (always welcome -- keep them coming!) I should start adding my code in purl2url, PURL type and URL type by PURL type and URL type?
Yes. I think we will want to eventually move the functions that generate repo and download urls from scancode-toolkit/packagedcode to purl2url
The next question of course is -- what URL fields?
What do you mean by url fields? homepage url, download url, vcs url, etc?
what list of PURL types, with what priorities
I would start by adding support for maven purls to purl2url, and then see what other package types are missing from purl2url that we have available at scancode-toolkit/packagedcode
@JonoYang Thank you.
meta and urls sources/code separate for now
purl2url
*_url fields created and populated in various files in various repos
==> as I continue working on a branch in my local purldb repo, how do I make and access changes to purl2url? In my purldb branch, I can open that file at /home/jmh/dev/nexb/purldb/venv/lib/python3.8/site-packages/packageurl/contrib/purl2url.py. Is that where I should do my initial development? How would that be handled in the next PR I create from my current purldb branch?
What I really mean to ask is: how can I work on code in the repo holding purl2url at the same time I'm working on a branch in the purldb repo and (with pip install -e . I presume) access the former from the latter?
@johnmhoran
pip install -e . in the packageurl-python directory
You will be able to make changes to the code. I think the code navigation stuff, where you ctrl+click the function names and import statements, will lead you to your local checkout of packageurl-python.
Just remember, if you clean the purldb repo with make clean, or ./configure --clean, and then run make dev, you will need to go through the steps above again.
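One quick way to confirm which copy of a module the active venv actually imports (the site-packages copy or an editable checkout) is to look at where the import system resolves it. A minimal sketch, with module_origin as a hypothetical helper name:

```python
import importlib.util


def module_origin(module_name):
    """Return the file path a module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return None
    return None if spec is None else spec.origin


# With an editable install in place, this would be expected to point into the
# local packageurl-python checkout rather than venv/lib/.../site-packages.
print(module_origin("packageurl.contrib.purl2url"))
```

Running this again after make clean and make dev shows whether the editable install needs to be redone.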
@JonoYang I haven't seen any clean references in the purldb readme and no doc'n -- when do I run make clean or ./configure --clean, and when in my work do I run make dev? Re make dev, I think I'd run it whenever I merged an updated main, but not for changes in my local purldb repo.
What about changes in this new packageurl-python repo that I need to clone etc.? Do those trigger the need to run make dev in purldb? In packageurl-python?
@johnmhoran
I run make clean and make dev when things break, or if dependencies have changed in the project.
What about changes in this new packageurl-python repo that I need to clone etc.? Do those trigger the need to run make dev in purldb? In packageurl-python?
You would run make dev in purldb if you haven't done so already. Since you already have it set up, you can just follow the instructions above to install packageurl-python in editable mode.
Thank you @JonoYang -- this is very helpful, and I'm happy to say that when I cloned purldb last month, my steps were
git clone git@github.com:nexB/purldb.git
cd purldb
make dev
make envfile
make postgres
make test
git checkout -b 247-create-purl-cli-tool
so all is good.
Opened a new issue in packageurl-python -- Add support for additional packages in purl2url #143 .
@JonoYang After cloning packageurl-python I ran
make dev
make test
git checkout -b 143-add-purl2url-package-support
and was about to activate the virtual environment, but I see no venv -- although there is a pyvenv.cfg that contains
home = /usr
implementation = CPython
version_info = 3.8.10.final.0
virtualenv = 20.14.1
include-system-site-packages = false
base-prefix = /usr
base-exec-prefix = /usr
base-executable = /usr/bin/python3
Do you know whether there is a virtual environment and if so how to activate?
@johnmhoran looking at the makefile for packageurl-python (https://github.com/package-url/packageurl-python/blob/main/Makefile#L33), it looks like it installed the virtual env stuff in the root of the project. I think you should be able to activate the venv by doing source bin/activate.
Thank you @JonoYang -- virtual env activated. 👍
@pombredanne Re your comment why nesting the results under a metadata attribute that also contains the purl? IMHO instead just report a list of mappings directly, and we could move the purl up as the 1st attribute ... :
The meta command, which uses fetchcode.package.info(), returns a list of dictionaries, one for each version of the input PURL (if it has no version) plus a preliminary dictionary for a version-less PURL. If the input PURL has a version, same output except the initial dictionary names the input PURL and version but has a different download_url value (if any). (Don't know why we have this initial dictionary -- maybe meant to be a generic set of metadata for the PURL?)
Thus, if we want the output dict/JSON to identify what the command's input PURL was, we need the 'purl' field where it is now -- if we remove it, we'll just have a list of dictionaries for all versions.
Of course, this might be enough -- that's a design question. So, do we want the output to identify the input PURL (including version if any), or just display the list of metadata dictionaries, version by version?
BTW, versions currently also has the output identify the command's input PURL -- we probably want both versions and meta to either identify or not identify the input PURL, i.e., consistent structure. (validate already identifies the input PURL in the output structure, consistent with your suggested structure for meta.)
@johnmhoran On second thoughts you should reuse the same format as scancode toolkit. So here this would be
@pombredanne I'm not clear on what you want this to apply to and what the structure would look like. Is this meant to apply only to the meta output, and not the output from validate, versions, urls or any of the other commands we'll be adding?
It might be useful to examine the current output from all 4 current commands (urls is just underway and will involve work on the packageurl-python purl2url.py file concurrently with work on the purldb purlcli.py file). I have a call coming up but afterwards will upload a file here with output from all 4 commands so we can make decisions with the actual data and structure in front of us.
Once I do that, I will mock up the revised meta output structure to match what I think your prior comment proposes and paste that here so we can discuss/OK/change etc.
Last point: you also asked for the meta structure to be changed to put the nested purl at the top -- does that still apply? That structure comes from the fetchcode info() function -- maybe that's where we should change the order, not in purlcli.py?
@pombredanne @JonoYang. Uploading examples of console outputs from the 4 current commands, with and without versions, i.e.,
python -m purldb_toolkit.purlcli validate --purl pkg:pypi/fetchcode --output -
python -m purldb_toolkit.purlcli validate --purl pkg:pypi/fetchcode@0.1.0 --output -
python -m purldb_toolkit.purlcli versions --purl pkg:pypi/fetchcode --output -
python -m purldb_toolkit.purlcli versions --purl pkg:pypi/fetchcode@0.1.0 --output -
python -m purldb_toolkit.purlcli meta --purl pkg:pypi/fetchcode --output -
python -m purldb_toolkit.purlcli meta --purl pkg:pypi/fetchcode@0.1.0 --output -
python -m purldb_toolkit.purlcli urls --purl pkg:pypi/fetchcode --output -
python -m purldb_toolkit.purlcli urls --purl pkg:pypi/fetchcode@0.1.0 --output -
To best support using various PURL-based services, I would like to have a command client tool and library as a client API that can expose these services for integration elsewhere.