Closed by @jamespjh 2 years ago
@jamespjh These requirements would definitely only match a subset of PyPI, Conda, CRAN, Bioconductor etc. and I agree we should have a confidence-providing process that can actually be achieved, as opposed to an assurance process that cannot. However, I would also suggest we run (or rely on someone else's) automated security scan to flag a subset of the more obvious / easily detectable security issues.
[Edit: removed the Tier 3+ question, as it was answered in a comment on PR #304 referencing this issue. I've updated the title to reflect that this process is for Tier 3 and above.]
We should task someone with scripting up my criteria above and finding out what % of packages on PyPI meet them, and in particular what % of the packages on our required list meet them. Sounds like a fun task.
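A minimal sketch of what that script might look like, using PyPI's public JSON API. The criteria encoded in `meets_criteria` (a declared licence and a minimum number of releases) are placeholders for illustration, not the actual checklist from this issue; `MIN_RELEASES` is an assumed threshold.

```python
import json
from urllib.request import urlopen

# Placeholder threshold -- the real criteria are those listed in this issue.
MIN_RELEASES = 3

def fetch_pypi_metadata(name):
    """Fetch a package's metadata from the PyPI JSON API."""
    with urlopen(f"https://pypi.org/pypi/{name}/json") as resp:
        return json.load(resp)

def meets_criteria(meta):
    """Return True if the package passes the (placeholder) criteria:
    a declared licence and at least MIN_RELEASES published releases."""
    info = meta["info"]
    has_license = bool(info.get("license"))
    enough_releases = len(meta.get("releases", {})) >= MIN_RELEASES
    return has_license and enough_releases
```

Running `meets_criteria(fetch_pypi_metadata(name))` over a list of required packages would give the percentage figure directly.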
I might do it on Thursday afternoon if I get time.
@martintoreilly @jamespjh There are really two tasks here: implementing and running the scripts to see what happens, and then deciding how to revise the protocol according to the results. Do both need to be complete for the DSSG?
I think that is somewhat dependent on the results of the automated checks. We will need to take a whitelisting decision on the packages required for the DSSG Tier 3 projects regardless, but the number of packages we need to manually review will depend on the automatic acceptance criteria.
@jamespjh @tomdoel I'm talking to @darenasc about this issue this afternoon. He may have some time to work on it. @jamespjh Let me know if you start working on it so we don't duplicate effort.
@darenasc @tomdoel @jamespjh I've updated the criteria with non-binary measures I think we should capture from our automated crawl of the package repositories to allow us to explore the impact of tweaking the criteria.
@darenasc will take a first pass at this from tomorrow (Friday) and should have initial output by Monday. He'll initially target the most popular Python packages and produce:
@darenasc As discussed, please open a draft PR for this and check in your code as you go. We can have any detailed conversation on the implementation there.
Decision:
@jamespjh and I just had a chat about what we do with the output of the whitelist criteria evaluation for packages. Our view is that we should not proactively add all packages on e.g. PyPI that meet the criteria for whitelisting. Instead, we should wait for packages to be requested and use the fact that a package meets the whitelisting criteria to let us default to approval, unless we feel a particular package (or pattern of package requests) needs further investigation. This allows us to support a very fast turnaround for uncontentious packages (which should be the vast majority), while being able to reassure data providers that there is a "person in the loop" on all whitelisting decisions. It also lets us evolve our whitelisting criteria as we get a feel for how well they are met by a range of mainstream and more boutique packages.
From @jamespjh via email:
Part of our automated checks for the DSH dependency tree should check if the license is one of those in the OSI approved list — from the Redhat workshop.
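That licence check could be a simple membership test against OSI-approved identifiers. A sketch, assuming SPDX identifiers are available in the package metadata; the `OSI_APPROVED` set below is a deliberately partial, illustrative list, and a real check should load the full OSI list.

```python
# Partial, illustrative set of OSI-approved SPDX licence identifiers.
# A real implementation should use the complete OSI-approved list.
OSI_APPROVED = {
    "MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0",
    "GPL-2.0-only", "GPL-3.0-only", "LGPL-3.0-only", "MPL-2.0",
}

def license_ok(spdx_id: str) -> bool:
    """True if the declared SPDX licence identifier is OSI-approved."""
    return spdx_id.strip() in OSI_APPROVED
```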
@ots22 @nbarlowATI @edaub @myyong @edwardchalstrey1 @jack89roberts Thoughts?
Closing as part of a stale issue cleanup.
The protocol should be quick - ideally even scriptable.
I propose packages should:
These requirements need to be satisfied for each package and its full dependency graph.
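Checking the full dependency graph can be sketched as a recursive walk that short-circuits on the first failing package. Both arguments here are assumptions: `deps_of` stands in for a mapping of direct dependencies (e.g. parsed from PyPI's `requires_dist`), and `package_ok` for whatever per-package criteria check we settle on.

```python
def whole_graph_ok(package, deps_of, package_ok, seen=None):
    """Check a package and its full transitive dependency graph.

    deps_of:    mapping package -> list of direct dependencies (a stand-in
                for the real metadata crawl described above)
    package_ok: per-package predicate applying the whitelisting criteria
    """
    if seen is None:
        seen = set()
    if package in seen:   # already checked; also guards against cycles
        return True
    seen.add(package)
    if not package_ok(package):
        return False
    return all(whole_graph_ok(dep, deps_of, package_ok, seen)
               for dep in deps_of.get(package, []))
```

One failing transitive dependency fails the whole package, which matches the requirement that the criteria hold over the entire graph.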
Could give other organisations delegated / transitive trust - e.g.
Let's research this further. Initially let's script the above criteria and see what survives from PyPI and CRAN. The script should capture the measures above (as should any parallel manual review).
For the current Tier-3 projects:
TODO: