Open david-a-wheeler opened 4 years ago
Thank you for the effort in writing this and providing recommendations. We've made some improvements to the specification based on some of the concepts you've outlined.
If there are outstanding concerns, can they be decomposed into smaller, more manageable pieces, so that each individual item can be independently tracked and potentially improved upon?
Comments on OWASP “Software Component Verification Standard” by David A. Wheeler
Here are my comments on the “Software Component Verification Standard” Version 1.0.0-RC.1 (Public Preview), 16 April 2020, https://owasp-scvs.gitbook.io/scvs/ My apologies that it’s one long document; I wrote my comments & then discovered that comments were wanted via GitHub. Had I known that I would have split this up. But hopefully they’ll be useful anyway.
This document’s frontispiece says it’s a “grouping of controls, separated by domain, which can be used by architects, developers, security, legal, and compliance to define, build, and verify the integrity of their software supply chain.” So I’m commenting on the document based on that understanding. My high-level comments:
Below are specific comments.
=======================
Specific comments:
Title: The title is wrong or at least dreadfully misleading. The title needs to be changed to something that accurately reflects its contents. Currently the title says it’s the “Software Component Verification Standard” - yet this specification doesn’t cover software component verification. Per ISO/IEC 15288 and ISO/IEC 12207 the purpose of verification is to ensure that the system meets its requirements (including regulations and such). Yet nothing in this specification verifies that the component meets its requirements, so by definition this document doesn’t support (general) verification. It doesn’t even verify that a component meets its security requirements (never mind ALL its requirements). In addition, the “component analysis” section is inadequate for a serious security analysis; it doesn’t include many important measures to analyze the security of a component. The frontispiece also makes it clear that the title is wrong; the frontispiece says that this specification focuses on the “integrity of their software supply chain.” Since this document is actually focused on software security supply chain issues, and not software component verification, the title should be changed to reflect its actual purpose instead of its current misleading name. Please change the title to reflect the document’s actual purpose. An example would be “Software Component Supply Chain Integrity Verification Standard”; I’m sure there are many other possible names. Please ensure that the title clearly and accurately reflects its contents to readers who have not yet read it.
In chapter ”Assessment and Certification”:
It says, “The recommended way of verifying compliance of a software supply chain with SCVS is by performing an "open book" review, meaning that the auditors are granted access to key resources such as legal, procurement, build engineers, developers, repositories, documentation, and build environments with source code.” This text - specifically the term “auditors” - seems to presume that the specification will only be used for third-party audits. Yet I expect a major use of this document (if used) will be for organizations to determine, for themselves, if they meet the requirements. The frontispiece says that the document’s users include “architects, developers, security, legal,...” - but there isn’t any discussion on how THESE groups would use this document. The use of this by organizations on themselves needs to be discussed somewhere. I recommend that somewhere before this section there be a discussion on how organizations can use this themselves in various roles, and THEN discuss certification, to make it clear that uses other than certification are supported.
In chapter “Using SCVS”:
Nowhere does this document explain V1 through V6; we just get dumped into this list in those sections. What are you calling them (domains? families?)? There should be a hint of that early in the document. I suggest that the first section of “Using SCVS” be a subsection called “Control families” with text like this: “This specification identifies a set of 6 control families. Each family has an identifier (V1 through V6) and contains a number of specific controls (numbered V1.1, V1.2, and so on). These six families are inventory (V1), software bill of materials (V2), build environment (V3), package management (V4), component analysis (V5), and pedigree and provenance (V6).”
It’s claimed that it’s to “Develop a common taxonomy of activities, controls, and best-practices that can reduce risk in a software supply chain”. There’s no real taxonomy here, at least not in the meaning many would accept. State what you’re actually doing instead; don’t call it a taxonomy.
In chapter “V1 Inventory”:
Chapters V1 and V2 make a big deal out of distinguishing “inventory” from “software bill of materials” (SBOM), yet there seems to be no real difference, and it certainly isn’t clearly explained. The term “inventory” is not defined in the glossary. From the text it appears that an SBOM is simply the inventory list. If so, why use two different words for basically the same thing? I’m guessing a difference was intended but the document doesn’t clearly state it. This concern affects everything that uses the terms “inventory” and “SBOM”. As a result, I don’t know exactly what many of these requirements actually mean, making it hard to review & hard to apply.
Add the parenthesized text below (underlined in my original), as that’s another common way to identify something: “Component identification varies based on the ecosystem the component is part of. Therefore, for all inventory purposes, the use of identifiers such as Package URL or (name of common package manager repository + name of package within it) may be used to standardize and normalize naming conventions for managed dependencies.”
In the text, “Having organizational inventory of all first-party, third-party, and open source components…” - that doesn’t make sense; OSS components are also first-party or third-party. Perhaps rewrite to “all first-party and third-party components, whether they are proprietary or open source software, …”
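To make the identifier suggestion concrete, here is a minimal Python sketch that builds a Package-URL-style string from (repository type + package name). The function name make_purl is hypothetical, and the sketch is deliberately simplified: it follows the general pkg:type/namespace/name@version shape but ignores the per-ecosystem normalization rules, qualifiers, subpaths, and multi-segment namespaces that the real Package URL specification covers.

```python
from urllib.parse import quote


def make_purl(repo_type, name, version=None, namespace=None):
    """Build a simplified Package-URL-style identifier (illustrative only)."""
    # The type names the package ecosystem, e.g. "npm" or "maven".
    parts = ["pkg:" + repo_type.lower()]
    if namespace:
        # Percent-encode so that special characters cannot break the structure.
        parts.append(quote(namespace, safe=""))
    parts.append(quote(name, safe=""))
    purl = "/".join(parts)
    if version:
        purl += "@" + quote(version, safe="")
    return purl


print(make_purl("npm", "lodash", "4.17.21"))
print(make_purl("maven", "commons-lang3", "3.12.0", namespace="org.apache.commons"))
```

Either form gives every component a single normalized name that an inventory can use as a key, regardless of which ecosystem it came from.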
Requirement V1.1: “All components and their versions are known at completion of a build” - does this include transitive dependencies? Does it include the underlying operating system & database (which may be tracked by a different system)? How about compile/build tools like compilers and test frameworks? It’s so brief that I have no idea what this requirement means.
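The direct vs. transitive question matters because the two sets can differ a lot. The following Python sketch computes both from a dependency graph; the graph, the package names, and the function name transitive_dependencies are all made up for illustration.

```python
def transitive_dependencies(direct_deps, root):
    """Return every component reachable from root, directly or indirectly."""
    seen = set()
    stack = list(direct_deps.get(root, []))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue  # already visited; also protects against cycles
        seen.add(dep)
        stack.extend(direct_deps.get(dep, []))
    seen.discard(root)  # a dependency cycle must not list root as its own dependency
    return seen


# Hypothetical graph: each key lists only its direct dependencies.
graph = {
    "app": ["web-framework", "json-lib"],
    "web-framework": ["http-core", "json-lib"],
    "http-core": ["tls-lib"],
}

direct = set(graph["app"])
everything = transitive_dependencies(graph, "app")
transitive_only = everything - direct
print(sorted(direct))
print(sorted(transitive_only))
```

A requirement that says “all components” needs to state which of these two sets it means.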
V1.2: Says, “Package managers are used to manage all third-party binary components” - what is meant by a package manager? If I have a project-local shell script that re-downloads files from URLs listed in a file, does that count? Shouldn’t there be a separate requirement for third-party source code (not binary) components at higher levels? E.g., JavaScript & Python? I strongly encourage people to use package managers, but it’s not clear that demanding them for all binary components is practical, especially for embedded systems like IoT systems.
V1.3 & V1.4: V1.3 says “Inventory”, V1.4 says “bill of materials”. As noted earlier, I see no serious difference, say “bill of materials” in both places or explain why they’re different somewhere. These need clear definitions.
V1.3: “An accurate inventory of all third-party components is available in a machine-readable format” - again, is this transitive? Most proprietary libraries will NOT allow their recipients to find out what’s transitively included. I think it’s a great goal, but I think a minority of current projects could manage this today; I have concerns about its practicality unless you’re only applying this to green field development.
V1.5: “Software bill-of-materials are required for new procurements” - does this mean GENERATING an SBOM is required, or that software for use must come with an SBOM? If it’s the latter, today that’s often impractical, especially for any proprietary software. Also, once again, this is the “inventory” section but it’s asking about SBOMs… which again leads me to believe there’s no real difference.
V1.10 “Point of origin is known for all components” - what does this mean? I record the https URL? I have the passport numbers for all the authors? I have no idea how to verify this, or really even what it means.
In chapter V2:
V2.1: “A structured, machine readable software bill-of-materials (SBOM) format is present” - this is yet another requirement that needs to be reworded to be clear. You mean that I have to find a random SBOM off the street & put it in my directory? Is this the SBOM for the software I’m developing? Are these SBOMs for the software I’m ingesting? How deep do they need to go - can it be just direct dependencies?
V2.3: “Each SBOM has a unique identifier” - why? I presume what’s meant is that there be a way in the SBOM to indicate exactly what version(s) of software it applies to. Otherwise, every time I regenerate an SBOM I would have to create a unique ID, which would make reproducible builds impossible (and that would be terrible).
V2.4: “SBOM has been signed by publisher, supplier, or certifying authority.” - Absolutely not. An SBOM should be signed this way when its corresponding software is RELEASED, but the requirement doesn’t clearly say that. The same problem happens with many of the rest of the requirements; there’s a failure to indicate WHEN things need to happen with the SBOM.
2.9 “SBOM contains a complete and accurate inventory of all components the SBOM describes” - proprietary vendors will not permit that; at best they’ll let you refer to direct dependencies, not indirect ones. This whole document needs to clarify direct vs. transitive dependencies: which are required, and when?
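On the V2.3 concern, one way an identifier could coexist with reproducible builds is to derive it from the SBOM’s content instead of generating a fresh random value each time. The Python sketch below illustrates that idea only; it is not something SCVS or any SBOM format prescribes, and the function name sbom_content_id and the name/version record layout are assumptions.

```python
import hashlib
import json


def sbom_content_id(components):
    """Derive an identifier from SBOM content, so regenerating the same SBOM
    yields the same identifier (hypothetical scheme, for illustration)."""
    # Sort entries and keys so that ordering differences do not change the ID.
    ordered = sorted(components, key=lambda c: (c["name"], c["version"]))
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first_run = [
    {"name": "json-lib", "version": "2.1.0"},
    {"name": "http-core", "version": "1.4.2"},
]
# Same content, listed in a different order, as a second generation run might.
second_run = list(reversed(first_run))
upgraded = [
    {"name": "json-lib", "version": "2.2.0"},
    {"name": "http-core", "version": "1.4.2"},
]

print(sbom_content_id(first_run) == sbom_content_id(second_run))
print(sbom_content_id(first_run) == sbom_content_id(upgraded))
```

With a scheme like this, the identifier changes exactly when the described software changes, which is presumably what the requirement is after.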
In chapter V3:
V3.1: “Application uses a repeatable build” - I presume this simply means that if you repeat a build with unchanged inputs, you get the same (bit-for-bit) result, and that makes sense for level 1. That needs to be clarified, because that’s not the same as a reproducible build. However, at higher levels (say level 3) I would expect an additional criterion that specifically requires reproducible builds (not just repeatable builds): “Application uses an independently-verified reproducible build” - that is, someone else can take the inputs and produce the same (bit-for-bit) result.
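The bit-for-bit criterion can be checked mechanically by comparing cryptographic hashes of the outputs. The Python sketch below uses a toy stand-in for a build (toy_build is hypothetical, not a real build system) to show how something as small as an embedded build time breaks the property. The same comparison serves both cases: for a repeatable build the second output comes from the same party, and for an independently verified reproducible build it comes from someone else.

```python
import hashlib


def toy_build(source, build_time=None):
    """Stand-in for a build step: optionally embeds a build time in the output."""
    stamp = b"" if build_time is None else ("built:%d\n" % build_time).encode("ascii")
    return stamp + source


def digest(artifact):
    return hashlib.sha256(artifact).hexdigest()


def is_bit_for_bit_identical(artifact_a, artifact_b):
    # Equal SHA-256 digests are treated as evidence of identical bytes.
    return digest(artifact_a) == digest(artifact_b)


source = b"print('hello')\n"

# Same inputs, nothing time-dependent embedded: outputs match.
print(is_bit_for_bit_identical(toy_build(source), toy_build(source)))

# Same inputs, but each run embeds a different build time: outputs differ.
print(is_bit_for_bit_identical(toy_build(source, build_time=1000),
                               toy_build(source, build_time=2000)))
```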
3.1 - does this include ensuring that machine learning results are repeatable given the same data sets? Often training results are considered “data” - yet that data affects execution. It probably should include it.
3.2 - “Documentation exists on how the application is built and instructions for repeating the build”. I understand you want to make automation optional, but this goes too far. If you have to follow instructions, instead of initiating a build command, you are basically guaranteeing disaster over time. People almost never read instructions, at best they copy & paste a command. Change this to something like: “Documentation exists on how the application is built and there are instructions on how to re-execute the automated system for (re)building the system”.
3.4 “Application build pipeline prohibits alteration of build outside of the job performing the build” - this is very unclear. What is meant by this? This needs clarification. Same for most of the rest of this section.
3.10 Application build pipeline enforces authentication and defaults to deny & 3.11 Application build pipeline enforces authorization and defaults to deny. - What is meant here? By what? Most of the requirements in 3 presume that the application build pipeline survives a build. Yet in many cases they’re in containers or temporary VMs, so these questions make no sense.
3.15 “Application build pipeline has required maintenance cadence where the entire stack is updated, patched, and re-certified for use” - what is a “required maintenance cadence”? Why do I want one? How would I know when I have it? This is way too vague.
3.17 “All build-time manipulations to source or binaries are known and well defined”. Clarify that this should include compiler optimizations from past executions (e.g., branch-probabilities and JIT warm-ups).
3.18 “Checksums of all first-party and third-party components are documented for every build” - this should be recorded - not documented (nobody looks at the documentation) & automatically checked later. If this checking is not automated it will generally not happen. It says “checksums”; I think that should be “cryptographic hashes” at least, since a “checksum” need not be a cryptographic hash.
3.19 - again, “checksum” should be “cryptographic hash”.
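As a sketch of “recorded and automatically checked”, the following Python records a SHA-256 hash per component at build time and later reports anything that no longer matches. The function names and the in-memory layout (component name mapped to bytes) are assumptions for illustration; a real pipeline would read files and store the record somewhere tamper-resistant. SHA-256 is used because, unlike a simple checksum such as a CRC, it is designed to resist deliberate collisions.

```python
import hashlib


def record_hashes(components):
    """Map each component name to the SHA-256 hex digest of its bytes."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in components.items()}


def verify_hashes(components, recorded):
    """Return the sorted names of components that are missing, changed, or unrecorded."""
    problems = set()
    for name, expected in recorded.items():
        data = components.get(name)
        if data is None or hashlib.sha256(data).hexdigest() != expected:
            problems.add(name)
    # Anything present now but absent from the record is also suspicious.
    problems.update(name for name in components if name not in recorded)
    return sorted(problems)


at_build_time = {"json-lib-2.1.0.tar": b"original bytes", "app.bin": b"app bytes"}
recorded = record_hashes(at_build_time)

later = {"json-lib-2.1.0.tar": b"tampered bytes", "app.bin": b"app bytes", "extra.so": b"?"}
print(verify_hashes(at_build_time, recorded))
print(verify_hashes(later, recorded))
```

Because the check is a function call and not a document, it can run on every build without anyone having to remember to do it.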
3.21 - “Unused direct and transitive components have been removed from the application”. Agree for direct components, but for transitive components this needs more nuance. Often it’s hard to tell if something is “unused.” In addition, removing an indirect component is risky, because doing so can cause other failures.
In chapter V4:
The term “package manager” is unclear. The glossary definition (“Package manager - A distribution mechanism that makes software artifacts discoverable by requesters.”) doesn’t really help. As defined, the world wide web and Google are a package manager.
4.1 “Binary components are retrieved from a package repository” - I applaud the goal, but this is probably unrealistic unless “World Wide Web” counts as a package repository. If a proprietary vendor sells a binary component, they often won’t use traditional package manager interfaces; what is expected here? Not all OSS is in a package manager’s repo. Are you expecting developers to stop using these components? That seems impractical today; what are the options when that is not possible? What is the actual problem this is trying to solve, so that we can identify appropriate workarounds?
V4.2: “Package repository contents are congruent to an authoritative point of origin for open source components” - what does “congruent” mean? Reword or define; this is unclear. I have no idea if this is good or not.
V4.3: “Package repository requires strong authentication” - for what? To read? Probably not, most such package repositories allow anyone to read. I presume what’s meant is to modify (create, update, delete), but that’s not stated. Also: Many package repos won’t require this, and OWASP can’t make them. Instead, you could require that the packages YOU DOWNLOAD have strong authentication for modification; that’s far more practical.
4.11 - “Package repository provides auditability when components are updated” - it sounds good, but what does “auditability” mean in this context?
4.13/4.14 - what can projects do if their package manager doesn’t support these capabilities? I’m guessing that they could augment their package manager with these functions and call the combination their “package manager” - yes? I don’t think all package managers support these functions, so you need to discuss how to handle these cases.
4.16 “Package manager validates TLS certificate chain to repository and fails securely when validation fails” - many organizations use HTTPS (TLS) intercepting proxies (and re-sign with their own certs). I am not a fan of this approach, but it’s widespread, and if you forbid it this is a no-go. This requirement needs to be rewritten or clarified so that it can work in such settings, or acknowledge that a large number of organizations will not be able to use this specification. E.g., change “to repository” into “to repository or that organization’s authorized proxy”.
V4.18 “Package manager does not execute code” - I presume what’s meant is that it doesn’t execute code merely when it downloads the code (it should say that instead). I applaud the sentiment, but I think that’s probably a non-starter today. I believe many package managers still can execute code when packages are downloaded. At the language repo level, at least JavaScript/NPM, Python, and Ruby support execution (Source: “Typosquatting programming language package managers” by Nikolai Tschacher, 2016, https://incolumitas.com/2016/06/08/typosquatting-package-managers/ ). The RPM package system (used by Red Hat Enterprise Linux, CentOS, Fedora, and SuSE) includes %pre and %post sections that execute scripts. OWASP won’t be able to change whether or not packages execute programs on installation for a long time, if ever, because of backwards compatibility issues. People will just ignore impractical advice like “don’t use the normal package manager for your situation” - and ignore OWASP if it tries to enforce this. Besides, other requirements in this document require a package manager. OWASP could recommend a flag or something, or more pragmatically, require builds in a safe sandbox (such as a container or VM) that restricts the potential impact of executing code.
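For the 4.16 wording change (“to repository or that organization’s authorized proxy”), here is a minimal Python sketch of what a client-side configuration could look like: certificate validation stays on and fails closed, while the organization’s own CA can be trusted in addition to the system defaults. The function name and the org_proxy_ca_file parameter are hypothetical; this uses only the standard ssl module and is not how any particular package manager is implemented.

```python
import ssl


def make_repository_tls_context(org_proxy_ca_file=None):
    """Create a TLS client context that always validates the certificate chain."""
    # create_default_context() turns on chain verification and hostname checks,
    # so a failed validation raises an error instead of silently continuing.
    context = ssl.create_default_context()
    if org_proxy_ca_file is not None:
        # Additionally trust the CA that the organization's intercepting proxy
        # uses to re-sign traffic (path supplied by the caller).
        context.load_verify_locations(cafile=org_proxy_ca_file)
    return context


context = make_repository_tls_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

The point of the sketch is that supporting an authorized proxy means adding a trust anchor, not turning validation off.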
V4: Should add “Anti-typosquatting measures are established when using public package repos by the project or repo” for level 3. Typosquatting is a big problem. Not all public package repos counter typosquatting; projects should have measures in place where the public repo does not.
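One simple project-side measure is to flag requested package names that are suspiciously close to, but not the same as, well-known names. The Python sketch below does this with the standard difflib module. The list of well-known names, the 0.8 similarity cutoff, and the function name are all assumptions chosen for illustration; a real check would need a curated list and tuning to limit false positives.

```python
import difflib


def possible_typosquats(requested, well_known, cutoff=0.8):
    """Map each suspicious requested name to the well-known name it resembles."""
    known = set(well_known)
    suspicious = {}
    for name in requested:
        if name in known:
            continue  # exact match with a well-known package is fine
        close = difflib.get_close_matches(name, well_known, n=1, cutoff=cutoff)
        if close:
            suspicious[name] = close[0]
    return suspicious


well_known = ["requests", "numpy", "django"]
requested = ["requests", "reqeusts", "djangoo", "leftpad"]
print(possible_typosquats(requested, well_known))
```

A flagged name is only a prompt for a human to look; legitimate packages sometimes have similar names.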
Chapter V5:
V5.1 “Component can be analyzed with linters and/or static analysis tools” - that’s pointless, anything CAN be. Do you mean that the source code is available, so that source code analysis can be used? Do you mean that the source code is not obfuscated?
5.2 “Component is analyzed using linters and/or static analysis tools prior to use” and 5.3 “Linting and/or static analysis is performed with every upgrade of a component” - why bother? I can run some tools & throw away the results. Unless you require someone to do something (e.g., analyze those results to determine if there’s an unusual/unacceptable level of risk), this is a complete waste of time.
5.4: “An automated process of identifying all publicly disclosed vulnerabilities in third-party and open source components is used” - this should just be “components” - first-party gets no free ride. This also seems to be a one-time thing; continuous monitoring is important too, as reports can happen later.
I don’t see any requirement to do anything when a publicly disclosed vulnerability is found; without that, why bother? There should be a triaging so that less-important and unexploitable vulnerabilities are deferred (ideally you fix everything instantly, but that’s impractical & rushing everything may increase the likelihood of making even worse mistakes). However, important exploitable vulnerabilities caused by a component DO need to be addressed rapidly.
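The triage step could be as simple as the Python sketch below, which separates findings that are both exploitable and severe from those that can be deferred. The record fields (id, severity, exploitable), the 0-10 severity scale, and the 7.0 threshold are hypothetical; each organization would set its own criteria.

```python
def triage(findings, severity_threshold=7.0):
    """Split findings into (urgent, deferred) lists."""
    urgent, deferred = [], []
    for finding in findings:
        if finding["exploitable"] and finding["severity"] >= severity_threshold:
            urgent.append(finding)
        else:
            deferred.append(finding)
    # Most severe first, so the worst problems are addressed first.
    urgent.sort(key=lambda f: f["severity"], reverse=True)
    return urgent, deferred


# Made-up findings for illustration.
findings = [
    {"id": "VULN-A", "severity": 9.8, "exploitable": True},
    {"id": "VULN-B", "severity": 9.1, "exploitable": False},
    {"id": "VULN-C", "severity": 4.3, "exploitable": True},
    {"id": "VULN-D", "severity": 7.5, "exploitable": True},
]

urgent, deferred = triage(findings)
print([f["id"] for f in urgent])
print([f["id"] for f in deferred])
```

Deferred does not mean ignored; the deferred list still needs an owner and a review date.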
This section doesn’t begin to seriously require security analysis. It omits fuzzing, assurance cases, red teams, secure requirements / design / implementation, training of developers, and so on. That’s fine if the real focus is supply chain analysis, as noted earlier.
Chapter V6:
The terms “pedigree” and “provenance” are not defined well enough for people to agree on what counts and what doesn’t count.
6.1 “Provenance of modified components is known and documented” - how far? E.g., I download a JavaScript package named X via NPM. Is that enough? After all, “it came from NPM”. If not, what IS enough?
6.2 - “Pedigree of component modification is documented and verifiable”. If I record “Fred modified it” & sign it, is that enough? It appears that the answer is “yes”; if that wasn’t intended, then what was intended needs to be defined. The glossary definition of Pedigree is “Data which describes the lineage and/or process for which software has been created or altered.” - and that’s too vague to be useful.
Chapter “Guidance: Open Source Policy”
Again, this isn’t a taxonomy.
Since this isn’t a requirement, should this be in the document at all? This should probably be in a separate document.
“All organizations that use open source software should have an open source policy” - this is a joke, right? Who uses software & doesn’t use OSS? Even proprietary software practically always has OSS embedded within it.
A policy that tries to declare “How many major or minor revisions old are acceptable” is probably impractical. There are too many variations in applications and components for that to make much sense.
Chapter Appendix A: Glossary
These definitions are too high-level. E.g., “Package manager” is “A distribution mechanism that makes software artifacts discoverable by requesters.” - so I guess the world wide web + Google is a package manager.
Chapter Appendix B: References: I’m surprised that NIST SP 800-161 (“Supply Chain Risk Management Practices for Federal Information Systems and Organizations”) wasn’t referenced. Have you examined it?
I didn’t see the “Open Trusted Technology Provider (O-TTPS)” standard referenced, “Open Trusted Technology Provider™ Standard – Mitigating Maliciously Tainted and Counterfeit Products (O-TTPS)”. It was developed by the Open Group, and is now ISO/IEC 20243:2015. That focuses more on “best practices for global supply chain security and the integrity of commercial off-the-shelf (COTS) information and communication technology (ICT) products” - but it should probably be mentioned & reviewed.
I suspect this should refer to OpenChain, and again, review it. https://wiki.linuxfoundation.org/_media/openchain/openchainspec-2.0.pdf
Consider adding the CII Best Practices badge in the references, and reviewing it. See: https://github.com/coreinfrastructure/best-practices-badge/blob/master/doc/criteria.md and https://github.com/coreinfrastructure/best-practices-badge/blob/master/doc/other.md
These comments are my own, but I hope they help. Good luck!