bytecodealliance / governance

Hosted Project Proposal: rust-wasi-sample #113

Open yoshuawuyts opened 5 days ago

yoshuawuyts commented 5 days ago

Proposing the adoption of rust-wasi-sample as a Bytecode Alliance hosted project.

Repository URL: https://github.com/yoshuawuyts/wasi-rust-sample

This is a simple "hello world" HTTP component written in Rust. It is available as a GitHub template, supports GitHub Codespaces, and automatically builds and pushes new OCI images on release. Rust shouldn't be the only language we build samples for, but it is a good first language to start with. The sample isn't complete yet either, but by moving it into the BA we can expand the contributor base and elevate its visibility.

Requirements

Alignment with the Bytecode Alliance Mission

Projects must have alignment with the Bytecode Alliance mission:

Our mission is to provide state-of-the-art foundations to develop runtime environments and language toolchains where security, efficiency, and modularity can all coexist across a wide range of devices and architectures. We enable innovation in compilers, runtimes, and tooling, focusing on fine-grained sandboxing, capabilities-based security, modularity, and standards such as WebAssembly and WASI.

The Bytecode Alliance is a group with a specific mission, and we therefore will only sponsor projects that are in alignment with and further that mission. For example, project sponsorship is untenable if the project undermines sandboxing, security, or standardization efforts.

This sample supports these goals.

Code Review

Description

All projects must gate merging pull requests on code reviews that audit not only for style but also substance, such as whether security invariants are properly maintained by the new code.

It is recommended, but not required, that hosted projects maintain a CODEOWNERS file and automatically assign reviewers as well.

Code reviews have a demonstrable impact on the quality of source code by catching bugs early, determining the best possible implementation, and fostering trust within the community. Timely responses let contributors know that their work is valued and encourages further contribution.

We include a CODEOWNERS file and will respond to issues, PRs, and other user input in line with the requirements stated here.

Code of Conduct

All Bytecode Alliance projects must:

  • link to the Bytecode Alliance's Code of Conduct documents from a CODE_OF_CONDUCT.md file in the root of the repository, and
  • enforce the codes of conduct among the community and contributors, or escalate to the Bytecode Alliance CoC Team, if needed.

Having a code of conduct is crucial for creating a positive and respectful environment in any organization, community, or group. It serves as a set of guidelines that outline expected behavior and ethical standards for all members involved.

We have adopted the BA CoC.

Continuous Integration Testing

All projects must run continuous integration (CI) tests on all pull requests and merges. Key project features must be covered by CI.

If any part of the CI that gates merging changes is not reproducible by external contributors, then the project must make affordances to support those external contributors.

Implementing CI offers several benefits to software projects, helping ensure correctness and quality, making it an essential practice for modern software development.

We have CI set up, though we intend to improve it further. Part of the reason why we're upstreaming this sample is to collaborate and improve the standard Component flows - and that includes testing too. The fact that this takes some effort to do correctly is exactly what we're hoping to improve.

Contributor Documentation

All projects must have a CONTRIBUTING.md document in the root of their repository. This document must provide, or link to another form of project-specific documentation that provides, high-quality contributor documentation.

See "How to build a CONTRIBUTING.md" by the Mozilla Science Lab for more details on what a high-quality CONTRIBUTING.md file looks like.

A CONTRIBUTING.md serves as a guide for potential contributors, outlining the expectations for individuals who wish to contribute to the project. The Bytecode Alliance is a community-driven software foundation and documents like CONTRIBUTING.md are necessary for fostering community contributions.

We include a CONTRIBUTING.md.

Following the Bytecode Alliance Operational Principles

All projects must follow the Bytecode Alliance Operational Principles.

In pursuing our mission and vision, the Bytecode Alliance follows a set of operational principles aimed at keeping us aligned on three key aspects: what we want to create, how we want to work together, and how we want to work with others.

We follow the operational principles. Our intent with this sample is to establish a first language-specific sample that can be replicated by other languages / interfaces too.

Licensing Compatible with the Bytecode Alliance

All projects must be licensed under the Apache 2.0 license with an LLVM exception. Exemptions may be granted by the board.

All projects must only use dependencies and third-party code licensed under one of the following open source licenses:

  • Apache-2.0 WITH LLVM-exception
  • Apache-2.0
  • BSD-2-Clause
  • BSD-3-Clause
  • ISC
  • MIT
  • MPL-2.0
  • OpenSSL
  • Unicode-DFS-2016
  • Zlib

All dependencies and third-party code must be properly attributed.

The source for all projects must be available to all members and must be available to all non-members under the same license.

All projects must automatically ensure that licensing requirements of dependencies are met in CI.
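As a rough illustration of what such an automated check does, here is a minimal sketch that compares dependency licenses against the allowlist above. The helper names and the `(name, license)` input shape are hypothetical and not part of the sample; the check only does exact matching and does not parse compound SPDX expressions such as "MIT OR Apache-2.0". A real project would more likely run an existing license-checking tool in CI than hand-roll this.

```rust
/// SPDX identifiers copied from the allowlist in the requirement above.
const ALLOWED_LICENSES: [&str; 10] = [
    "Apache-2.0 WITH LLVM-exception",
    "Apache-2.0",
    "BSD-2-Clause",
    "BSD-3-Clause",
    "ISC",
    "MIT",
    "MPL-2.0",
    "OpenSSL",
    "Unicode-DFS-2016",
    "Zlib",
];

/// Returns true if `license` exactly matches one allowlisted identifier.
/// Hypothetical helper: compound license expressions are not parsed.
pub fn is_allowed_license(license: &str) -> bool {
    ALLOWED_LICENSES.contains(&license.trim())
}

/// Returns the names of dependencies whose license is not on the allowlist.
/// Each entry in `deps` is a (dependency name, license identifier) pair.
pub fn disallowed_dependencies<'a>(deps: &[(&'a str, &'a str)]) -> Vec<&'a str> {
    deps.iter()
        .filter(|(_, license)| !is_allowed_license(license))
        .map(|(name, _)| *name)
        .collect()
}
```

A CI step built on this idea would fail the build whenever `disallowed_dependencies` returns a non-empty list.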

We strive to build an open community and a legally-compatible software ecosystem.

We've adopted the Apache-2.0 license with LLVM exception.

README

All hosted projects must have a README.md file in the root of the repository which begins with:

  • The project name and logo (if one exists)
  • A one-sentence description of the project
  • <strong>A <a href="https://bytecodealliance.org/">Bytecode Alliance</a> hosted project</strong>

The most important information about the project should be "above the fold". Projects should identify themselves as Bytecode Alliance projects so that, with time, people associate the Bytecode Alliance with quality projects that they can rely on.
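The "above the fold" requirement could be checked mechanically. The sketch below is only an illustration: the function name and the `max_lines` cutoff are assumptions, and the governance text does not define how many lines count as the beginning of a README.

```rust
/// The hosted-project line quoted in the requirement above.
const HOSTED_BANNER: &str =
    r#"<strong>A <a href="https://bytecodealliance.org/">Bytecode Alliance</a> hosted project</strong>"#;

/// Hypothetical check: does the banner appear within the first `max_lines`
/// lines of the README text?
pub fn readme_has_banner(readme: &str, max_lines: usize) -> bool {
    readme
        .lines()
        .take(max_lines)
        .any(|line| line.contains(HOSTED_BANNER))
}
```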

We meet these requirements.

Release Process

Documentation of a release process that any project maintainer may execute to create a new release version of the software.

Multiple people must have permissions to publish releases. A GitHub team must have access to publish packages and package ownership on the associated package repository when possible. For example, a Rust project may have multiple owners on crates.io.

Projects and their releases shouldn't be tied to any single user's machine or keys to ensure continuity of the project. A project isn't an open, community project if only one person can publish releases.

Automation makes fewer mistakes than humans, and getting releases right is critical, since only releases are typically used downstream, not random commits from main.

Our sample includes automated releases - uploading components to registries is an important part of the core flow.

Security Process

All projects must have a documented security process for reporting and disclosing vulnerabilities, managing patches that fix vulnerabilities, and announcing and making available security releases. Furthermore, projects must actually follow their documented processes.

It is recommended that projects request Common Vulnerabilities and Exposures (CVE) numbers for discovered vulnerabilities and report the CVE when disclosing the vulnerability.

A tool like dependabot may suffice for hosted projects. Dependabot should be used for security updates only, and should not apply all updates indiscriminately. Updating dependencies should otherwise be done with intention (never automatically). Automatic creation of pull requests is acceptable, but manual review is required to prevent supply chain attacks.

Bytecode Alliance projects must be a secure foundation for others to build upon. Transparency and a managed security release process are key to being this foundation.

We have configured dependabot.

Semantic Versioning

All projects must follow either standard semantic versioning or their ecosystem's local dialect of semantic versioning (for example, Rust and cargo's interpretation of semantic versioning slightly differs from the standard, but is acceptable for Rust Bytecode Alliance projects).
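One place where Cargo's dialect differs from the standard is pre-1.0 versions: Cargo treats two versions as compatible when their left-most non-zero component matches, so 0.1.2 and 0.1.5 are compatible while 0.1.2 and 0.2.0 are not. The sketch below illustrates that rule only; the function names are hypothetical, and pre-release tags and build metadata are deliberately not handled.

```rust
/// Parses "MAJOR.MINOR.PATCH" into numeric parts.
/// Simplified: pre-release and build metadata are rejected rather than parsed.
pub fn parse_version(s: &str) -> Option<(u64, u64, u64)> {
    let mut parts = s.trim().split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Cargo-style compatibility: two versions are compatible when their
/// left-most non-zero component is the same.
pub fn is_cargo_compatible(a: (u64, u64, u64), b: (u64, u64, u64)) -> bool {
    match a {
        // 0.0.z: every patch release is treated as breaking.
        (0, 0, _) => a == b,
        // 0.y.z with y > 0: the minor version acts as the breaking boundary.
        (0, minor, _) => b.0 == 0 && b.1 == minor,
        // x.y.z with x > 0: only the major version must match.
        (major, _, _) => b.0 == major,
    }
}
```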

A clear versioning scheme is necessary for end-users. We desire consistency across projects and so the Bytecode Alliance has adopted semantic versioning as a required best practice.

We follow semantic versioning.

Secrets Management

GitHub organization and repository level secrets should be used. Secrets must not be hard coded in source.

For secrets like passwords for the project's associated social media account, these should be stored in the password service paid for by the Bytecode Alliance. Contact the TSC for access and ability to manage a given secret.

Secure secret management is a requirement for a secure project. Additionally, projects and their associated accounts shouldn't be tied to any single user's machine or keys to ensure continuity of the project. A project isn't an open, community project if only one person can access its accounts.

We do not manage any secrets nor do we plan to.

Supply Chain Security

All projects must follow a well-documented process for updating dependencies and auditing them for malicious supply-chain attacks.

When applicable, projects should:

  • Integrate auditing tools in CI (such as cargo vet)
  • Use code review and static analysis tools on dependencies

Finally, projects must document and follow their process for responding to upstream vulnerabilities in dependencies.

Our mission of developing runtime environments and language toolchains where security, efficiency, and modularity can all coexist necessarily means that we have performed our due diligence to mitigate software supply chain attacks.

We intend to show how to configure, store, and manage a software bill of materials (SBOM) - but that's not yet part of the MVP of the sample. We do intend to add this, with the purpose of educating Component authors on how to manage SBOMs themselves.

Sustainable Contributor Base

All projects must have regular contributions from multiple contributors.

It is recommended that hosted projects additionally have contributors affiliated with at least two different Bytecode Alliance organizations and that the project's leadership has representation from at least two different Bytecode Alliance organizations.

There must not be any private information necessary to fully contribute to the project.

A project is not considered healthy with only one contributor. An open, community project requires input from multiple stakeholders and does not rely on a single person.

The TSC may waive the above contributor base requirements under certain conditions. In particular, the TSC may decide to adopt crucial upstream dependencies of existing Bytecode Alliance projects that are otherwise effectively unmaintained or only have a single maintainer.

This sample has multiple contributors - albeit all from a single company (Microsoft). We expect more people will contribute to this once it's upstreamed, as for example it shows some of the limitations of cargo-component. If we do it right, we should be able to fix those issues and simultaneously update the sample. Though this sample exists in a separate repo, it is heavily tied to the other projects in the BA.

Version Control

All projects must be hosted on the Bytecode Alliance Organization on GitHub.

Access controls are managed via the Bytecode Alliance organization on GitHub. This allows for continuity of the project when hosted in one place. Finally, this is the only way to reasonably manage the projects within the organization.

Once this project is accepted, we will move the project over.

Recommendations

Changelog

It is recommended that hosted projects highlight key additions, breaking changes, security fixes, and otherwise noteworthy changes in a changelog.

See keepachangelog.com for a recommended approach.

We are building an ecosystem that developers can depend on, and one small part of that is communicating important changes downstream.

We currently don't keep a changelog - though that's something that would be neat to automate as part of the release process. We agree that keeping changelogs is a good thing, and at a minimum want to make it easy for users of the sample to keep one - even if it's just a list of pull requests.

Continuous Fuzzing

Not all projects will necessarily benefit from fuzzing, for example benchmark suites. The TSC may choose to lift this requirement for a particular project.

It is recommended that hosted projects have 24/7, round the clock, continuous fuzzing. The fuzzing should exercise significant amounts of the code base and test the project's most important properties, such as sandboxing. Bugs and vulnerabilities discovered via fuzzing should be addressed promptly.

As part of our open-source and open contribution model, the corpus and setup for running fuzzing should be open-sourced as part of the project.

Faults discovered via fuzzing must be reported privately to the project's core team so that the project's security vulnerability process can be followed properly, if necessary. For example, fuzzing infrastructure must not automatically open public issues for any fault that is discovered.

Continuous fuzzing is a valuable practice for projects, due to its significant benefits in improving security and reliability. Within the Bytecode Alliance, we host projects that provide a sandbox. The fidelity of these sandboxes must be battle-tested via a number of methodologies including automated fuzzing.

This is a sample showing how to use other tools. We wouldn't benefit much from directly fuzzing 24/7; instead, it seems better to transitively rely on the projects we are showcasing being thoroughly tested and fuzzed.

End-User Documentation

We abide by the OpenSSF requirements for documentation:

The documentation of an external interface explains to an end-user or developer how to use it. This would include its application program interface (API) if the software has one. If it is a library, document the major classes/types and methods/functions that can be called. If it is a web application, define its URL interface (often its REST interface). If it is a command-line interface, document the parameters and options it supports. In many cases it's best if most of this documentation is automatically generated, so that this documentation stays synchronized with the software as it changes, but this isn't required. The project MAY use hypertext links to non-project material as documentation. Documentation MAY be automatically generated (where practical this is often the best way to do so).

Furthermore, we identify a few different types of (sometimes overlapping) documentation:

  • API documentation: Documentation for each type, method, function, and module in a library.
  • Architectural overviews: High-level documentation about the architecture of the project and how it works from a 1000-foot view that helps end users take advantage of the project in the best way possible and helps onboard new contributors.
  • Examples: Code examples that show off how to use the project as a whole or particular features it supports.
  • Guides and tutorials: Long-form prose, with code samples interspersed, that shows how to accomplish a task using the project.

API and CLI flag documentation is required for hosted projects; all other types are recommended.

Documentation is necessary for end-users to productively use the project; source code comments are not sufficient.

This project itself is documentation in the form of both code and prose.

Issue Triage Process

Hosted projects must use an issue tracker for tracking individual issues.

It is recommended that hosted projects should additionally have a documented process for expeditiously triaging incoming issues and pull requests, and follow that process. Contributors should get prompt responses to their issues and pull requests, even if a response is not an immediate fix or review.

For a successful community-driven project, expedient communication within issues and PRs encourages further collaboration and contribution.

We have an issue tracker.

Leverage the Bytecode Alliance RFC Process

A request for comments (RFC) is a technique for soliciting the community and contributors for feedback on proposed major changes and decisions.

It is recommended that hosted projects follow the Bytecode Alliance RFC process for changes that significantly affect project stakeholders or contributors. The RFCs repo describes when an RFC is needed in more detail:

Many changes to Bytecode Alliance projects can and should happen through every-day GitHub processes: issues and pull requests. An RFC is warranted when:

  • The work involves changes that will significantly affect stakeholders or project contributors. Each project may provide more specific guidance. Examples include:
    • Major architectural changes
    • Major new features
    • Simple changes that have significant downstream impact
    • Changes that could affect guarantees or level of support, e.g. removing support for a target platform
    • Changes that could affect mission alignment, e.g. by changing properties of the security model
  • The work is substantial and you want to get early feedback on your approach.

This is a best practice for aligning contributors, the community, and downstream projects' needs with proposed technical implementations.

TODO: discussion of this recommendation and any supporting evidence (such as links to code, documentation, issues, and pull requests)

Production Use

It is recommended that hosted projects have demonstrated use in production by at least three independent organizations which are, in the TSC's judgement, of adequate quality and scope.

It is recommended that projects track production usage by organizations in an ADOPTERS.md at the root of the project, for example see ADOPTERS.md in Wasmtime.

Projects should demonstrate that they are practical, useful, and reliable enough to use in production.

The purpose of this sample is to increase the production use of existing projects.

Public Project Meetings and Notes

It is recommended that hosted projects hold regular and public project meetings. Meeting times and frequency should be advertised publicly, for example in the project's CONTRIBUTING.md. To avoid spam and "Zoom bombing", the video conferencing link need not be public, but should be available upon request.

Agendas for upcoming meetings and notes from past meetings should be published publicly. The notes should be in the bytecodealliance/meetings repository.

Public meetings encourage open communication, collaboration, and engagement within the project's community. Notes allow community members who were not present to remain aligned and can document any decisions made during the meeting.

Samples probably don't need their own individual project groups, as they intend to reflect the usage and best practices of other, existing tools. Those tools have their own meetings and logs, and we expect most of the conversations and decisions to be made in those groups - and only once completed will those changes be represented in the sample.

Sanitizers and Code Analysis

It is recommended that hosted projects use static and dynamic code analysis tools (such as valgrind or miri) where applicable.

It is recommended that hosted projects with non-trivial amounts of unsafe code (e.g. unsafe in Rust or any C/C++) run tests and fuzzers with the relevant sanitizers: Address Sanitizer, Memory Sanitizer, Thread Sanitizer, etc.

Automated code analysis is key to meeting our mission of developing runtime environments and language toolchains where security, efficiency, and modularity can all coexist.

We do not use unsafe code, but we do apply various other static analysis tools such as rustfmt and clippy.

pchickey commented 5 days ago

Since this is wasi-http specific could we rename it to rust-wasi-http-sample?

yoshuawuyts commented 5 days ago

Including the name of the world makes sense to me. I mainly want to make sure we also set ourselves up to host samples for Go, C#, JS, etc. If we generalize that to a scheme, it sounds like that might become something like: {language}-{world}-sample.
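As a rough illustration of that scheme, the helper below builds a repository name from a language and a world. It is a hypothetical sketch, not part of the proposal; lowercasing the inputs is an assumption made here for consistency.

```rust
/// Builds a repository name following the `{language}-{world}-sample` scheme.
/// Hypothetical helper; lowercasing is an assumption, not part of the proposal.
pub fn sample_repo_name(language: &str, world: &str) -> String {
    format!(
        "{}-{}-sample",
        language.to_lowercase(),
        world.to_lowercase()
    )
}
```

With the inputs `"rust"` and `"wasi-http"`, this yields the `rust-wasi-http-sample` name suggested above.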

pchickey commented 5 days ago

Do we need a separate project / repo for each of those? Or can we maintain a single samples repo?

yoshuawuyts commented 5 days ago

This sample is set up as a GitHub Template: that makes it a single click to start modifying the sample to build your own. I think that's incredibly valuable, and we can't do that well if we put all samples in a monorepo. I think it also makes the sample feel more targeted/realistic if it's language-specific.

pchickey commented 5 days ago

Ah, ok, I didn't understand that Template was limited in that way, which is sorta a bummer but I guess it makes sense.

yoshuawuyts commented 5 days ago

By the way, I'm not sure if this was clear from the description, but I hope that on the hosting side individual host projects will end up creating their own samples to run these applications. E.g. I think it'd be great if there were dedicated samples for {spin, wasmcloud} on {AWS, Azure, Gcloud} and so on. If we can link to these, the getting started flow can just become:

  1. pick your language
  2. pick your hosting platform
  3. clone both templates
  4. you're now off to the races :)

With colleagues at Azure we're currently also working on an initial sample for running Wasm HTTP Components on AKS, which should provide an initial end-to-end flow people can use.

tschneidereit commented 4 days ago

Thank you for the very kind offer to contribute this project to the BA! ❤️ I (for now personally, not speaking on behalf of the TSC) think this would be a great addition.

Organization-wise, I agree with Pat that perhaps this doesn't need to be its own project. The key bit is that BA projects don't have a 1:1 mapping to repositories: a single project can span multiple repos, and a single repo can contain multiple projects.

Given this, would you perhaps be up for generalizing the proposed project definition here a bit to make it describe an umbrella project? A potential idea could also be to bring this up with SIG-Documentation to see if there's interest in helping maintain sample projects.