Set up crate publishing

greenhat commented 4 months ago

Goal

Set up publishing of all relevant crates to crates.io in an automated, or at least semi-automated, manner. All dependent crates should be published as well. Changelogs should be generated automatically from the commit messages.

If the publishing process has to be run manually, it should have minimal friction: ideally, running one script that publishes all crates, or at least following detailed instructions. If publishing can run on CI, figure out which event should trigger it; the GitHub release creation event is far too broad.

Implementation

I had a superb experience with cargo-release, but I have not used its automated changelog generation feature.

### Tasks
- [ ] https://github.com/0xPolygonMiden/compiler/issues/215
- [ ] https://github.com/0xPolygonMiden/compiler/issues/216

greenhat commented 2 months ago

@bitwalker I was looking for a tool that will allow us to implement the following workflow:

1. Trigger the new release initiation process;
2. Determine the crates changed since the last release and bump their versions;
3. Automatically generate the changelog from commit messages;
4. If needed, manually edit the changelog, crate versions, etc.;
5. Publish the new crate versions to crates.io.
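Steps 2 and 3 are the ones these tools automate from commit history. As a rough illustration of how a version bump can be derived from commit messages, here is a minimal Rust sketch. It assumes the Conventional Commits format (`feat:`, `fix:`, and `!` or a `BREAKING CHANGE:` footer for breaking changes) and, for 0.x versions, the Cargo-style rule that a breaking change bumps the minor number while everything else bumps the patch number. `Bump`, `required_bump` and `bump_version` are hypothetical names, not the API of cargo-release or any other tool.

```rust
/// Hypothetical version-bump level derived from commit messages.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Bump {
    Patch,
    Minor,
    Major,
}

/// Classifies one commit message using Conventional Commits conventions.
fn classify(message: &str) -> Bump {
    let header = message.lines().next().unwrap_or("");
    // "feat(hir)!: ..." -> "feat(hir)!"
    let prefix = header.split(':').next().unwrap_or("");
    if prefix.ends_with('!') || message.contains("BREAKING CHANGE:") {
        Bump::Major
    } else if prefix == "feat" || prefix.starts_with("feat(") {
        Bump::Minor
    } else {
        Bump::Patch
    }
}

/// Largest bump required by any commit since the last release,
/// or `None` if the crate has no new commits and needs no release.
pub fn required_bump(commits: &[&str]) -> Option<Bump> {
    commits.iter().map(|m| classify(m)).max()
}

/// Applies a bump to a (major, minor, patch) version.
pub fn bump_version((major, minor, patch): (u64, u64, u64), bump: Bump) -> (u64, u64, u64) {
    match (bump, major) {
        // Pre-1.0 (assumption): a breaking change bumps the minor number...
        (Bump::Major, 0) => (0, minor + 1, 0),
        // ...and anything else bumps the patch number
        (_, 0) => (0, minor, patch + 1),
        (Bump::Major, _) => (major + 1, 0, 0),
        (Bump::Minor, _) => (major, minor + 1, 0),
        (Bump::Patch, _) => (major, minor, patch + 1),
    }
}
```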

Since cargo-release does not take care of the 1st and 4th steps, I did some exploring and found the following tools:

release-please cons:

- The 3rd step is very opinionated, and a linear git history is more or less required, see
- The 4th step is driven by commands embedded in the commit messages, see
- The 5th step is opinionated, see

release-plz cons:

- Although it has plenty of features and is very well documented, its bus factor is 1 (one active contributor).

An example of the release workflow with release-plz can be found here. It implements the steps above as follows:

1. Trigger the release process by creating a special release branch (with release-plz- in the branch name), or enable automatic PR creation on every commit to the main branch;
2. The release PR contains the updated changelog and crate versions;
3. The changelog is generated from the commit messages via the configurable git-cliff;
4. Manual edits to the changelog, crate versions, etc. are made as commits to the release PR branch;
5. When the release PR is merged, release-plz publishes the new crate versions to crates.io.
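For the changelog step, release-plz delegates to git-cliff, which is driven by its own configuration and templates. The underlying idea is simple: group commit subjects by type into changelog sections. The Rust sketch below does just that for Conventional Commits subjects. It is not how git-cliff is implemented, and the section names and Markdown layout are assumptions chosen for illustration.

```rust
use std::collections::BTreeMap;

/// Maps a Conventional Commits type to a changelog section title
/// (an illustrative choice, not git-cliff's defaults).
fn section_for(kind: &str) -> &'static str {
    match kind {
        "feat" => "Added",
        "fix" => "Fixed",
        _ => "Other",
    }
}

/// Renders a Markdown changelog entry for `version` from commit subject lines.
pub fn render_changelog(version: &str, commits: &[&str]) -> String {
    // BTreeMap keeps the sections in a stable (alphabetical) order
    let mut sections: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for commit in commits {
        // Split "type(scope)!: description" into the bare type and the description
        let (kind, description) = match commit.split_once(": ") {
            Some((head, rest)) => (head.split('(').next().unwrap_or(head).trim_end_matches('!'), rest),
            None => ("", *commit),
        };
        sections.entry(section_for(kind)).or_default().push(description);
    }
    let mut out = format!("## [{version}]\n");
    for (title, entries) in &sections {
        out.push_str(&format!("\n### {title}\n\n"));
        for entry in entries {
            out.push_str(&format!("- {entry}\n"));
        }
    }
    out
}
```

With this approach, editing the changelog by hand (step 4) is just another commit on top of the generated file in the release PR.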

I especially like the automatic release PR creation on every commit to the main branch, since it puts a release one "merge PR" button away, and it might be suitable for some of our repos. This is probably the lowest-friction release process possible. The manual (create PR) mode is better suited for projects that do not want to release on every commit to the main branch.

EDIT: I'd like to try release-plz and see how it fits our needs.

greenhat commented 2 months ago

@bobbinth We want to try setting up release-plz to automate releasing and publishing our crates.

Could you please enable the permission for GitHub Actions to create and approve pull requests, as described in the docs? We actually only need the permission to create pull requests, but there seems to be no way to enable it separately from the approve permission.

To set up publishing on CI, we need a crates.io API token with the publish-new and publish-update scopes, as described in the docs. Since the crates we want to publish have never been published before, the first publish will create the crates on crates.io. Do you want to publish them with your account or with the organization account? Alternatively, I can publish the first versions myself and then invite you and Paul.

bobbinth commented 2 months ago

> Could you please enable the permission for GitHub Actions to create and approve pull requests, as described in the docs? We actually only need the permission to create pull requests, but there seems to be no way to enable it separately from the approve permission.

Hmmm - I tried to do that, but it seems like this option is disabled.

This may be some org-wide setting. We'll need to clarify this with our security team.

> To set up publishing on CI, we need a crates.io API token with the publish-new and publish-update scopes, as described in the docs. Since the crates we want to publish have never been published before, the first publish will create the crates on crates.io. Do you want to publish them with your account or with the organization account? Alternatively, I can publish the first versions myself and then invite you and Paul.

I've previously had it set up for other projects so that any member of a given team can publish updates. It probably makes sense to do the same here too (e.g., anyone on the compilers team will be able to publish). I just need to remember how I did it :)

For the first publish, I can create placeholder crates (e.g., with version 0.0.0) but I'd need the list of crates which we want to create.

greenhat commented 1 month ago

> > Could you please enable the permission for GitHub Actions to create and approve pull requests, as described in the docs? We actually only need the permission to create pull requests, but there seems to be no way to enable it separately from the approve permission.
>
> Hmmm - I tried to do that, but it seems like this option is disabled.
>
> This may be some org-wide setting. We'll need to clarify this with our security team.

It's not a showstopper, we're just going to need to create a release PR manually.

> > To set up publishing on CI, we need a crates.io API token with the publish-new and publish-update scopes, as described in the docs. Since the crates we want to publish have never been published before, the first publish will create the crates on crates.io. Do you want to publish them with your account or with the organization account? Alternatively, I can publish the first versions myself and then invite you and Paul.
>
> I've previously had it set up for other projects so that any member of a given team can publish updates. It probably makes sense to do the same here too (e.g., anyone on the compilers team will be able to publish). I just need to remember how I did it :)
>
> For the first publish, I can create placeholder crates (e.g., with version 0.0.0) but I'd need the list of crates which we want to create.

Thank you! Here is a list of all the new crates that we need to publish to release the cargo extension:

- miden-codegen-masm
- miden-hir
- miden-hir-analysis
- miden-hir-macros
- miden-hir-symbol
- miden-hir-transform
- miden-hir-type
- miden-frontend-wasm
- midenc-compile
- midenc-driver
- midenc-session
- cargo-miden

bobbinth commented 1 month ago

I've published a placeholder crate for miden-hir to crates.io and added permissions to enable publishing for anyone on the compilers team (it was actually simpler than I thought, as they now allow doing this via the website). Could you check if everything works as expected (maybe by publishing v0.0.1)?

A couple of comments on the other crates:

- Let's add distinct descriptions to the Cargo.toml files of all the crates.
- Also, let's add a simple README file to all crates (it could be just one or two sentences explaining the purpose of the crate + a link to the LICENSE file - see how we do it in Miden VM/base etc.).
- In the Cargo.toml files, let's change authors from "Miden team" to "Miden contributors".

A couple of questions about crate naming:

- Should miden-codegen-masm be midenc-codegen-masm?
- Similarly, should miden-frontend-wasm be midenc-frontend-wasm?

Basically, I'm trying to see if we should make the names a bit more specific to the compiler project.

greenhat commented 1 month ago

> I've published a placeholder crate for miden-hir to crates.io and added permissions to enable publishing for anyone on the compilers team (it was actually simpler than I thought, as they now allow doing this via the website). Could you check if everything works as expected (maybe by publishing v0.0.1)?

Thanks! The miden-hir crate depends on unpublished crates, so we cannot publish it until all of its dependencies are published. The miden-hir-type crate has no dependencies. If you could publish a placeholder crate for it, I could then publish it for testing purposes.
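Publishing therefore has to follow the dependency graph: a crate can go to crates.io only after every workspace crate it depends on is already there. A minimal sketch of computing such an order with Kahn's topological sort follows. `publish_order` is a hypothetical helper, the map is assumed to contain only workspace crates, and the dependency edges in the test are made up for illustration rather than taken from the actual workspace.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns an order in which crates can be published so that each crate's
/// workspace dependencies come before it (Kahn's algorithm).
/// `deps` maps every workspace crate to the workspace crates it depends on.
/// On a dependency cycle, returns `Err` with the crates that could not be ordered.
pub fn publish_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Result<Vec<&'a str>, Vec<&'a str>> {
    // Number of not-yet-published dependencies for each crate
    let mut pending: BTreeMap<&str, usize> = deps.iter().map(|(name, d)| (*name, d.len())).collect();
    // Reverse edges: dependency -> crates that depend on it
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (name, d) in deps {
        for dep in d {
            dependents.entry(*dep).or_default().push(*name);
        }
    }
    // Crates with no unpublished dependencies; BTreeSet keeps the order deterministic
    let mut ready: BTreeSet<&str> = pending.iter().filter(|(_, n)| **n == 0).map(|(c, _)| *c).collect();
    let mut order = Vec::new();
    while let Some(next) = ready.pop_first() {
        order.push(next);
        // Publishing `next` unblocks one dependency of each of its dependents
        for dependent in dependents.get(next).into_iter().flatten() {
            let count = pending.get_mut(dependent).expect("dependent is a workspace crate");
            *count -= 1;
            if *count == 0 {
                ready.insert(*dependent);
            }
        }
    }
    if order.len() == deps.len() {
        Ok(order)
    } else {
        Err(pending.into_iter().filter(|(_, n)| *n > 0).map(|(c, _)| c).collect())
    }
}
```

Leaf crates such as miden-hir-type come out first; a dependency cycle is reported as an error instead of silently producing a partial order.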

> A couple of comments on the other crates:
>
> - Let's add distinct descriptions to the Cargo.toml files of all the crates.
> - Also, let's add a simple README file to all crates (it could be just one or two sentences explaining the purpose of the crate + a link to the LICENSE file - see how we do it in Miden VM/base etc.).
> - In the Cargo.toml files, let's change authors from "Miden team" to "Miden contributors".

Sure.

> A couple of questions about crate naming:
>
> - Should miden-codegen-masm be midenc-codegen-masm?
> - Similarly, should miden-frontend-wasm be midenc-frontend-wasm?
>
> Basically, I'm trying to see if we should make the names a bit more specific to the compiler project.

I'm in favor of prefixing all the compiler crates with a specific name instead of miden. One thing that has already bitten me a couple of times with midenc is that I confused midenc with miden when referencing a dependency. If all compiler crates have the midenc prefix, this should not be a problem. Alternatively, we can go with the miden-compiler prefix.

bobbinth commented 1 month ago

> The miden-hir-type crate has no dependencies. If you could publish a placeholder crate for it, I could then publish it for testing purposes.

Done!

> I'm in favor of prefixing all the compiler crates with a specific name instead of miden. One thing that has already bitten me a couple of times with midenc is that I confused midenc with miden when referencing a dependency. If all compiler crates have the midenc prefix, this should not be a problem. Alternatively, we can go with the miden-compiler prefix.

I don't really have a strong preference here (e.g., using miden-hir for some crates and midenc for others seems OK to me, but I'm also fine with more radical changes), so just let me know where you and @bitwalker land on these names and I'll publish the placeholder crates.

greenhat commented 1 month ago

> > The miden-hir-type crate has no dependencies. If you could publish a placeholder crate for it, I could then publish it for testing purposes.
>
> Done!

Thanks! I published a test version at https://crates.io/crates/miden-hir-type using the newly generated crates.io token. So the publishing credentials are working.

@bitwalker Let's finalize the crate names so that @bobbinth can publish the placeholders. I'm fine with prefixing all compiler crates with midenc or miden-compiler.

bitwalker commented 1 month ago

@greenhat @bobbinth let's use midenc as the prefix for all compiler crates, and otherwise stick to the current naming scheme (e.g. midenc-hir-* for IR-related crates, midenc-codegen-* for codegen backends, and midenc-frontend-* for language frontends).
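The scheme is regular enough that a crate's role can be read off its name, which could be handy for, say, a CI check that new crates follow the convention. A small sketch; `CrateRole` and `crate_role` are hypothetical. cargo-miden intentionally falls outside the scheme, since Cargo discovers subcommands through executables named `cargo-<name>`.

```rust
/// Role of a compiler crate, inferred from the naming scheme.
#[derive(Debug, PartialEq, Eq)]
pub enum CrateRole<'a> {
    /// `midenc-hir` or `midenc-hir-*`: IR-related crates
    Ir,
    /// `midenc-codegen-<backend>`
    Codegen { backend: &'a str },
    /// `midenc-frontend-<language>`
    Frontend { language: &'a str },
    /// Any other `midenc` crate (driver, session, the executable itself, ...)
    Other,
}

/// Returns the role of `name`, or `None` if it does not use the `midenc` prefix.
pub fn crate_role(name: &str) -> Option<CrateRole<'_>> {
    if name == "midenc" {
        return Some(CrateRole::Other);
    }
    let rest = name.strip_prefix("midenc-")?;
    if rest == "hir" || rest.starts_with("hir-") {
        Some(CrateRole::Ir)
    } else if let Some(backend) = rest.strip_prefix("codegen-") {
        Some(CrateRole::Codegen { backend })
    } else if let Some(language) = rest.strip_prefix("frontend-") {
        Some(CrateRole::Frontend { language })
    } else {
        Some(CrateRole::Other)
    }
}
```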

bobbinth commented 1 month ago

To confirm, I'll create the following crates:

- midenc-codegen-masm
- midenc-compile
- midenc-driver
- midenc-frontend-wasm
- midenc-hir
- midenc-hir-analysis
- midenc-hir-macros
- midenc-hir-symbol
- midenc-hir-transform
- midenc-hir-type
- midenc-session
- cargo-miden

bitwalker commented 1 month ago

> To confirm, I'll create the following crates:
>
> - midenc-codegen-masm
> - midenc-compile
> - midenc-driver
> - midenc-frontend-wasm
> - midenc-hir
> - midenc-hir-analysis
> - midenc-hir-macros
> - midenc-hir-symbol
> - midenc-hir-transform
> - midenc-hir-type
> - midenc-session
> - cargo-miden

Yep, as well as midenc, so we can cargo install the compiler executable. I'm not sure if that will end up being viable long term as an installation method, but for now it should work.

bobbinth commented 1 month ago

It seems there is a limit to how many new crates one can publish in a given period of time. So far, I've published 6 crates and will publish the remaining 7 tomorrow.

bobbinth commented 1 month ago

Actually, I was able to publish all the crates (the restriction was that I could publish one crate every 10 mins) - so, everything should be good to go on this front.

greenhat commented 1 month ago

> A couple of comments on the other crates:
>
> - Let's add distinct descriptions to the Cargo.toml files of all the crates.
> - Also, let's add a simple README file to all crates (it could be just one or two sentences explaining the purpose of the crate + a link to the LICENSE file - see how we do it in Miden VM/base etc.).

Of all the published crates, only the cargo extension crate is intended for public use. All the other crates are internal; we publish them out of necessity and would not want to encourage their public use. @bitwalker WDYT?

bitwalker commented 1 month ago

@greenhat Agreed, I'd prefer to have no README for any of the crates we don't intend for users to depend on directly. This is pretty standard for other projects that publish their whole workspace to crates.io (e.g. cranelift, wasmtime). We can always add them later, but for the time being I'd prefer to use this as a signal that those crates are not intended for direct consumption.

I'm fine with adding short descriptions of their purpose though, that seems useful regardless.

bobbinth commented 1 month ago

I guess an alternative could be a small README with a single sentence saying something like "this crate is part of the miden-compiler project". But maybe that's not necessary, as there will be a link to the repo anyway.

greenhat commented 1 month ago

Besides the compiler crates and the cargo extension, we have the Miden SDK crates that need to be published as well. They temporarily reside in this repo so that we can iterate quickly on the ABI transformation, which is split between the SDK crates (Rust bindings) and the compiler (which generates IR for the ABI transformation in the frontend). Eventually, they will be stable enough to be moved to their own repo. The crates are (in the sdk directory):

- miden-sdk
- miden-prelude (Miden stdlib)
- miden-sdk-tx-kernel

miden-sdk is the umbrella crate that takes the other two as dependencies and exports everything needed to write code that targets the rollup (account, note script, etc.). miden-prelude contains the Rust bindings for the stdlib, i.e. types and functions for the Miden standard library; it exports everything needed to write a Miden VM program, without the rollup-specific parts. The miden-stdlib crate name is already taken by the VM stdlib implementation, so I took the next best name. miden-sdk-tx-kernel contains the Rust bindings for the transaction kernel. It is not intended to be used directly by users, but rather through the miden-sdk crate.
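The umbrella-crate layout can be sketched with inline modules standing in for the three crates. In the real workspace, each module would be a separate crate, and miden-sdk would re-export the others from its lib.rs (e.g. `pub use miden_prelude::*;`). The `Felt` and `AccountId` items are hypothetical placeholders, not the actual SDK API.

```rust
// Stand-in for `miden-prelude`: bindings for the Miden standard library,
// usable for plain Miden VM programs without any rollup-specific parts.
pub mod miden_prelude {
    /// Placeholder for a field element type
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct Felt(pub u64);
}

// Stand-in for `miden-sdk-tx-kernel`: transaction kernel bindings,
// not meant to be imported directly by contract authors.
pub mod miden_sdk_tx_kernel {
    use super::miden_prelude::Felt;

    /// Hypothetical account identifier
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct AccountId(pub Felt);
}

// Stand-in for the umbrella `miden-sdk`: a single dependency that exposes both
// the VM-level prelude and the rollup-specific (transaction kernel) items.
pub mod miden_sdk {
    pub use super::miden_prelude::*;
    pub use super::miden_sdk_tx_kernel::AccountId;
}
```

Users would then depend only on miden-sdk, while the split between the underlying crates stays an internal detail.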

@bitwalker @bobbinth Let's finalize the crate names and publish them as well.

bitwalker commented 1 month ago

@bobbinth Wanted to share my thoughts on how we should approach handling Miden Assembly components (i.e. crates which are derived from/contain Miden Assembly sources). If you are in agreement, I can open a proposal in the miden-vm repo:

Essentially, the structure of such components would look like this:

- miden-<component_name>-sys would be similar to how miden-stdlib is structured today, i.e. it contains the MASM sources for the component, provides a test suite which executes low-level tests for the component, and has a build script which generates documentation, the raw low-level Rust bindings, and other metadata corresponding to the MASM code.
- miden-<component_name> would provide higher-level Rust abstractions on top of the -sys crate, along with tests against those abstractions. Its purpose is to provide a more natural/ergonomic Rust interface to the low-level Miden Assembly APIs.

Most users would depend on, and code against, the latter crate; but it would also be possible to depend directly on the -sys crate if desired. This setup matches the idiomatic architecture for binding to libraries written in other languages, so it would be intuitive for Rust devs, and more generally it provides a clear boundary for where to put things (if it is generated from Miden Assembly, it goes in the -sys, otherwise it goes in the higher level crate), and is a nice template for all such Miden Assembly-derived components IMO.
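To make the split concrete, here is a Rust sketch of the pattern with a hypothetical `miden-foo-sys`/`miden-foo` pair, again written as inline modules. The low-level function mirrors a C-style interface (status code plus out-parameter). Its body is a placeholder for what would really be a call into a MASM procedure. The higher-level module wraps it in an ordinary Rust iterator, which is the kind of convenience the non-sys crate exists to provide.

```rust
// Stand-in for a hypothetical `miden-foo-sys` crate: a 1:1 mirror of the
// MASM API, with raw values and status codes instead of Rust idioms.
pub mod foo_sys {
    /// Writes the `index`-th entry into `*out` and returns 0 on success,
    /// or 1 if `index` is out of range.
    pub fn get_entry(index: u32, out: &mut u64) -> u32 {
        // Placeholder body standing in for a call into a MASM procedure
        const ENTRIES: [u64; 3] = [10, 20, 30];
        match ENTRIES.get(index as usize) {
            Some(value) => {
                *out = *value;
                0
            }
            None => 1,
        }
    }
}

// Stand-in for the higher-level `miden-foo` crate built on top of `foo_sys`.
pub mod foo {
    use super::foo_sys;

    /// Idiomatic wrapper: iterate over the entries instead of juggling
    /// status codes and out-parameters.
    pub fn entries() -> impl Iterator<Item = u64> {
        (0u32..).map_while(|index| {
            let mut out = 0;
            let status = foo_sys::get_entry(index, &mut out);
            (status == 0).then_some(out)
        })
    }
}
```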

I think we should start by replacing the current miden-stdlib crate with a pair of crates which follow this template. It could also be a good idea to move them to a new repo/workspace, but that's less important in my mind. The current miden-stdlib crate, aside from providing some of the functionality I mentioned above, exists solely to export an implementation of the Library trait, which IMO can go away entirely, since the concept of MASL libraries will soon be superseded by packaging anyway. I don't believe it belongs in the "new" miden-stdlib(-sys) crates described above, though, so if we need similar functionality, it is probably worth discussing that in a separate issue.

Assuming we do what I've described above, then miden-prelude goes away (the miden-stdlib crate would fulfill that purpose). The miden-sdk-tx-kernel would be "merged" into miden-sdk (or more precisely, miden-sdk would re-export the transaction kernel abstractions from a miden-tx-kernel crate, or whatever the equivalent to miden-stdlib is for the transaction kernel). Ultimately, we only need one crate for the SDK, miden-sdk, which would depend on, and re-export, elements defined across various MASM component crates, such as the stdlib.

The one detail I'm omitting here is that eventually we'd want to generate the -sys crate bindings from the MASM code, but in the near term we'd be hand-writing those bindings: until we have some way of associating higher-level type information with MASM procedures, we don't have a good way to automatically generate good bindings.

bobbinth commented 1 month ago

Overall, I think this makes sense. I do have a couple of questions:

1. The miden-<component_name>-sys crate would probably not contain MASM code but binary MAST (in the format described in #132) - right? I'm imagining that the MASM source would be in the repo but would be compiled into MAST before publishing, so that each version of the crate is tied to a specific set of MAST roots.
   a. If we do publish the compiled MAST, would we publish 2 separate versions? One compiled in debug mode and the other in "release" mode?
2. I'm trying to understand what the difference would be between the "raw low-level Rust bindings" published with the sys crate and the "higher-level Rust abstractions". Is it basically that the low-level bindings would be just a collection of functions, while the higher-level abstractions may have structs, enums, etc.? Or is it something else?
   a. One reason for the question is that I'm wondering whether any Rust bindings would go into the sys crate at all. Maybe the sys crate contains MAST code + some metadata, but all Rust code goes into the non-sys crate?
   b. Related to the above, would we always be able to generate Rust bindings for MASM procedures? I imagine that sometimes we may need wrapper procedures. In such cases, would the wrapper MASM/MAST code go into the non-sys crate?
3. Regarding moving miden-stdlib into its own repo: we actually had an issue for this (0xPolygonMiden/miden-vm#723) but closed it because it was a bit more challenging than I originally thought (miden-stdlib is a bit more tightly coupled with the VM than I'd like). But I don't mind revisiting this, as I do think this is the right way forward.

bitwalker commented 1 month ago

1. I would expect the build script for the crate to produce MAST from the MASM sources that live in the same source tree as the Rust code for the crate (essentially how miden-stdlib is organized today), by invoking the assembler (i.e. add miden-assembly as a build-dependency); it would not be necessary to have all of these crates in the same worktree as the VM.
   a. If for some reason we need to publish MAST with the crate, then it should be the release version IMO, but we'd also publish debug info and other optional things, which can be used or ignored.
2. The relationship between the two is the same as, for example, between libgit2-sys and git2. The former exports the API of the libgit2 shared library (written in C) 1:1 in Rust - it is low-level and does not provide idiomatic Rust conveniences like iterators. The git2 crate depends on libgit2-sys, wraps it, and provides all of those idiomatic Rust conveniences on top of the low-level libgit2 APIs. So, for example, imagine you want to query a repo for all of its tags. Using libgit2-sys, that looks like the following:

```rust
use std::ffi::{c_char, c_int, c_void, CStr};
use std::mem::MaybeUninit;

struct Payload {
    tags: Vec<String>,
    repo: *mut libgit2_sys::git_repository,
}

pub fn tags() -> Vec<String> {
    libgit2_sys::init(); // libgit2 must be initialized before any other call
    let path = CStr::from_bytes_with_nul(b"/repos/compiler\0").unwrap();
    let mut repo = MaybeUninit::uninit();
    unsafe {
        assert_eq!(libgit2_sys::git_repository_open(repo.as_mut_ptr(), path.as_ptr()), 0, "could not open repo");
        let repo = repo.assume_init();
        let mut payload = Payload { tags: vec![], repo };
        // libgit2 calls `list_all` once per tag, handing `payload` back as an untyped pointer
        libgit2_sys::git_tag_foreach(repo, Some(list_all), &mut payload as *mut Payload as *mut c_void);
        libgit2_sys::git_repository_free(repo);
        payload.tags
    }
}

extern "C" fn list_all(_name: *const c_char, oid: *mut libgit2_sys::git_oid, payload: *mut c_void) -> c_int {
    unsafe {
        let payload = &mut *(payload as *mut Payload);
        let mut tag = MaybeUninit::uninit();
        // The lookup fails for lightweight tags (they point straight at a commit); skip those
        if libgit2_sys::git_tag_lookup(tag.as_mut_ptr(), payload.repo, oid) != 0 {
            return 0;
        }
        let tag = tag.assume_init();
        let name = CStr::from_ptr(libgit2_sys::git_tag_name(tag));
        payload.tags.push(name.to_string_lossy().into_owned());
        libgit2_sys::git_tag_free(tag);
    }
    0 // returning non-zero would stop the iteration
}
```

And using git2:

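```rust
let repo = git2::Repository::open("/repos/compiler").expect("could not open repo");
let tags = repo.tag_names(None).expect("error fetching tags");
// `iter()` yields `Option<&str>`, which is `None` for names that are not valid UTF-8
for name in tags.iter().flatten() {
    // ...
}
```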

That essentially illustrates the difference between the two crates: libgit2-sys brings the libgit2 API into Rust, and that's all it does (typically, the -sys crate also handles finding and/or building the library it wraps as part of its build script). The higher-level crate (i.e. the one Rust users are going to interface with and care about) builds a nice Rust-native interface on top of the low-level library. This pattern permits alternative implementations on top of the low-level library, without having to reimplement it.

2. (continued)
   a. By convention, -sys crates are Rust binding crates, so I would argue this is the most appropriate place for generated Rust bindings to MASM procedures.
   b. This would depend on how we approach generation of bindings, but if something is not directly representable in Rust, we will need to define a wrapper in MASM that is representable in Rust, so that we can emit bindings to that procedure instead. Fundamentally, a binding generator will depend on us having the necessary metadata to emit Rust function signatures that correspond to MASM procedure "signatures" (which are not currently represented, but would need to be). In the meantime, the bindings would be maintained by hand.
3. Yeah, I was anticipating that we'd need to do some work to make the VM not depend on miden-stdlib directly, so we could make the other changes while keeping it in the miden-vm repo initially, then move it out when the time is right.

bobbinth commented 1 month ago

> I would expect the build script for the crate to produce MAST from the MASM sources that live in the same source tree as the Rust code for the crate (essentially how miden-stdlib is organized today), by invoking the assembler (i.e. add miden-assembly as a build-dependency); it would not be necessary to have all of these crates in the same worktree as the VM.

The main thing I'm trying to think through here is whether this would guarantee that MAST roots for a given version of the library do not change. For example, let's say we have lib_a and lib_b such that lib_b depends on lib_a. The developer of lib_b publishes v0.1.0 of their library assuming it is built with lib_a v0.2.0. But then the developer of lib_a publishes v0.2.1 of lib_a with some optimizations (no breaking changes from the API standpoint), and now the MAST roots for lib_b v0.1.0 would be different from the originally published MAST roots.

I guess we can prevent the above by recommending the following conventions:

1. Any release which changes existing MAST roots of a library should be considered a breaking release. Only adding new MAST roots should be considered a non-breaking release.
2. All MASM library dependencies must be exact (e.g., specified as =0.2.0). This probably applies to the Miden Assembler dependency as well.
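Both conventions are mechanical enough to check automatically. Here is a sketch that assumes a library's public surface can be summarized as the set of MAST roots it exports, modeled as 32-byte arrays; `classify_release` and `is_exact_requirement` are hypothetical helpers. Note that Cargo interprets `=0.2` as `>=0.2.0, <0.3.0`, so only a fully specified `=MAJOR.MINOR.PATCH` requirement pins a single version. The check below ignores pre-release and build metadata for simplicity.

```rust
use std::collections::BTreeSet;

/// Stand-in for a MAST root (a digest; modeled here as 32 bytes).
pub type MastRoot = [u8; 32];

#[derive(Debug, PartialEq, Eq)]
pub enum ReleaseKind {
    Breaking,
    NonBreaking,
    NoInterfaceChange,
}

/// First convention: changing or removing any existing MAST root is breaking;
/// only adding new roots is non-breaking.
pub fn classify_release(old: &BTreeSet<MastRoot>, new: &BTreeSet<MastRoot>) -> ReleaseKind {
    if !old.is_subset(new) {
        // Some root that dependents may reference no longer exists
        ReleaseKind::Breaking
    } else if new.len() > old.len() {
        ReleaseKind::NonBreaking
    } else {
        ReleaseKind::NoInterfaceChange
    }
}

/// Second convention: MASM library dependencies must be exact, e.g. `=0.2.0`.
pub fn is_exact_requirement(req: &str) -> bool {
    let Some(version) = req.trim().strip_prefix('=') else {
        return false;
    };
    let parts: Vec<&str> = version.trim().split('.').collect();
    parts.len() == 3 && parts.iter().all(|p| !p.is_empty() && p.bytes().all(|b| b.is_ascii_digit()))
}
```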

bitwalker commented 1 month ago

> The main thing I'm trying to think through here is whether this would guarantee that MAST roots for a given version of the library do not change.

Thinking through things a bit more, I think we would have two separate builds for things like the standard library (i.e. I'm not talking about packages containing accounts/note scripts/etc.):

1. The build which produces a Miden package, with all the metadata we'd need downstream to bind against the procedures in the package. The build output could then be distributed via a centralized repo, or some other means.
2. (Optional) The Rust crate, whose build script fetches, unpacks, and generates bindings for the package produced in 1. In cases where automatic generation is not possible, the bindings would be hand-written, but in either case they would reference the MAST roots for the specific version of the Miden package being targeted. Different releases of the Rust crate do not necessarily result in different MAST roots for the underlying procedures, unless the Miden package version was bumped. For reasons I'll get to shortly, I believe it is essentially irrelevant whether the MAST roots for the standard library change or not, but in any case, control over those changes would be more intentional as a result of this build strategy.

In theory, if our bindings generator is general enough, we could eschew even publishing a Rust crate at all, and just reference the Miden package, from which the bindings would be generated. The dependency resolution for Miden packages is planned to be based on the package digest, rather than a semver scheme (at least, that's my assumption for the time being), so if you target a specific version of the package, you'll always get exactly that version.
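Digest-based resolution can be sketched as a content-addressable store: a dependent pins the digest of the exact contents it was built against, so resolution returns either those bytes or nothing, and two versions of the same library simply live under different digests. The `PackageStore` type is hypothetical, and `DefaultHasher` is only a stand-in for the cryptographic digest a real store would need.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in content digest. DefaultHasher is NOT cryptographic; a real store
/// would use a collision-resistant hash over the package contents.
pub fn digest(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Toy content-addressable store: packages are looked up by the digest of
/// their contents rather than by name and semver range.
#[derive(Default)]
pub struct PackageStore {
    packages: HashMap<u64, Vec<u8>>,
}

impl PackageStore {
    /// Stores a package and returns the digest a dependent would pin.
    pub fn insert(&mut self, bytes: Vec<u8>) -> u64 {
        let d = digest(&bytes);
        self.packages.insert(d, bytes);
        d
    }

    /// Resolves a pinned dependency: exactly the content that was pinned, or nothing.
    pub fn resolve(&self, pinned: u64) -> Option<&[u8]> {
        self.packages.get(&pinned).map(Vec::as_slice)
    }
}
```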

> For example, let's say we have lib_a and lib_b such that lib_b depends on lib_a. The developer of lib_b publishes v0.1.0 of their library assuming it is built with lib_a v0.2.0. But then the developer of lib_a publishes v0.2.1 of lib_a with some optimizations (no breaking changes from the API standpoint), and now the MAST roots for lib_b v0.1.0 would be different from the originally published MAST roots.

To be clear, crates.io is not the distribution mechanism for Miden packages, and AFAIK, we are not planning to use a semantic versioning scheme for Miden packages, only content digests. But even if we were to use crates.io for distribution of the Rust bindings for some Miden package, the actual Miden package would have to be already built/published beforehand, so that the symbols of the Rust bindings reference the specific MAST roots exported by that package. In such cases, if the author of lib_a did a patch release of their Rust bindings, while simultaneously making a breaking change, like changing to a new release of the lib_a Miden package that is backwards-incompatible, then yes, that could cause issues, but those are not issues we can solve - in this instance, the author of lib_a violated the semantic versioning contract, by making a breaking change, while claiming the new version has no breaking changes.

That kind of issue is the social aspect of versioning in general, though - unless you manually review every update to every package in your dependency tree, it is always possible for someone to make a breaking change while claiming that the update is backwards-compatible. This is true even if you reference your dependencies by digest: "digest D2 is the new version of lib_a, which the author says is a patch release, so it should be backwards compatible with digest D1, the version I'm currently depending on, so I'm going to upgrade to D2". Unless you've reviewed all of the changes in D1->D2, you don't know that the author didn't make a breaking change, by accident, on purpose, or otherwise.

I'm not sure what the ideal solution to all of this is for Miden, but at least in general, I think we need to differentiate between stuff published to crates.io for Rust projects, and Miden packages (containing compiled MAST + metadata) published/distributed via some other means.

More generally, my "grand unified theory" for all of this is based on the following premises:

- The VM does not ship any libraries out of the box, or assume that any specific libraries (or specific versions of them) are available.
- The VM has some host-configurable way to fetch Miden packages on demand (see 0xPolygonMiden/miden-vm#1226). The VM could ship with a default configuration set up such that all core Miden libraries can be loaded without any changes, but this would be an optional step.
- Miden packages contain the following contents, at a minimum:
  - MAST for code local to the package (i.e. it does not contain MAST for its dependencies)
  - An interface descriptor for the APIs exported by the package, i.e. what procedures are available, the MAST roots of those procedures, how they expect to be called, and some kind of type signature (TBD)
  - A dependency mapping that specifies the digests of all dependencies referenced by the package, and the URI each was sourced from
- When a Miden package is loaded by the VM, its transitive dependency tree is also loaded, to the extent allowed by the configuration, along with the metadata associated with the package. Failing to load the entire dependency tree is not a hard error; only if, during execution, a MAST root is referenced for which we have no metadata, or which we were unable to load, does it become a hard error.
- Rust crates which wrap some MASM code must produce a Miden package for the MASM code first, and derive Rust bindings from that package. A crate that is generated from bindings will, by convention, use the -sys suffix. Upstream crates which add Rust-native abstractions on top of the bindings will depend on the Miden package via the corresponding -sys crate.
- A Rust crate which wraps a Miden package must reflect any changes to the underlying Miden package version in its own semantic versioning (i.e. if a change to the underlying Miden package includes a breaking change, the Rust crate version must reflect this via a major version bump).
- Where possible, we will prefer to emit Rust bindings directly from a Miden package, without requiring a dependency on a Rust crate. This is the default mode of operation for the Wasm component model toolchain, so to the extent that we can generalize it to any Miden package, we should do so.
- Miden packages produced from Rust code will be distributed the same way as if they were produced from MASM, i.e. it shouldn't be relevant what language a Miden package was compiled from. To the extent that additional libraries and such are needed on a per-language basis, those dependencies would be defined and distributed via the language's dependency ecosystem, not the Miden package ecosystem.
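The package contents and the lazy-loading rule above can be sketched as follows. All type and field names are hypothetical, the interface descriptor's type signature is a placeholder string since its format is still TBD, and the `fetch` callback stands in for whatever host-configured mechanism actually retrieves dependencies. The behavior being illustrated: a dependency that cannot be loaded is only recorded, and a hard error surfaces only when an unknown MAST root is executed.

```rust
use std::collections::BTreeMap;

/// Hypothetical digest type used for both MAST roots and package digests.
pub type Digest = [u8; 32];

/// One entry of the package's interface descriptor.
pub struct Export {
    pub name: String,
    pub mast_root: Digest,
    /// Placeholder: the signature format is still to be designed
    pub signature: String,
}

/// The minimum package contents listed above.
pub struct Package {
    /// MAST for code local to this package only (placeholder bytes, keyed by root)
    pub mast: BTreeMap<Digest, Vec<u8>>,
    pub exports: Vec<Export>,
    /// Digest of each dependency -> URI it was sourced from
    pub dependencies: BTreeMap<Digest, String>,
}

#[derive(Debug, PartialEq, Eq)]
pub struct UnknownMastRoot(pub Digest);

/// Code the VM has managed to load so far.
#[derive(Default)]
pub struct LoadedCode {
    procedures: BTreeMap<Digest, Vec<u8>>,
}

impl LoadedCode {
    /// Loads `package` and, transitively, whatever dependencies `fetch` can
    /// provide. Unavailable dependencies are returned instead of failing the load.
    pub fn load(&mut self, package: &Package, fetch: &dyn Fn(&Digest, &str) -> Option<Package>) -> Vec<Digest> {
        self.procedures.extend(package.mast.iter().map(|(root, code)| (*root, code.clone())));
        let mut missing = Vec::new();
        for (digest, uri) in &package.dependencies {
            match fetch(digest, uri.as_str()) {
                Some(dependency) => missing.extend(self.load(&dependency, fetch)),
                None => missing.push(*digest),
            }
        }
        missing
    }

    /// Executing a MAST root is the point where missing code becomes a hard error.
    pub fn execute(&self, root: &Digest) -> Result<&[u8], UnknownMastRoot> {
        self.procedures.get(root).map(Vec::as_slice).ok_or(UnknownMastRoot(*root))
    }
}
```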

To reiterate - I think it is a bad idea to have "privileged" libraries that are shipped with the VM. It makes shipping VM updates more complicated, makes it impossible to ship updates/fixes to the standard library without a new release of the VM, and causes a slew of issues with publishing code as a third party:

- The contract code you publish now only works on a specific VM release containing the exact same versions of the standard library procedures you compiled against.
- There is no way to produce contract code that works across multiple VM releases, unless the host specifically preloads multiple standard library versions for you.

More to the point, the idea that we could ship the standard library with the VM was based on a world where we were shipping MASM and compiling it on the fly - as a result, the same version of the standard library would always be used. However, if we are shipping MAST, then rather than saying "we will always compile your code with version X of the standard library", we are saying "we will only execute your code if you compiled against version X of the standard library" - a much less desirable (or useful) guarantee.

So I guess to bring my rambling to a close here, I think this all hinges on providing a standardized way for the VM to be able to fetch code on demand, configurable by the VM host to enable/disable various methods as they see fit. Once that exists, it no longer matters what version of the standard library was used by a given package, and in fact multiple versions can be in use simultaneously without conflict. To me, this is one of the major compelling reasons behind the "content-addressable store" idea for MAST code.
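To make the content-addressing point concrete, here is a minimal sketch of a store where objects are keyed by a digest of their own contents. Because of this, two versions of the same library can be held side by side without conflict. All names are hypothetical. `DefaultHasher` is only a stand-in: a real store would key objects by a cryptographic digest such as the MAST root.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical content digest. A real store would use a cryptographic hash
/// (e.g. the MAST root); DefaultHasher is only a stand-in for this sketch.
pub type Digest = u64;

/// Computes the key an object is addressable by, purely from its contents.
pub fn digest_of(bytes: &[u8]) -> Digest {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Minimal content-addressable store: there are no name/version labels,
/// so different versions of the same library can never collide.
#[derive(Default)]
pub struct ContentStore {
    objects: HashMap<Digest, Vec<u8>>,
}

impl ContentStore {
    /// Stores an object (once) and returns the digest it can be fetched by.
    pub fn put(&mut self, bytes: &[u8]) -> Digest {
        let digest = digest_of(bytes);
        self.objects.entry(digest).or_insert_with(|| bytes.to_vec());
        digest
    }

    /// Looks an object up by digest.
    pub fn get(&self, digest: Digest) -> Option<&[u8]> {
        self.objects.get(&digest).map(Vec::as_slice)
    }

    /// Number of distinct objects held.
    pub fn len(&self) -> usize {
        self.objects.len()
    }
}
```

If the same bytes are inserted twice, the store returns the same digest and keeps only one copy. This means deduplication across packages comes essentially for free.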

As an aside, I think I got a bit sidetracked by all of the above, but hopefully it is clear that what I'm suggesting is that we try to lay the groundwork for what I've described above, by using the Miden standard library as a blueprint. As such, we need a few things first, before we can do some of what I've suggested:

We need to determine how to express Miden procedures in an interface descriptor language like WIT (doesn't have to be WIT, but it needs to be something that can be converted to/from higher-level languages)
We need to pin down the package format details
We need to set up a build for miden-stdlib to produce a Miden package file
We need to write the Rust bindings for a specific version of the Miden package
Not a blocker, but longer term, we need to write a bindings generator that can consume a Miden package, and emit Rust bindings for it. This would be useful for other libraries like miden-stdlib, which wish to bind hand-written MASM into Rust; but more importantly, would remove the maintenance burden of manually reviewing the Rust bindings after each change to the underlying MASM code.
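At its simplest, a bindings generator like the one described in the last item could walk the exported procedures in a package's interface descriptor and emit Rust declarations. The sketch below assumes several things that are not a settled design: a hypothetical `ProcedureDesc` type, a `Felt` type in the generated code, and an `extern` block as the output shape.

```rust
use std::fmt::Write;

/// Hypothetical interface-descriptor entry for one exported procedure.
pub struct ProcedureDesc {
    pub name: String,
    pub mast_root: String, // hex-encoded digest, illustrative only
    pub num_inputs: usize,
    pub num_outputs: usize,
}

/// Emits Rust source declaring one function per exported procedure.
/// The `Felt` type and the `extern "C"` shape are assumptions.
pub fn generate_bindings(module: &str, procs: &[ProcedureDesc]) -> String {
    let mut out = String::new();
    writeln!(out, "// Generated from Miden package `{module}`; do not edit.").unwrap();
    writeln!(out, "extern \"C\" {{").unwrap();
    for p in procs {
        // One `Felt` parameter per input stack element.
        let params: Vec<String> = (0..p.num_inputs).map(|i| format!("a{i}: Felt")).collect();
        let ret = match p.num_outputs {
            0 => String::new(),
            1 => " -> Felt".to_string(),
            n => format!(" -> [Felt; {n}]"),
        };
        writeln!(out, "    /// MAST root: {}", p.mast_root).unwrap();
        writeln!(out, "    pub fn {}({}){};", p.name, params.join(", "), ret).unwrap();
    }
    writeln!(out, "}}").unwrap();
    out
}
```

When the bindings are generated from the package rather than written by hand, a change to the underlying MASM only requires regenerating them, not reviewing them line by line.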

In the near term, building MAST via the build script of miden-stdlib is a temporary solution until we have implemented the packaging infrastructure. Doing so allows us to build a toolchain for Rust that does not rely on hand-maintained mappings of "well-known" procedure names to the MAST roots we've decided to map them to. Once Miden packages are being produced, we would naturally switch the build script of miden-stdlib to generate bindings from that instead.

bobbinth commented 1 month ago

I think it is a bad idea to have "privileged" libraries that are shipped with the VM

Totally agree with this.

the idea that we could ship the standard library with the VM was based on a world where we were shipping MASM and compiling it on the fly - as a result, the same version of the standard library would always be used.

I don't think that's currently happening. That is, we do not ship the VM with Miden stdlib. In fact, the VM does not have any concept of libraries at all. Libraries are used by the assembler at assembly time, but stdlib is not shipped by default, and in that sense it is no different from any other library.

When I mentioned that there is some tight coupling between the VM and stdlib, I meant that there are features enabled in the VM (primarily a set of advice injectors) that are created specifically to support some functionality in stdlib.

To be clear, crates.io is not the distribution mechanism for Miden packages,

I agree that that's the goal - but I was thinking we could piggyback on crates.io, at least in the short term. Basically, we'd use a similar mechanism to the one we are using now with stdlib and midenlib, except that instead of bundling MASM with the crates, we'd bundle MAST. This is definitely not an ideal solution, but it should be simple enough and may suffice as a stopgap until we have a better solution in place.

The VM does not ship any libraries out of the box, or assume any specific libraries are available, or what the versions of those are

Agreed (and as mentioned above, that's how things work now as well).

The VM has some host-configurable way to fetch Miden packages on demand (see Implement extensible subsystem for on-demand storage/provisioning of MAST objects miden-vm#1226). The VM could ship with a default configuration which is set up such that all core Miden libraries can be loaded without any changes, but this would be an optional step.

The way I'm thinking about this is that we may have several implementations of the ObjectStore interface. We may start out with a relatively simple implementation (which may rely on crates.io for distribution, but other options are possible too), and over time add something that is much more tailored to Miden.
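Here is a minimal sketch of what "several implementations of the ObjectStore interface" could look like. Only the name `ObjectStore` comes from the discussion. The `fetch` signature, the 32-byte `Digest` alias, and the `InMemoryStore`/`ChainedStore` types are assumptions. The chained store shows one way a host could enable or disable sources: it simply chooses which backends to configure.

```rust
use std::collections::HashMap;

/// Hypothetical 32-byte digest identifying a MAST object.
pub type Digest = [u8; 32];

/// Sketch of a host-configurable object store; the method is an assumption.
pub trait ObjectStore {
    fn fetch(&self, digest: &Digest) -> Option<Vec<u8>>;
}

/// Simplest backend: objects preloaded in memory. It does no I/O, so the
/// same code would also work when the VM is compiled to Wasm.
#[derive(Default)]
pub struct InMemoryStore {
    objects: HashMap<Digest, Vec<u8>>,
}

impl InMemoryStore {
    pub fn insert(&mut self, digest: Digest, bytes: Vec<u8>) {
        self.objects.insert(digest, bytes);
    }
}

impl ObjectStore for InMemoryStore {
    fn fetch(&self, digest: &Digest) -> Option<Vec<u8>> {
        self.objects.get(digest).cloned()
    }
}

/// Tries each configured backend in order (e.g. a local cache first, then
/// a remote registry) and returns the first hit.
pub struct ChainedStore {
    backends: Vec<Box<dyn ObjectStore>>,
}

impl ChainedStore {
    pub fn new(backends: Vec<Box<dyn ObjectStore>>) -> Self {
        Self { backends }
    }
}

impl ObjectStore for ChainedStore {
    fn fetch(&self, digest: &Digest) -> Option<Vec<u8>> {
        self.backends.iter().find_map(|b| b.fetch(digest))
    }
}
```

A crates.io-backed store or a JS-backed store for Wasm would then be just another `ObjectStore` implementation added to the chain.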

Miden packages contain the following contents, at a minimum:

MAST for code local to the package (i.e. it does not contain MAST for its dependencies)

An interface descriptor for the APIs exported by the package, i.e. what procedures are available, the MAST roots of those procedures, how they expect to be called, and some kind of type signature (TBD)

A dependency mapping, that specifies the digests of all dependencies referenced by the package, and where they were sourced from as a URI

Agreed.
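The minimum package contents listed above could be modelled roughly as follows. This is a hypothetical sketch, not the package format, which is still to be pinned down. The field names are assumptions, as are the `CallConv` variants (which mirror MASM's `exec`, `call` and `syscall`) and the string placeholder for the type signature.

```rust
use std::collections::BTreeMap;

/// Hypothetical 32-byte digest (e.g. a MAST root).
pub type Digest = [u8; 32];

/// How an exported procedure expects to be invoked (illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallConv {
    Exec,
    Call,
    Syscall,
}

/// One entry of the interface descriptor.
pub struct ExportedProcedure {
    pub name: String,
    pub mast_root: Digest,
    pub call_conv: CallConv,
    /// The type signature is still TBD; a plain string stands in for it.
    pub signature: String,
}

/// A dependency pinned by digest, plus the URI it was sourced from.
pub struct Dependency {
    pub digest: Digest,
    pub source_uri: String,
}

pub struct MidenPackage {
    pub name: String,
    /// Serialized MAST for code local to this package only.
    pub mast: Vec<u8>,
    pub exports: Vec<ExportedProcedure>,
    /// Keyed by a (hypothetical) dependency name.
    pub dependencies: BTreeMap<String, Dependency>,
}

impl MidenPackage {
    /// Finds an exported procedure by name.
    pub fn find_export(&self, name: &str) -> Option<&ExportedProcedure> {
        self.exports.iter().find(|p| p.name == name)
    }
}
```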

When a Miden package is loaded by the VM, its transitive dependency tree is also loaded, to the extent allowed by the configuration, and the metadata associated with the package. Failing to load the entire dependency tree is not a hard error, instead, if during execution, a MAST root is referenced for which we have no metadata, or which we were unable to load, only then does it become a hard error.

Agreed. One open question here is whether this loading happens at build time or at runtime. Doing it at build time may make the VM more difficult to use, but doing it at runtime may take us a bit more time to develop.
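The loading semantics described above can be sketched as follows: the transitive tree is loaded on a best-effort basis, and a hard error occurs only when execution references an unavailable MAST root. The `Loader` trait, `MapLoader`, and `LoadedTree` are hypothetical names. As in the discussion, the sketch leaves open whether this runs at build time or at runtime.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical 32-byte digest identifying a package or MAST root.
pub type Digest = [u8; 32];

/// Hypothetical loader: returns a package's bytes plus the digests of its
/// direct dependencies, or `None` if it cannot be fetched.
pub trait Loader {
    fn load(&self, digest: &Digest) -> Option<(Vec<u8>, Vec<Digest>)>;
}

/// Simple in-memory loader, used only to drive the example.
pub struct MapLoader(pub HashMap<Digest, (Vec<u8>, Vec<Digest>)>);

impl Loader for MapLoader {
    fn load(&self, digest: &Digest) -> Option<(Vec<u8>, Vec<Digest>)> {
        self.0.get(digest).cloned()
    }
}

#[derive(Default)]
pub struct LoadedTree {
    pub loaded: HashMap<Digest, Vec<u8>>,
    pub missing: HashSet<Digest>,
}

/// Walks the transitive dependency tree; failures are recorded, not raised.
pub fn load_tree(root: Digest, loader: &dyn Loader) -> LoadedTree {
    let mut tree = LoadedTree::default();
    let mut stack = vec![root];
    while let Some(digest) = stack.pop() {
        // Skip anything already visited (this also guards against cycles).
        if tree.loaded.contains_key(&digest) || tree.missing.contains(&digest) {
            continue;
        }
        match loader.load(&digest) {
            Some((bytes, deps)) => {
                tree.loaded.insert(digest, bytes);
                stack.extend(deps);
            }
            None => {
                tree.missing.insert(digest);
            }
        }
    }
    tree
}

impl LoadedTree {
    /// Called when execution actually reaches a MAST root: only at this
    /// point does a missing object become a hard error.
    pub fn resolve(&self, digest: &Digest) -> Result<&[u8], String> {
        self.loaded
            .get(digest)
            .map(Vec::as_slice)
            .ok_or_else(|| format!("MAST root {:02x?} is not available", &digest[..4]))
    }
}
```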

Rust crates which wrap some MASM code, must produce a Miden package for the MASM code first, and derive Rust bindings from that package. A crate that is generated from bindings, will by convention use the -sys suffix. Upstream crates which add Rust-native abstractions on top of the bindings, will depend on the Miden package via the corresponding -sys crate.

One thing is still not clear to me: in addition to Rust bindings, what else would the crate contain:

1. Just a reference (i.e., a URL) to the underlying Miden package,
2. The embedded Miden package it was generated from, or
3. The embedded Miden package plus the source MASM (or other) code (for debugging purposes?).

My original assumption was that it would be 2 (embedded Miden package) - but now I'm not sure.
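One way to compare the three options side by side is to model them as a single enum. This is a hypothetical sketch. In a real `-sys` crate, the embedded bytes would most likely come from `include_bytes!` on a package file produced at build time, not from inline literals.

```rust
/// The three options above, modelled as one enum (hypothetical names).
pub enum PackageSource {
    /// 1. Only a reference (e.g. a URL) to the package.
    Reference { url: &'static str },
    /// 2. The package itself, embedded in the crate.
    Embedded { package: &'static [u8] },
    /// 3. The embedded package plus its source code, e.g. for debugging.
    EmbeddedWithSource {
        package: &'static [u8],
        masm_source: &'static str,
    },
}

impl PackageSource {
    /// Package bytes available without any network access, if any.
    pub fn offline_package(&self) -> Option<&'static [u8]> {
        match self {
            PackageSource::Reference { .. } => None,
            PackageSource::Embedded { package }
            | PackageSource::EmbeddedWithSource { package, .. } => Some(*package),
        }
    }

    /// Whether source code is shipped alongside the package.
    pub fn has_source(&self) -> bool {
        matches!(self, PackageSource::EmbeddedWithSource { .. })
    }
}
```

In practice, only options 2 and 3 let the crate be used without access to wherever the package is hosted. Option 3 trades a larger crate for better debuggability.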

A Rust crate which wraps a Miden package, must include any changes to the underlying Miden package version, in its semantic versioning scheme (i.e. if a change to the underlying Miden package includes a breaking change, the Rust crate version must reflect this via a major version bump).

Agreed.
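The versioning rule above boils down to "the wrapping crate must bump at least as much as the package did". Below is a minimal sketch with hypothetical names. It uses plain semver ordering and ignores Cargo's convention that for `0.x` versions, a minor bump is the breaking one.

```rust
/// Size of a version change, ordered from smallest to largest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Bump {
    NoChange,
    Patch,
    Minor,
    Major,
}

/// (major, minor, patch)
pub type Version = (u64, u64, u64);

/// Classifies the change between two versions of the underlying package.
pub fn classify(old: Version, new: Version) -> Bump {
    if new.0 != old.0 {
        Bump::Major
    } else if new.1 != old.1 {
        Bump::Minor
    } else if new.2 != old.2 {
        Bump::Patch
    } else {
        Bump::NoChange
    }
}

/// The crate's bump is the larger of its own changes and the package's.
pub fn required_crate_bump(pkg_old: Version, pkg_new: Version, own_changes: Bump) -> Bump {
    classify(pkg_old, pkg_new).max(own_changes)
}

/// Applies a bump to a version, resetting lower components.
pub fn apply(v: Version, bump: Bump) -> Version {
    match bump {
        Bump::NoChange => v,
        Bump::Patch => (v.0, v.1, v.2 + 1),
        Bump::Minor => (v.0, v.1 + 1, 0),
        Bump::Major => (v.0 + 1, 0, 0),
    }
}
```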

Where possible, we will prefer to emit Rust bindings directly from a Miden package, without requiring a dependency on a Rust crate. This is the default mode of operation for the Wasm component model toolchain, so to the extent that we can generalize it to any Miden package, we should do so.

Agreed - though I think it will probably take some time to build a tool that generates Rust bindings automatically.

Miden packages produced from Rust code, will be distributed the same as if they were produced from MASM, i.e. it shouldn't be relevant what language a Miden package was compiled from. To the extent that additional libraries and such are needed on a per-language basis, those dependencies would be defined and distributed via the language dependency ecosystem, not the Miden package ecosystem.

Agreed.

hopefully it is clear that what I'm suggesting is that we try to lay the groundwork for what I've described above, by using the Miden standard library as a blueprint.

Yep - that's how I'm thinking about this as well. We also have a very concrete need to use Miden stdlib in the miden-base crates - so, this will be a very good use case on which to iron out the details.

We need to determine how to express Miden procedures in an interface descriptor language like WIT (doesn't have to be WIT, but it needs to be something that can be converted to/from higher-level languages)

I'm fine with using WIT (especially since you guys are already quite familiar with it), but one question: what do we do about procedures which cannot be expressed using WIT? And more generally, should each Miden package come with a sort of "MASM bindings" (i.e., bindings that would allow the package to be used as a dependency when writing MASM manually)?

I'm assuming that we'll also need to modify the assembler so that it can use Miden packages as dependencies directly.

This is very relevant for the miden-base crates, as miden-lib is written entirely in MASM and needs to depend on miden-stdlib.

We need to pin down the package format details

We need to set up a build for miden-stdlib to produce a Miden package file

We need to write the Rust bindings for a specific version of the Miden package

Agreed. Once we have the package format defined, I think the other two shouldn't be too difficult as miden-stdlib is relatively small (and for the initial version, we can reduce the set of "public procedures" even more).

To me personally, the two main outstanding questions are:

Design the package format so that it can be used by both the compiler and the assembler (so that we could also use it in miden-base).
Figure out what the initial implementation of the ObjectStore would look like (so that we could import miden-stdlib in miden-base and everywhere downstream).
- One potentially tricky thing to address here is that the ObjectStore should work in a WASM context as well (because we need to be able to run the VM in WASM). So, we either need an in-memory object store or a separate implementation of the ObjectStore which fetches data from the JS environment.

greenhat commented 4 weeks ago

@bobbinth Following our call, could you please create the following crates for SDK:

miden-stdlib-sys
miden-tx-kernel-sys
miden-sdk

bobbinth commented 4 weeks ago

@bobbinth Following our call, could you please create the following crates for SDK:
miden-stdlib-sys
miden-tx-kernel-sys
miden-sdk

These are now on crates.io.

greenhat commented 2 weeks ago

@bobbinth Please give me permission to set secrets on this repo, so that I can set the token for publishing to crates.io that will be used by the release action I made in #184.

bobbinth commented 2 weeks ago

@greenhat - do you know what kind of permission this is? (i.e., where do I need to go to grant it)

greenhat commented 2 weeks ago

@greenhat - do you know what kind of permission this is? (i.e., where do I need to go to grant it)

It seems that only the admin role can set secrets (see https://docs.github.com/en/actions/security-guides/using-secrets-in-github-actions#creating-secrets-for-a-repository). Alternatively, I can send the secret to the current admin.

bobbinth commented 2 weeks ago

I added a new action secret to the repo. The name of the secret is CARGO_REGISTRY_TOKEN and the value is set to crates.io token which has publish_update permission for these crates. The token expires in 1 year.

0xPolygonMiden / compiler

Setup crates publishing #142