0xPolygonMiden / miden-vm

STARK-based virtual machine
MIT License
610 stars 148 forks source link

Crate organization #570

Open bobbinth opened 1 year ago

bobbinth commented 1 year ago

Some thoughts about potential crate organization options. First, the current organization roughly looks like this:

image

Most crates in the above are self-explanatory, but miden-core deserves a few extra words. Currently, miden-core contains the following components:

  1. Some common imports from Winterfell - e.g., Felt, FieldElement, StarkField etc.
  2. MAST-related structs - e.g., Program, CodeBlock, Operation etc.
  3. Hash function related components (e.g., Digest) - these are mostly re-exported from Winterfell.
  4. AdviceSet and structs - e.g., MerkleTree, MerklePathSet etc.
  5. Trace layout constants.
  6. Some commonly-used utility functions (some re-exported from Winterfell and others defined in the crate).

For v0.4, we already have plans to do the following:

Beyond this, I think we can also do the following:

Preliminary dependency graph accounting for these changes would look like this:

image

The above assumes that trace layout constants have been moved from miden-core to miden-air - though, we don't need to do this right away.

The two new crates (miden-objects and miden-lib) could live in the same repo under 0xPolygonMiden. Probably something like minde-base.

Also, the current repo will probably be renamed to miden-vm and moved to the same org.

vlopes11 commented 1 year ago

A couple of suggestions:

Single-import

We could set a rule to never import dependencies of dependencies directly.

image

In this example, C cannot require A as dependencies; instead, B should re-export A, and C should fetch A from B.

Benefits: We will never have a tree conflict of B requiring a different A than C.

Transitive simplification

The following graph suggests that B should exist either as module of A or C, unless there is a strong domain separation between them

image

Candidates for this check:

hackaugusto commented 1 year ago

I'm going to add a few other considerations to the discussion

  1. Number of distinct repos

Having a long chain of repos results into slower feature releases. The user-facing program is the end in the dependency chain, so the amount of work is proportional to the number of distinct repos used by the dependencies. IOW updating more repos means more PRs, reviews and CI runs. This problems gets worse as the owner of the libraries changes.

This is a bit unavoidable, as the domain can be so different that most people won't have the competency to work on the whole stack. But, to me that means we shouldn't have overly generic crates like miden-objects on a distinct repo, because the extra work to update it wouldn't be from a difference in domain knowledge, but just pure code organization.

  1. Number of distinct crates

I think having more crates is better, one of the big reasons is that crates allows for better compilation parallelism. (There is the caveat of code bloat, since monomorphization is done per crate, but IIRC duplicate code elimination happens at linking time, so this is mainly a disk usage kinda of thing).

But this is only an advantage if the dependencies are broad instead of deep, so I think it is better for miden-core to be a thin crate, or to not exist at all.

  1. Naming and discoverability

I think the crate name should clearly inform someone of its contents, without needing to read a README or the code itself. I also think names should be consistent, and that we shouldn't rename things, because it makes it easy to see a use directive and tell where the code is coming from. For this reason I would avoid crates like miden-object or miden-core, since that is not informative, and preferably not rename the crates in the Cargo.toml. I also think that this consistency should apply to repo names, so miden-crypto would IMO be a better name for it.

  1. Single imports

Just to document my perspective. I still haven't really understood the benefit of this approach. What I understood is that every time we add a new type on crate, it must be exported on every parent in the dependency chain. With the dependency graph above, if we add the SMT on miden-crypto, we have to update miden-core, miden-air, miden-prover to get it to miden?

The rationale of avoiding multiple doesn't make sense to me, since that is determined by the dependencies in the Cargo.toml and not source code re-exports.

The rationale of moving code around also doesn't seem super appealing, why would we want to future proof moving code crypto to air? I can understand moving a dependency down the dependency graph, and re-exporting it on the previous crate, but that would need re-exports as moves happen, and not the other way around.

This also makes code discoverability harder, because one would have to go over each re-export until finding the real implementation (for instance, while browsing code over GH, instead of local code with a language server)

hackaugusto commented 1 year ago

Another thing to consider with the single imports, this makes searching for code in the generate docs a bit harder, since the same structs are defined multiple times and at first glance there is no way to tell if it is just a re-export or another struct with the same name:

image