0xPolygonMiden / miden-vm

STARK-based virtual machine
MIT License
628 stars 160 forks source link

Standard library organization #80

Closed bobbinth closed 2 years ago

bobbinth commented 2 years ago

Given that there are already things on the horizon which can go into standard library (e.g., #77), I think it might be a good time to think about how to organize standard library.

I'm inclined to keep things as simple as possible (at least at this point) - so, a few assumptions:

The way I'm currently thinking about is that the assembler will load the full standard library at instantiation time. For this, we'll need to build a procedure map (similar to procedure map for local procedures) in the assembler constructor. Then, when a script is being parsed, the assembler can look up a given procedure in this map and inline it as it does with local procedures.

A few preliminary questions to consider:

  1. Should standard library live in assembly crate or its own crate? I'm inclined to think that in its own crate (e.g., stdlib).
  2. What extensions should we use for assembly files? I'm thinking .ma (Miden assembly) - but open to other suggestions.
  3. Should we support module hierarchies? It seems like we should - but what's the simplest way to do it?
    • This is probably by far the most complicated part. If there are modules, can some modules import procedures from other modules? How do we handle module resolution? etc.
  4. Should we have something similar to use statements in Rust? Or should standard library procedures be always referred to by fully qualified names? e.g., exec.crypto:hashes:blake3 vs use std:crypto:hashes:blake3 and then in the code exec.blake3.
  5. How does the assembler load the standard library? Would it build one giant string of all the code in the library and parse it in one go? Or would it do so file by file? And if file by file, what determines the order?
itzmeanjan commented 2 years ago

Should standard library live in assembly crate or its own crate? I'm inclined to think that in its own crate (e.g., stdlib).

I agree with you that standard library should be placed somewhere, which explicitly defines the intention

What extensions should we use for assembly files? I'm thinking .ma (Miden assembly) - but open to other suggestions.

I was thinking about .masm which clearly shows it's assembly file, but I'm okay to live with .ma too :+1:

Should we have something similar to use statements in Rust? Or should standard library procedures be always referred to by fully qualified names? e.g., exec.crypto:hashes:blake3 vs use std:crypto:hashes:blake3 and then in the code exec.blake3.

I think we need something like this otherwise soon it'll be too much of cognitive load while just adding/ fixing these import paths. Also to handle name conflicts we can bring in some ideas from Rust, so that we can do name aliasing.

Would it build one giant string of all the code in the library and parse it in one go?

I think we should not go this way, instead we can do it in hierarchical fashion. Meaning say in some random Miden Assembly code

import std:crypto:hashes:blake3

proc.set_consts
...
end

begin
   exec.set_consts
   exec.blake3
   ...
end

when parser works on this input source string, it should note which standard library modules are being used and then it can only bring those portions ( say simply inlines it ) of standard library which are required for using exec.blake3. For that we have to define one hierarchical module structure which is known to parser and when parser prepares final source string for compilation, it should only have those portions ( say nodes of dependency graph ) of standard library which are required for compiling current assembly file.

I think this is decrease the amount of work that parser and its helper components need to do when source processing.

bobbinth commented 2 years ago

I think we need something like this otherwise soon it'll be too much of cognitive load while just adding/ fixing these import paths. Also to handle name conflicts we can bring in some ideas from Rust, so that we can do name aliasing.

Agreed!

I think we should not go this way, instead we can do it in hierarchical fashion.

Agreed here as well. We need to figure out what's the simplest way to support such a structure and how flexible we want to make it. For example, do we want to support something like this as well (in addition to your example):

import std:crypto:hashes:*

proc.set_consts
...
end

begin
   exec.set_consts
   exec.blake3
   ...
end
grjte commented 2 years ago

I agree with the approach that’s been discussed here so far. Just to throw in my 2 cents, I also prefer .masm as suggested by @itzmeanjan (although both are fine).

bobbinth commented 2 years ago

.masm for extension it is!

A few more thoughts from my side:

I'd prefer we use the same general format for import statements as we do for all other instructions: period-separated keywords. Also, my slight preference is that we use use rather than import but I'm fine with ether. So, an import statement would look like:

use.std:crypto:hashes:blake3

Also, I'm thinking that for simplicity of parsing, we require that all import statements are grouped at the top of a file. What do you think?

Next, I'm wondering if we should limit imports to modules (and not to procedures within modules). Basically, in the above example blake3 would be a separate .masm file. The semantics of this would be that the contents of a referenced module will be parsed in its entirety - and we won't need to do anything more sophisticated than that.

We could also allow importing multiple modules in the same statement (similar to how Rust does it). For example:

use.std:crypto:hashes:{blake3,sha256}

Though, the above might be a bit confusing given that we can't have spaces in the list of imports (at least with our current strategy).

Next, we probably need to have some kind of visibility modifiers. The simplest form is to have public and private procedures within a module (each module would have at least one public procedure). Not sure what would be the best way to label public procedures yet. Another alternative is to have a single public procedure per module. It's a bit limiting, but it might simplify a few things.

Lastly, I wondering if we should support "inline imports". Something like:

being
    exec.std:crypto:hashes:blake3
end

My initial thinking is that this adds too much complexity. Although, if we go with single public procedure per module, this should be fairly straightforward.

One other thing, should we allow aliasing? For example, something like this:

use.std:crypto:hashes:blake3.as.blake
grjte commented 2 years ago

My main thoughts here are:

  1. In general, I favor introducing complexity iteratively rather than initially, so I think we should start by supporting a minimal set of options that we will definitely want, while aiming not to corner ourselves too much.
  2. I like the idea of taking a Rust-like approach (but minimized). I think this puts us on a path that includes the things we've discussed that we might want, so we can choose to add them later.

Format

my slight preference is that we use use rather than import but I'm fine with ether.

I agree with use.

I'm thinking that for simplicity of parsing, we require that all import statements are grouped at the top of a file.

I've never felt much benefit from allowing imports farther down in the file, so I prefer to keep them at the top too. Similarly, I would skip supporting inline imports like the example below for an initial implementation of imports:

being
    exec.std:crypto:hashes:blake3
end

I think this is less ergonomic than use.std:crypto:hashes:blake3 and exec.blake3 and doesn't add enough value for the amount of complexity to justify be included in an initial implementation.

I'd prefer we use the same general format for import statements as we do for all other instructions: period-separated keywords. use.std:crypto:hashes:{blake3,sha256} use.std:crypto:hashes:blake3.as.blake

I think these both feel a bit strange. I know sticking to period-separated keywords makes sense for keeping the parsers simple and consistent, but I'd probably be inclined not to support these options initially and to revisit them both in a future iteration, at which point it might be worth considering using spaces in the import statements instead of period separators.

use.std:crypto:hashes:blake3

So to summarize - I think we should initially stick to having a series of simple use statements like this one at the top of the file. I would probably also use a double colon in the path use.std::crypto::hashes::blake3, but that's my personal preference to minimize semantic differences between languages & maximize familiarity, especially if we're taking a lead from Rust (or thinking about doing so in the future).

Visibility Modifiers

The simplest form is to have public and private procedures within a module (each module would have at least one public procedure)

I don't think we would need more than this, so I think going in this direction makes sense.

Another alternative is to have a single public procedure per module. It's a bit limiting, but it might simplify a few things.

This might be a good starting place. It'll be simpler for now, but if we do it right, we should be able to relax the restriction in the future fairly cleanly to get to public/private procedures as mentioned above if we want to.

bobbinth commented 2 years ago

Agree with pretty much everything above. To sum up:

Library modules will be individual file with .masm extension. Each module may export multiple procedures, and may have private (un-exported) procedures. For example, let's say we have a module for handling u256 arithmetic. It could be located in stadlib/math/u256.masm file and could look like so:

proc.some_helper
  <instructions>
end

export.add
  <instructions>
end

export.mul
  <instructions>
end

To use procedures from a module in a script we'd need to do the following:

use.std::math::u256

begin
  <instructions>
  u256::add
  <instructions>
end

For now, we don't allow module aliasing and combining multiple imports into a single line. We also won't allow circular module imports. Mean, if module A uses module B, module B cannot import module A.

One open questions: should we allow "root" modules?

For example, should it be possible to have a module at std::math. My initial thinking is: probably not. But I haven't thought through the limitations this would entail.

bobbinth commented 2 years ago

Closed by #121