isle: Managing dependencies between crates

As part of https://github.com/bytecodealliance/rfcs/pull/28 I'm working on factoring out ISLE definitions from cranelift-codegen and moving them to cranelift-asm; concretely, the enum representing each ISA's machine instruction.

Currently each inst.isle, contains other helpers that use the enum definition, which semantically belong in cranelift-codegen and that ideally shouldn't be moved into cranelift-asm. This poses an interesting question regarding having ISLE dependencies between crates and what's the best way to go about it.

I had a chat with @cfallin, in which we discussed a couple of possibilities:

Declaring each MInst as extern

This would require (i) changing the MInst definition in inst.isle in cranelift-codegen to be extern (ii) moving each MInst definition to cranelift-asm and (iii) making the Rust types from the generated code in cranelift-asm available to the environment of the isle-generated code in cranelift-codegen. Similar to what exists for all the the other extern enums. In terms of disadvantages/concerns: whenever there's a change to the MInst enum in cranelift-asm the extern definition in cranelift-codegen needs to be updated too, which is kind of expected when seeing this from the angle of a dependency upgrade (i.e. sometimes changes are required in the consumer side), but in the end, we are keeping a verbatim copy of the definition and I wonder is this is going to introduce too much complexity when working with these definitions (there might be other disadvantages that I'm not clearly seeing too!).

Having a single definition, serving both (or multiple) crates

This idea is half-baked. But in general it would require moving the common ISLE pieces to a central location and copying them at build time to perform code generation. This approach has the potential to solve the definition duplication issue, but the question of how to concretely express dependencies between ISLE definitions remains open: for example, the structure of this new central location could be something like:

- inst.isle 
- inst_enum.isle // MInst definition
- lower.isle

Ideally in this case, we'd be able to express dependency between inst.isle and inst_enum.isle in the language itself; but if there's a fundamental reason why we can't do it, this approach will require at the very least a "merge" of some sort between ISLE files, this makes this process less transparent. As a side note: is there a reason why ISLE couldn't be augmented to support a simple form of import directives?

There are a bunch of pros/cons to both items above, and my objective with this issue is not to decide on which one is better, but instead, make this conversation public to see what others think about this and what could be a sensible path forward.

In terms of how this affects the development of Winch; I don't think it affects it much. I would've preferred to start with cranelift-asm, but I see value in reducing complexity up-front, and delaying any refactoring until there's a clear picture of how all the pieces interact together. If there isn't a clear path forward, I'd prefer to keep the winch/cranelift-codegen dependency temporarily and avoid the cranelift-asm refactoring until we have an agreement on what an acceptable solution is here.

https://github.com/bytecodealliance/wasmtime/issues/4125 was the issue about generating machine instruction definitions that we talked about in the meeting today.

For the purposes specifically of unblocking a separate cranelift-asm crate, I think we could skip all the bits about collecting operands and basically everything other than what it takes to define the enum itself. Everything else can be added later.

I also think that yaml or toml would be nice for defining these. I specifically want to keep them declarative, and not programattically generated from "recipes" (like CLIF is in cranelift-meta) because it gets really hard to find the one place where an instruction is defined and where all instructions that exist are.

Straw person:

instructions:
  nop:
    len: u8
  alu_rmi_r:
    size: OperandSize
    op: AluRmiROpcode
    src1: Gpr
    src2: GprMemImm
    dst: WritableGpr
  # etc...

We compile this to ISLE with and without an extern in build.rs depending which crate we are building.

We can make the data format richer as we add features (collecting operands automatically, etc)

I second the table-driven approach above; we want to do it eventually anyway and it solves this problem, so it seems to be the cheapest and simplest way forward.

One constraint not mentioned above that's worth naming explicitly here is that, AFAIK, a crate cannot have a dependence on another crate's source tree. So directly using either a yaml/toml or an ISLE file from cranelift-codegen in cranelift-asm, or vice-versa, is not possible. I think probably the cleanest way forward is to put the description in cranelift-asm along with the generator, and give it a mode to produce an "extern variant" of the enum definition (this is kind of like a header file for a library). Then we can have a scripts/update-inst-defs.sh or whatever that generates checked-into-git ISLE in both cranelift-asm and cranelift-codegen. Thoughts on that?

One of the interesting questions that came out of exploring this approach is: what's the cleanest way to handle the dependencies needed for a successful generation of ISLE types (concretely in this case the MachInst enum). For example it's generally common for MachInst definitions to depend on types defined through ISLE's prelude, like bool, u8; or to depend on other types that are guaranteed to be available in their environment like Gpr.

According to how the ISLE compiler works today, which requires an inline declaration of some sort of these external types, achieving a fully independent generation process, will require making the generation process aware of such dependencies. I can think of at least two ways to achieve this:

Requiring the configuration definitions to provide entries for all of their dependencies.
Deriving the dependencies at generation time.

Although, both avenues are feasible, they introduce questionable level of indirection, which is probably not desirable long term.

Option 1: Introduces more duplication by requiring the definition of the universe of extern types that a particular generation depends on.

Option 2: This is only partially possible, and mostly applicable when generating enums, which don't require the definition extern enum arms. Down the road this becomes problematic when synchronizing the extern generation of the MachInst enum in cranelift-codegen.

After chatting with @cfallin, we concluded that the cleanest approach would be to have a crate-level solution for artifact dependencies; similar to what is proposed here https://github.com/rust-lang/cargo/issues/9096. Since this proposal is not ready yet, we concluded that the next cleanest approach might be to go with a feature driven assembler in cranelift-codegen, in which particular features control which compiler capabilities are exposed depending on the consumer: the optimizing pieces for cranelift-wasm and the baseline pieces for winch.

bytecodealliance / wasmtime

isle: Managing dependencies between crates #5178