Open saulecabrera opened 2 years ago
https://github.com/bytecodealliance/wasmtime/issues/4125 was the issue about generating machine instruction definitions that we talked about in the meeting today.
For the purposes specifically of unblocking a separate cranelift-asm
crate, I think we could skip all the bits about collecting operands and basically everything other than what it takes to define the enum
itself. Everything else can be added later.
I also think that yaml or toml would be nice for defining these. I specifically want to keep them declarative, and not programattically generated from "recipes" (like CLIF is in cranelift-meta
) because it gets really hard to find the one place where an instruction is defined and where all instructions that exist are.
Straw person:
instructions:
nop:
len: u8
alu_rmi_r:
size: OperandSize
op: AluRmiROpcode
src1: Gpr
src2: GprMemImm
dst: WritableGpr
# etc...
We compile this to ISLE with and without an extern
in build.rs
depending which crate we are building.
We can make the data format richer as we add features (collecting operands automatically, etc)
I second the table-driven approach above; we want to do it eventually anyway and it solves this problem, so it seems to be the cheapest and simplest way forward.
One constraint not mentioned above that's worth naming explicitly here is that, AFAIK, a crate cannot have a dependence on another crate's source tree. So directly using either a yaml/toml or an ISLE file from cranelift-codegen
in cranelift-asm
, or vice-versa, is not possible. I think probably the cleanest way forward is to put the description in cranelift-asm
along with the generator, and give it a mode to produce an "extern variant" of the enum definition (this is kind of like a header file for a library). Then we can have a scripts/update-inst-defs.sh
or whatever that generates checked-into-git ISLE in both cranelift-asm
and cranelift-codegen
. Thoughts on that?
One of the interesting questions that came out of exploring this approach is: what's the cleanest way to handle the dependencies needed for a successful generation of ISLE types (concretely in this case the MachInst
enum). For example it's generally common for MachInst
definitions to depend on types defined through ISLE's prelude, like bool
, u8
; or to depend on other types that are guaranteed to be available in their environment like Gpr
.
According to how the ISLE compiler works today, which requires an inline declaration of some sort of these external types, achieving a fully independent generation process, will require making the generation process aware of such dependencies. I can think of at least two ways to achieve this:
Requiring the configuration definitions to provide entries for all of their dependencies.
Deriving the dependencies at generation time.
Although, both avenues are feasible, they introduce questionable level of indirection, which is probably not desirable long term.
Option 1: Introduces more duplication by requiring the definition of the universe of extern types that a particular generation depends on.
Option 2: This is only partially possible, and mostly applicable when generating enums, which don't require the definition extern enum arms. Down the road this becomes problematic when synchronizing the extern generation of the MachInst
enum in cranelift-codegen
.
After chatting with @cfallin, we concluded that the cleanest approach would be to have a crate-level solution for artifact dependencies; similar to what is proposed here https://github.com/rust-lang/cargo/issues/9096. Since this proposal is not ready yet, we concluded that the next cleanest approach might be to go with a feature driven assembler in cranelift-codegen
, in which particular features control which compiler capabilities are exposed depending on the consumer: the optimizing pieces for cranelift-wasm
and the baseline pieces for winch
.
As part of https://github.com/bytecodealliance/rfcs/pull/28 I'm working on factoring out ISLE definitions from
cranelift-codegen
and moving them tocranelift-asm
; concretely, the enum representing each ISA's machine instruction.Currently each
inst.isle
, contains other helpers that use the enum definition, which semantically belong incranelift-codegen
and that ideally shouldn't be moved intocranelift-asm
. This poses an interesting question regarding having ISLE dependencies between crates and what's the best way to go about it.I had a chat with @cfallin, in which we discussed a couple of possibilities:
Declaring each
MInst
asextern
This would require (i) changing the
MInst
definition ininst.isle
incranelift-codegen
to beextern
(ii) moving eachMInst
definition tocranelift-asm
and (iii) making the Rust types from the generated code incranelift-asm
available to the environment of the isle-generated code incranelift-codegen
. Similar to what exists for all the the otherextern
enums. In terms of disadvantages/concerns: whenever there's a change to theMInst
enum incranelift-asm
the extern definition incranelift-codegen
needs to be updated too, which is kind of expected when seeing this from the angle of a dependency upgrade (i.e. sometimes changes are required in the consumer side), but in the end, we are keeping a verbatim copy of the definition and I wonder is this is going to introduce too much complexity when working with these definitions (there might be other disadvantages that I'm not clearly seeing too!).Having a single definition, serving both (or multiple) crates
This idea is half-baked. But in general it would require moving the common ISLE pieces to a central location and copying them at build time to perform code generation. This approach has the potential to solve the definition duplication issue, but the question of how to concretely express dependencies between ISLE definitions remains open: for example, the structure of this new central location could be something like:
Ideally in this case, we'd be able to express dependency between
inst.isle
andinst_enum.isle
in the language itself; but if there's a fundamental reason why we can't do it, this approach will require at the very least a "merge" of some sort between ISLE files, this makes this process less transparent. As a side note: is there a reason why ISLE couldn't be augmented to support a simple form ofimport
directives?There are a bunch of pros/cons to both items above, and my objective with this issue is not to decide on which one is better, but instead, make this conversation public to see what others think about this and what could be a sensible path forward.
In terms of how this affects the development of Winch; I don't think it affects it much. I would've preferred to start with
cranelift-asm
, but I see value in reducing complexity up-front, and delaying any refactoring until there's a clear picture of how all the pieces interact together. If there isn't a clear path forward, I'd prefer to keep thewinch
/cranelift-codegen
dependency temporarily and avoid thecranelift-asm
refactoring until we have an agreement on what an acceptable solution is here.