What do we ideally want the assembly (LLVM .s) text format to look like?

WebAssembly / tool-conventions

Conventions supporting interoperability between tools working with WebAssembly.

What do we ideally want the assembly (LLVM .s) text format to look like? #49

Open aardappel opened 6 years ago

aardappel commented 6 years ago

We just landed a first version of the assembly parser in LLVM https://reviews.llvm.org/D44329. This version is quite basic and adheres closely to what the disassembler had been outputting so far.

Now that we have both the assembler and disassembler though, we can decide to make changes to this format that better suit future needs.

In particular, we currently have 2 flavors of the .s format:

Elf (-triple=wasm32-unknown-unknown-elf).
- This is the format that is currently consumed by s2wasm, so any changes to the format would also have to be made there. Since in the long run the toolchain will be all-binary by default, at which point s2wasm will maybe not be needed anymore, it may not be wise to invest too much in changing this format.
- This path currently defaults to using -disable-wasm-explicit-locals, i.e. it may have $0 as an operand to refer to local 0, instead of a preceding get_local 0 instruction.
Wasm (-triple=wasm32-unknown-unknown-wasm).
- This corresponds to the wasm-specific .o format that gets directly consumed by lld. This is the path we are working towards the toolchain taking. Once that path is default, the function of the .s format is mostly intended for:
  - Inline assembly.
  - Writing LLVM tests.
  - Viewing .o file contents.
  - Writing .o files by hand? :)
- This format is intended to be used with explicit locals, as this matches the .wat format more closely.
- This format currently has no-one depending on it (afaik), so we can change this more easily to suit the above needs by just changing the assembler and disassembler together.
- This triple doesn't currently work since it requires an implementation of a part of LLVM that we only have for ELF (since it is also used by other "CPUs"), which we may add next. The parser itself already deals with all variants above.

The current disassembler outputs pseudo stack registers to make wasm look more like a CPU/register machine, and maybe make it easier for humans to track the stack, for example:

i32.const $push0=, 1
i32.const $push1=, 2
i32.add $push2=, $pop0, $pop1

This however is also verbose, so for the use of inline assembly in particular, it may be nice to allow people to write pure stack code. Or even better, we can decide that for the wasm mode above, we require that these stack operands not be present.

The LLVM table-gen assembly matcher used in the assembler currently requires there to be operands, so even if we ignore or disallow these operands in the .s format, they will temporarily be generated on the fly. The current implementation can't do this yet, and instead relies on them to be present (and correctly numbered).

In general, I think we want the .s format to be as close as possible to the .wat format, while still fitting the LLVM mold.

Anything else we should change about the .s format while we're at it? Am I missing something?

binji commented 6 years ago

> In general, I think we want the .s format to be as close as possible to the .wat format, while still fitting the LLVM mold.

Yep, this sounds right to me. I'd say if we can keep the instructions looking the same, it would be best. I'm OK with the other parts (segments, definitions, etc) using a more "standard" assembly format personally.

Do the current formats have anything special for branches (i.e. arbitrary brs), or does it just follow the wat structured label format?

aardappel commented 6 years ago

@binji current branches just have a block nesting index. Labels are currently in the disassembly output and also parsed but unused. Looks like for inline asm & test purposes allowing use of labels may be nice? But then we need to also check that they can only occur adjacent to blocks.
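To illustrate how labels could map onto the existing block nesting index, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `(mnemonic, operand)` tuple representation, the `$name` label spelling, and the rule that a label may only be attached to a `block` or `loop`. It is not how the LLVM parser works.

```python
def resolve_branches(instrs):
    """Replace `br $label` with `br <depth>` using the enclosing blocks.

    `instrs` is a list of (mnemonic, operand) pairs (hypothetical format).
    For block/loop the operand is the label, for br/br_if the target label.
    """
    open_labels = []   # labels of the currently open blocks, innermost last
    out = []
    for mnemonic, operand in instrs:
        if mnemonic in ("block", "loop"):
            # Labels are only accepted here, i.e. adjacent to a block.
            open_labels.append(operand)
            out.append((mnemonic, None))
        elif mnemonic in ("end_block", "end_loop"):
            open_labels.pop()
            out.append((mnemonic, None))
        elif mnemonic in ("br", "br_if") and isinstance(operand, str):
            # Depth 0 is the innermost enclosing block.
            innermost_first = open_labels[::-1]
            if operand not in innermost_first:
                raise ValueError(f"unknown label {operand}")
            out.append((mnemonic, innermost_first.index(operand)))
        else:
            out.append((mnemonic, operand))
    return out
```

A branch that already carries a numeric index passes through unchanged, so both spellings could coexist in tests.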

aardappel commented 6 years ago

Unlike what I wrote above, we didn't actually have a disassembler yet (binary -> MC). What we've always had is the instruction printer (MC -> text). The assembler is also (text -> MC), so both feed into MC in the same way.

There is now an initial disassembler: https://reviews.llvm.org/D45848. Also, both elf and wasm mode for the moment share the same parser extension (for directives, etc): https://reviews.llvm.org/D45386

The big question right now comes back to what I wrote above: do we want a pure "stack" based version of these tools? This looks to be a bit more complicated than I originally thought, since all these tools communicate over an MCInst (and MCOperand) which are generic instruction storage, and their precise definition for a particular target is "whatever is defined in table-gen .td files". I originally assumed that improving these tools was just a matter of changing our text parsing/output, but it now seems like we'd need stack-based instruction definitions in the .td files to make this happen. Now, for some instructions (call and call_indirect) there already exist non-register versions, but I am not sure if these could easily be added for all instructions, or worse, whether this would need a second set of .td files.

Since the use cases for the .s file format are minimal, I am not sure whether going through the above is warranted. I'd love some opinion on the above. We could also revisit this later, and declare we're happy with pseudo-registers for the moment?

dschuff commented 6 years ago

OK, so from the "assembly language" perspective I agree that it would be nice to have the format used by LLVM be as close as practical to the official wat format. (I'll call the current elf/s2wasm format the "register-style" format and the hypothetical wat-like format the "explicit-locals" format). If we can get a workable explicit-locals format, we can have just one flavor of the format, which would simplify usage a lot. It would be really unfortunate if we needed a register-style assembly for assembly files (and inline asm) but then we'd probably still also want an explicit-locals objdump mode that looks like wat, WABT output, etc.

Regardless though, there will be some extensions needed to support assembler use cases; we've been imagining most of these as looking like traditional assembler directives, but I think it's still an open question whether they all can, or whether we need any "interesting" extensions to the language. One simple example is the use of labels rather than numbers in data references (e.g. i32.const $foo or i32.load offset=$foo). This is a pretty straightforward extension and maybe doesn't qualify as "interesting" but there might be other issues too. I'd guess that the nature of these potential issues would be similar for explicit-locals vs register-style formats (the current s2wasm format already does support symbol names in the relevant places similarly to native ELF assembly).

Having said that, LLVM's MachineInstr/MCInst IR really does have the idea of registers as a first-class construct, and I'd say we are unlikely to fundamentally change that, even if we do more explicitly model the stack late in the pipeline. The outs and ins in the tablegen definitions reflect that, and it would also be unfortunate if we had to have duplicate instruction definitions for MC. It's possible that we could hide something in WebAssemblyInstrFormats.td where WebAssemblyInst and I are defined, or have some kind of convention for MC register values for asm/disasm. There are probably multiple options to explore, and I think it's probably worth exploring them sooner rather than later. I don't know off the top of my head of any particular pressure to, say, implement a full asm/disasm with pseudo registers right away, that would suggest we should do that before investigating the alternative.

aardappel commented 6 years ago

@dschuff Yes, let's decide on a direction, since I feel most other work depends on it.

Option A is thus to add no-register / explicit local instructions to the tablegen defs. My understanding of tablegen is minimal to the point that I can't evaluate the consequences of this decision.

Option B is to keep tablegen register based, and to generate registers on the fly in both asm and disasm. This is a bit more than just having explicit locals, since it also needs to track a stack of temporaries. Since they both actually do almost the same thing (translate a sequence of tokens or bytes into MC, respectively) a stack/local tracker could be shared between them? As noted, the disasm has much less context for doing this than the asm, and would need additional plumbing to know about locals and function start/end.
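As a rough illustration of the stack tracker that Option B would need, the Python sketch below turns pure stack code back into the `$pushN=` / `$popN` form shown at the top of the thread. The `ARITY` table and the function name are hypothetical, only the two instructions from the earlier example are covered, and locals and function boundaries are ignored entirely.

```python
# Hypothetical arity table: mnemonic -> (values popped, values pushed).
# Only the instructions from the example earlier in the thread are covered.
ARITY = {
    "i32.const": (0, 1),
    "i32.add": (2, 1),
}


def add_pseudo_registers(lines):
    """Rewrite pure stack code into the $pushN= / $popN register-style form."""
    stack = []      # numbers of the pseudo registers currently on the stack
    next_reg = 0    # next unused pseudo-register number
    out = []
    for line in lines:
        mnemonic, _, rest = line.strip().partition(" ")
        pops, pushes = ARITY[mnemonic]
        if len(stack) < pops:
            raise ValueError(f"stack underflow at: {line!r}")
        # Pop the inputs; the deepest value becomes the first operand.
        popped = [stack.pop() for _ in range(pops)][::-1]
        operands = []
        for _ in range(pushes):
            operands.append(f"$push{next_reg}=")
            stack.append(next_reg)
            next_reg += 1
        operands += [f"$pop{r}" for r in popped]
        if rest:
            operands.append(rest.strip())   # immediates stay at the end
        out.append(f"{mnemonic} {', '.join(operands)}")
    return out
```

Feeding it `i32.const 1`, `i32.const 2`, `i32.add` reproduces the three register-style lines from the original post. A shared tracker for asm and disasm would additionally need the per-function context mentioned above.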

Is there an option C?

aardappel commented 6 years ago

I guess an option C for output is to stick a text post-processing pass in the instruction printer that filters out registers. But that does not solve it for input.
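A text post-processing pass of this kind could be as small as the following Python sketch. It is an illustration only: the regular expression and function name are assumptions, and it handles just the simple comma-separated operands from the example earlier in the thread, not memory operands or other syntax.

```python
import re

# Matches one pseudo stack operand such as "$push0=" or "$pop1",
# together with the comma and spaces that follow it.
STACK_OPERAND = re.compile(r"\$(?:push\d+=|pop\d+)\s*,?\s*")


def strip_stack_operands(line):
    """Remove $pushN= / $popN operands from one register-style line."""
    # rstrip cleans up the separator left behind when every operand is removed.
    return STACK_OPERAND.sub("", line).rstrip(" ,")
```

Applied to `i32.add $push2=, $pop0, $pop1` this leaves just `i32.add`, while immediates such as the `1` in `i32.const $push0=, 1` are kept.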

dschuff commented 6 years ago

I think that's a pretty good summary, I can't think of an option C off the top of my head.

Thinking through the options a little more:

A: The LLVM backend infrastructure is pretty heavily based on the assumption that the instruction definitions correspond everywhere (e.g. the correspondence between a MachineInstr and its MCInstrDesc) and that assumption would be broken with option A. If we did option A, we'd presumably try to keep all of the register-based instruction forms all the way through the compile pipeline and then have a big conversion table or something in WebAssemblyMCInstLower where the conversion to MC happens. I think in that case we'd have to assign the "real" opcode numbers to the MC versions of the instructions to make asm and disasm work, and there wouldn't need to be any funny business with registers since get_local and set_local instructions just have immediates. It's not clear to me right now what treatment we'd need to have for asm and disasm of e.g. module- and function-level constructs like globals and locals but maybe those would be independent of instructions in this case. All of that (i.e. the answer to "what are the consequences of having duplicate instruction definitions") actually doesn't sound as bad as I had thought; the risk so far would be any unforeseen links between the compiler backend (i.e. before MCInstLower) and MC. That just leaves the problem of how to minimize the ugliness of generating duplicate definitions. It seems plausible that some clever tablegen hacking of the base classes in WebAssemblyInstrFormats.td could minimize this, but tablegen can be a bit of a dark art.
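The "big conversion table" idea for option A can be pictured with a small Python sketch. All names here are hypothetical (the `Reg` class, the opcode names, and the `_S` suffix for stack forms are invented for illustration), and the real lowering would operate on MachineInstr and MCInst objects rather than tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reg:
    """Stand-in for a register operand of a register-form instruction."""
    number: int


# Hypothetical conversion table: register-form opcode -> stack-form opcode.
REG_TO_STACK = {
    "CONST_I32": "CONST_I32_S",
    "ADD_I32": "ADD_I32_S",
}


def lower_to_stack(opcode, operands):
    """Map a register-form instruction to its stack form.

    Register operands are dropped; immediates are carried over unchanged.
    """
    stack_opcode = REG_TO_STACK[opcode]
    immediates = [op for op in operands if not isinstance(op, Reg)]
    return stack_opcode, immediates
```

With this shape, the assembler and disassembler would only ever see the stack-form opcodes and their immediates.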

With option B, I guess MCInsts would have to carry fake registers everywhere, and I think your characterization is accurate.

One of the unknown unknowns we haven't talked about is inline asm. That's an asm use case which hooks back into the MachineInstr world (and also where the context is more limited?), and I could imagine that getting ugly with either one of the options. We should probably get a bit better understanding of that too. We have something for the register form that IIRC (I haven't looked in a while) isn't too terrible. But since we prefer explicit-locals everywhere if we can get it, it's not obvious how that will work.

aardappel commented 6 years ago

Ah, so for A, you're saying everything at the MC level would always be no-registers. The big question is how to safely define the tablegen definitions so they work for both use cases.

I'd love to hear from whoever wrote the current tablegen definitions how feasible this is.

Good news I suppose is that now that we have both asm + disasm and tests, we can more easily see where stuff breaks if we attempt this transition.

I was actually just digging into inline asm. I shall keep digging.

sunfishcode commented 6 years ago

Another possibility would be to do something like the x86 target does for the x87 stack. See here. CodeGen initially uses x87 instructions with plain virtual register inputs and outputs, and they're later converted into instructions that access the stack.

dschuff commented 6 years ago

Interesting, that looks to me basically like option A, except that the conversion happens earlier in the pass pipeline.

aardappel commented 6 years ago

This seems to be the way they are split up:

> Note that the FpI instruction should have instruction selection info (e.g. a pattern) and the FPI instruction should have emission info (e.g. opcode encoding and asm printing info).

@dschuff You're saying the conversion happens before it is converted to MC? What still happens between stackify and MC, and would that work for us as well?

aardappel commented 6 years ago

(deleted previous incorrect comment)

So for A) it seems that if we transition from registers to stack instructions at the boundary of translating from MachineInstr to MCInst, we can keep MCInst stack-based in all cases, and asm/disasm can be relatively simple and elegant (not ever dealing with registers). We'd need to change/move the current register to stack conversion that happens later in the pipeline (after MC).

One remaining issue is supporting s2wasm, which currently wants registers. We can potentially allow MC to also still represent register versions purely for this path, but that would make the above potentially less elegant. Or we could modify s2wasm to not need registers, since it will discard them anyway. Or better yet, maybe by the time this lands we don't need the s2wasm path at all anymore if lld is deemed ready. So we can decide this later.

To be able to pick A) over B), the remaining work is to see how hard it is to make tablegen produce 2 parallel instruction sets. I'll have a look at that next (and see how x87 does it).

rossberg commented 6 years ago

If you want to stay as close to the wat format as possible, would it work to use wat + annotations or would that be too unwieldy?

dschuff commented 6 years ago

> We'd need to change/move the current register to stack conversion that happens later in the pipeline (after MC).

I'm a bit unclear what you mean here. Currently the ExplicitLocals pass runs almost at the end of the MI pipeline, with the pre-emit passes, but not after MC. The X86 stackifier pass runs earlier, but I think we could do whatever makes sense, as long as the passes after the conversion work on the stack version of the instructions rather than the register versions. It's possible that in the future we will make stackification span basic blocks, in which case the coupling with the various CFG-munging passes will be more important.

> To be able to pick A) over B), the remaining work is to see how hard it is to make tablegen produce 2 parallel instruction sets. I'll have a look at that next (and see how x87 does it).

SGTM

dschuff commented 6 years ago

> If you want to stay as close to the wat format as possible, would it work to use wat + annotations or would that be too unwieldy?

That's a good question. I think we are currently leaning toward trying the stack-like instructions, so if we do that then it might actually work. The MC machinery is quite linear and fairly stateless so one question is how annoying would it be to make sure all the right parens get printed at the right time, and the parser can handle them. With the linear (i.e. non-folded non-paren) wat form, mostly this would just have to happen at module headers and function boundaries, so it might not be too hard since there are target pre- and post-hooks for assembly modules and functions. All the inline assembler directives that go inside function bodies could probably just go wherever they go now, they would just be spelled differently. Then you'd have relocations in instructions. Since the annotations are additive (rather than replacing existing operands) you'd probably end up with constructs like i32.const 0 (@reloc "symbolname") or i32.load offset=0 (@reloc "symbolname" (addend 123)), which is more unwieldy than traditional assembler syntax but it might be worthwhile if it means that we could use "real" wat. It's probably worth trying out.
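To make the additive-annotation idea concrete, here is a small Python sketch that prints a linear wat instruction followed by a relocation annotation in the spelling used above. The `(@reloc ...)` syntax is only the speculative form from this comment, not an agreed format, and the function name and parameters are invented for illustration.

```python
def with_reloc(instr, symbol, addend=None):
    """Append a hypothetical (@reloc ...) annotation to a linear wat instruction.

    The instruction text keeps its placeholder operand (e.g. `0`), because
    the annotation is additive rather than replacing the operand.
    """
    if addend is None:
        return f'{instr} (@reloc "{symbol}")'
    return f'{instr} (@reloc "{symbol}" (addend {addend}))'
```

Calling `with_reloc("i32.load offset=0", "symbolname", 123)` yields the second construct quoted above. A parser would have to perform the reverse step and attach the symbol to the preceding operand.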

aardappel commented 6 years ago

@dschuff I guess I was unclear about the order these things happen. The weird thing is that even though explicit locals may be introduced earlier, it still keeps registers throughout MC, discarding them only at the end just before writing binary I presume. So MC with explicit locals is a weird hybrid model. If we'd move MC to not have registers, then whatever code drops them needs to be removed (because presumably not emitting them in MI->MC is automatic if the right table-gen instructions are used).