
cmd/asm: refactor the framework of the arm64 assembler #44734

Open erifan opened 3 years ago

erifan commented 3 years ago

I propose to refactor the framework of the arm64 assembler.

Why?

  1. The current framework is a bit complicated, and not easy to understand, maintain, or extend, especially the handling of constants and the design of optab.
  2. Adding a new arm64 instruction takes more and more effort. For some complex instructions with many formats, a lot of modifications are needed; see for example https://go-review.googlesource.com/c/go/+/273668 and https://go-review.googlesource.com/c/go/+/233277.
  3. At the moment, we are still missing ~1000 assembly instructions, including NEON and SVE. The potential cost of adding those instructions is high.

People are paying more and more attention to the arm64 platform, and there are more and more requests to add new instructions; see https://github.com/golang/go/issues/40725, https://github.com/golang/go/issues/42326, https://github.com/golang/go/issues/41092, etc. Arm64 also has many new features, such as SVE. We hope to construct a better framework to solve the above problems and make future work easier.

Goals for the changes:

  1. More readable and easier to maintain.
  2. Easy to add new instructions.
  3. Friendly to testing; can be cross-checked with GNU tools.
  4. Share instruction definitions with the disassembler, to avoid mismatches between the assembler and disassembler.

How to refactor? First, let's take a look at the current framework. [framework diagram]

We mainly focus on the span7 function, which encodes a function's Prog list. The main idea of this function is that for a specific Prog, it first finds the corresponding item in optab (via the oplook function), and then encodes it according to optab._type (in the asmout function). In oplook, we need to handle the matching relationship between a Prog and an optab item, which is quite complex, especially for the constant types and constant offset types. In optab, each format of an instruction has an entry. In theory, we need to write an encoding function for each entry; fortunately we can reuse some similar cases. However, sometimes we don't know whether a similar implementation already exists, and as instructions are added, the code becomes more and more complex and difficult to maintain.

We propose to change this logic: separate the preprocessing and the encoding of a Prog. The specific method is to first unfold each Prog into Progs corresponding to only one machine instruction each (by hardcoding), and then encode each one according to its known bits and argument types. Namely: encoding_of_P = known_bits | arg1 | arg2 | ... | argn

The control flow of span7 becomes: [control-flow diagram]

We basically have a complete argument type description list and arm64 instruction table; see the argument types and the instruction table. When we know the known bits and argument types of an instruction, it is easy to encode it.
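To make that concrete, here is a minimal, self-contained sketch of the scheme. The table layout and all names are hypothetical, not the actual prototype CL; only the bit positions in the ADD example follow the Arm architecture manual:

package main

import "fmt"

type argType int

const (
	argRd argType = iota // destination register, bits 0-4
	argRn                // first source register, bits 5-9
	argRm                // second source register, bits 16-20
)

// instEntry is one row of the hypothetical instruction table: the fixed
// opcode bits plus the argument types, in operand order.
type instEntry struct {
	knownBits uint32
	args      []argType
}

// encodeArg places a single argument value into its bit field.
func encodeArg(t argType, val uint32) uint32 {
	switch t {
	case argRd:
		return val & 0x1f
	case argRn:
		return (val & 0x1f) << 5
	case argRm:
		return (val & 0x1f) << 16
	}
	return 0
}

// encode implements encoding_of_P = known_bits | arg1 | ... | argn.
func encode(e instEntry, vals []uint32) uint32 {
	enc := e.knownBits
	for i, t := range e.args {
		enc |= encodeArg(t, vals[i])
	}
	return enc
}

func main() {
	// ADD (shifted register, 64-bit): known bits 0x8b000000.
	add := instEntry{knownBits: 0x8b000000, args: []argType{argRd, argRn, argRm}}
	fmt.Printf("%#08x\n", encode(add, []uint32{3, 1, 2})) // ADD R3, R1, R2 -> 0x8b020023
}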

With this change, we no longer need to handle the matching relationship between a Prog and an optab item, and we encode instruction argument types rather than each specific instruction. The number of argument types is much smaller than the number of instructions, so theoretically reusability will increase and complexity will decrease. In the future, adding a new instruction only requires the following:

  1. Set an index for the goal instruction in the arm64 instruction table.
  2. Unfold a Prog into one or multiple arm64 instructions.
  3. Encode the arguments of the arm64 instruction, if they have not been implemented yet.

I have a prototype patch for this proposal, including the complete framework and support for a few instructions such as ADD, SUB, MOV, LDR and STR; see https://go-review.googlesource.com/c/go/+/297776. The patch is incomplete; more work is required to make it work. If the proposal is accepted, we are committed to taking charge of the work.

TODO list:

  1. Enable more instructions.
  2. Add more tests.
  3. Fix the assembly printing issue.
  4. Cross-check with GNU tools.

CC @cherrymui @randall77 @ianlancetaylor

randall77 commented 3 years ago

Keep in mind that the "Why?" section applies to pretty much all of the architectures. To the extent that we can design a system that works across architectures (sharing code, or even just sharing high-level design & testing), that would be great.

See also CL 275454 for the testing piece.

See also my comment in CL 283533.

erifan commented 3 years ago

Keep in mind that the "Why?" section applies to pretty much all of the architectures. To the extent that we can design a system that works across architectures (sharing code, or even just sharing high-level design & testing), that would be great.

I agree, but designing a general framework requires knowledge of multiple architectures. If Google can do this design, it would be best. We are happy to follow and complete the arm64 part.

See also CL 275454 for the testing piece.

Yes, I know this patch; Junchen is my colleague. This patch takes the mutual testing of the assembler and disassembler one step forward, and we also hope to use the GNU toolchain to test the Go assembler. It would be great if we could merge the golang/arch repo into the golang/go repo, so that the disassembler and assembler could share a lot of code.

See also my comment in CL 283533. "But maybe there's a machine transformation that could automate reorganizing the tables?"

This table should be as simple as possible, preferably decoupled from the Prog struct, and contain only the architecture-specific instruction set. Following OpenJDK and GCC, we could divide the assembly process into two stages: macro assembly and assembly. The assembly stage is only responsible for encoding an architecture-specific instruction, and the macro assembly stage is responsible for preprocessing the Prog before encoding. Then we only need this table for encoding and decoding.

I hope to create a new branch to complete the refactoring of the arm64 assembler if this is acceptable. Thanks for your comment.

erifan commented 3 years ago

Hi, are there any further comments or suggestions?

If you think the above scheme is feasible, I am willing to extend it to all architectures. I can complete the architecture-independent framework and the arm64 part; other architectures would need to be completed by others. Of course, we will retain the old code until the porting of an architecture is complete, and add a switch between the new and old code paths.

erifan commented 3 years ago

We have tried to add initial SVE support to the assembler under the existing framework (see CL 153358), but in order to avoid the many existing cases, we had to add an asmoutsve function, which looks just like a copy of asmout. It is foreseeable that as SVE instructions are added in the future, the same problem will appear again. Go may not support SVE instructions at the moment, but Go will always be adding new architecture instructions, maybe AVX-512 instructions, or others. As long as this problem exists, it will only become more and more difficult to deal with as time goes on. To be honest, we hope that upstream can give us a clear reply, which would make it easier for us to plan the next quarter. May I ask for this issue to be discussed at the next compiler & runtime meeting? Thank you.

cherrymui commented 3 years ago

Thanks for the proposal.

The current framework is a bit complicated, not easy to understand, maintain and extend.

I think this is subjective. That said, I agree that the current optab approach is probably not the best fit for ARM64. I'm not the original author of the ARM64 assembler, but I'm pretty sure the design comes from the assemblers of other RISC architectures, which come from the Plan 9 assemblers. In my perspective the original design is actually quite simple, if the encoding of the machine instruction is simple. Think of MIPS or RISC-V: there are just a small number of encodings (that's where oprrr, opirr, etc. come from), and offsets are pretty much fixed width and encoded at the same bits. However, this is not the case for ARM64. Being RISC it started simple, but more and more things got added. Offsets/immediates have so many range/alignment requirements that differ from instruction to instruction, not to mention the BITCON encoding. Vector instructions are much worse -- I would say there is no regularity (sorry about being subjective). That said, the encoding is quite bit-efficient for a fixed-width instruction set.

Given that the ARM64 instruction encoding is not what the optab approach is best suited for, I think it is reasonable to come up with a different approach. I'm willing to give this a try.

unfold

This sounds reasonable to me. It might simplify a bunch of things. It may also make it possible to machine-generate the encoding table that converts individual machine instruction to bytes.

But keep in mind that there are things handled in the Prog level, e.g. unsafe point and restartable sequence marking. Doing it at Prog level is simple because some Prog is restartable while its underlying individual instructions are not. I wonder how this is handled.

======

cc @laboger @pmur I think the PPC64 assembler is also undergoing a refactor, and there may be similar issues. It would be nice if ARM64 and PPC64 contributors could work together on a common/similar design, so it is easier to maintain cross-architecture-wise.

Currently, in my perspective there are essentially 4 kinds of assembler backends in Go: x86; ARM32/ARM64/MIPS/PPC64/S390X, which are somewhat similar; RISC-V, which is a bit different; and WebAssembly. As someone maintaining or co-maintaining those backends, I'd really appreciate it if we can keep this number small. Thanks.

pmur commented 3 years ago

I have not had much time to digest this yet. Speaking for my PPC64 work, I have no existing plans to rewrite our assembler. Adding ISA 3.1 support (particularly for 64b instructions) requires some extra work, but not so much as to make it look much different than it does today. A couple of thoughts below on the current PPC64 assembler deficiencies:

A framework to decompose Go opcodes which don't map 1-1 to a machine instruction would likely simplify laying out code. Similarly, we need to fix up large jumps from conditional operations and respect alignment restrictions with 64b ISA 3.1 instructions.

Likewise, something to more reliably synchronize the supported assembler opcodes against pp64.csv (residing with the disassembler) would be desirable to avoid the churn of adding new instruction support as needed.

erifan commented 3 years ago

But keep in mind that there are things handled in the Prog level, e.g. unsafe point and restartable sequence marking. Doing it at Prog level is simple because some Prog is restartable while its underlying individual instructions are not. I wonder how this is handled.

This is easy to fix; we just need to mark the first instruction of the sequence as restartable. For example, op $movcon, [R], R is unfolded into mov $movcon, REGTMP + op REGTMP, [R], R. We mark the first instruction mov $movcon, REGTMP as restartable (although it uses REGTMP) so this instruction can be preempted, and leave the second instruction op REGTMP, [R], R as non-restartable because it uses REGTMP, so it won't be preempted. In general, we can fix this problem by setting marks while unfolding.
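For illustration, a tiny self-contained sketch of that marking rule (the types and names here are hypothetical stand-ins for the real Prog machinery):

package main

import "fmt"

// inst stands in for one machine instruction produced by unfolding.
type inst struct {
	text        string
	restartable bool
}

// unfoldLargeConst sketches "op $movcon, [R], R" -> two machine instructions.
func unfoldLargeConst(op string, movcon int64, rn, rd string) []inst {
	return []inst{
		// REGTMP is only written here, so restarting the sequence
		// from this point is safe: mark it restartable.
		{fmt.Sprintf("MOVD $%#x, REGTMP", movcon), true},
		// REGTMP is live across this instruction, so it must not be
		// a preemption point: leave it non-restartable.
		{fmt.Sprintf("%s REGTMP, %s, %s", op, rn, rd), false},
	}
}

func main() {
	for _, q := range unfoldLargeConst("ADD", 0x123456, "R1", "R2") {
		fmt.Printf("%-30s restartable=%v\n", q.text, q.restartable)
	}
}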

Another problem is the printing of assembly instructions. Currently we print a Go-syntax assembly listing, i.e. the unfolded Progs; after unfolding, we can no longer print the original Prog form. And we can't print before unfolding, because at that point we don't yet have the Pc value of each Prog. Of course, if we decide the new print format is also reasonable, then this is not a problem.

erifan commented 3 years ago

The situation of different architectures seems to be different, but arm64 has many new features and instructions that are moving from design to real devices, so I can see how much work is needed to add and maintain new instructions. So can we just do arm64 first? In the future, other architectures can decide whether to adopt the arm64 solution according to their own situation. I believe that even if Google designs a general assembly framework later, our work will not be wasted.

laboger commented 3 years ago

I think the PPC64 assembler is also undergoing a refactor, and there may be similar issues. It would be nice if ARM64 and PPC64 contributors could work together on a common/similar design, so it is easier to maintain cross-architecture-wise.

We were not the original authors of the PPC64 assembler either and there is a lot of opportunity for clean up. The refactoring work @pmur is doing now is a lot of simplification of existing code, in addition to making changes to prepare for instructions in the new ISA. IMO his current work won't necessarily be a wasted effort if we want to use a new design at some point because he is eliminating a lot of clutter.

I'm not an ARM64 expert but my understanding is that we are similar enough that their design should be compatible for us. I'm not sure about all the other architectures you list above -- depends on whether the goal is to have a common one for all or just start with ARM64 and PPC64.

It sounds like ARM64 wants to move ahead soon, probably this release, and I'm not sure that is feasible for all other architectures. But if others later adopt the same or similar design that should at least simplify the support effort Cherry mentions above.

I am totally in favor of sharing code as much as possible.

cc @billotosyr @ruixin-bao

cherrymui commented 3 years ago

Yeah, I think we can start with ARM64.

If there is anything that PPC64 or other architectures need, it is probably a good time to bring it up now. Then we can incorporate that and come up with a design that is common/similar across architectures.

I think ARM64 and PPC64 are the more relatively complex ones. If a new design works for them, it probably will work well for MIPS and perhaps ARM32 as well.

erifan commented 3 years ago

Okay, then I will start completing the above prototype patch. Regarding cross-architecture code sharing, I totally agree with this idea and will try my best to do it. But because the implementation of an assembler is closely tied to its architecture, to be honest, I don't think we can share a lot of code; the implementation idea, however, is sharable. Just like the current framework: many architectures use the optab design.

Let me repeat our refactoring ideas:

... -> Unfold -> Fixup -> Encoding ->...

What we do in the Unfold function:

  1. If one Prog corresponds to multiple machine instructions, expand it into multiple Progs, so that finally each Prog corresponds to one machine instruction.
  2. Set the relocation type if necessary.
  3. Make some marks if necessary, such as literal pool, unsafe point, restartable.
  4. Hardcode each Prog to the right machine instruction.

What we do in the Fixup function:

  1. Branch fixup.
  2. Literal pool.
  3. Instruction alignment.
  4. Set the Pc value of each Prog.
  5. Any other processing before encoding. Originally, steps 1, 2, and 3 were separate stages, but I am not sure whether all architectures require these processes, so I put them all into one large Fixup function, which actually deals with anything before encoding.

What we do in the Encoding function: the Encoding function only calculates the binary encoding of each Prog. The idea for arm64 is: known_bits | args_encoding. An architecture instruction information table is used here, which contains the known bits and argument types of each instruction, so encoding an instruction is reduced to encoding its arguments. The instructions of each architecture may differ, but I think it shouldn't be difficult to encode a machine instruction for any architecture.
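As a rough self-contained sketch of that pass ordering (all types and names here are hypothetical, with the interesting details elided):

package sketch

// prog stands in for an unfolded Prog: one machine instruction.
type prog struct {
	pc    int64  // assigned by fixup
	index int    // instruction-table entry, set by unfold
	enc   uint32 // known_bits | args_encoding, set by encode
}

// unfold expands the Prog list so each entry corresponds to exactly one
// machine instruction, sets relocations/marks, and records table indexes.
func unfold(ps []prog) []prog { return ps } // expansion elided

// fixup handles branch fixup, the literal pool, alignment, and Pc values.
func fixup(ps []prog) {
	for i := range ps {
		ps[i].pc = int64(i) * 4 // arm64: fixed 4-byte instructions
	}
}

// encode computes the binary encoding of each single-instruction prog.
func encode(ps []prog) {
	for i := range ps {
		ps[i].enc = knownBits(ps[i].index) // | per-argument encodings, elided
	}
}

func knownBits(index int) uint32 { return 0 } // instruction-table lookup, elided

// assemble drives the three stages in order.
func assemble(ps []prog) {
	ps = unfold(ps)
	fixup(ps)
	encode(ps)
}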

I'm going to do it according to this idea, and I also welcome any suggestions on this design and code at any time.

laboger commented 3 years ago

I'm looking at the design and code, and I have a question about the unfoldTab and using it to hold the function to be called for processing. I believe that means as each Prog is processed, its handler will be called through a function pointer found in the table, and I don't think a function called this way can be inlined. Won't this cause a lot more overhead at compile/assembly time, since currently the handling of each opcode is done through a huge switch statement?

erifan commented 3 years ago

Won't this cause a lot more overhead at compile/assembly time, since currently the handling of each opcode is done through a huge switch statement?

Yes. Originally I guessed a table lookup might be better than a switch-case; I didn't consider inlining. I'll check which one is better, thanks.

erifan commented 3 years ago

Hi, update my progress and the problems encountered, and hope to get your help.

I have only changed the arm64/linux assembler at present; the plan is to do arm64 first and then consider the cross-architecture part. The code is basically complete, and all tests of all.bash pass except for the ARM64EndToEnd test.

To repeat the implementation idea before describing the problems encountered:

  1. Split all Progs into small Progs in a function called unfold, so that each Prog corresponds to only one machine instruction, and calculate the corresponding entry of each Prog in the optab instruction table in this function.
  2. Deal with the literal pool.
  3. Process branch fixups.
  4. Encode each Prog.

The two problems encountered are:

  1. Since we split large Progs into small Progs, the assembly instructions printed by the -S option of the assembler and compiler differ from the previous ones. For example, for MOVD $0x123456, R1 the previous print result is MOVD $0x123456, R1, while the print result now is MOVZ $13398, R1 + MOVK $18, R1. Another situation is that a Prog is not split, but its encoding parameters are changed; for example SYSL $0, R2 previously printed as SYSL ZR, R2 but now prints as SYSL ZR, $0, $0, $0, R2. In addition, since the codegen tests depend on the printing result, the change of the printing format requires corresponding changes to the codegen tests.

  2. As mentioned earlier, the ARM64EndToEnd test failed, which is also caused by the splitting of large Progs into small Progs. For example, for SUB $0xaaaaaa, R2, R3 the expected encoding result is 43a82ad163a86ad1, which is a combination of two instruction encodings. But now the large Prog corresponding to this instruction is split into two small Progs, and our test checks the encoding result of the large Prog. What we actually get is the encoding 43a82ad1 of the first small Prog, so the test fails.

These problems are all caused by the original Prog being split or modified. The test failures can be fixed in several ways; what I want to ask is whether the change in the assembly printing format is acceptable. If it is unacceptable, I will find a way to keep the original Prog when unfolding and save the small Progs in a certain field of the original Prog for later encoding. This may use a little more memory, but I guess the impact should be small. Hope to get your comments. /CC @cherrymui @randall77 Thanks.

cherrymui commented 3 years ago

I think it is okay to change the printing or adjusting tests, if the instructions have the same semantics.

Rewriting a Prog to other Prog (or Progs) is more of a concern for me. Things like rewriting SUB $const to ADD $-const are okay, which is pretty local. Rewriting one Prog to multiple Progs, especially with REGTMP live across Progs, is more concerning. This may need to be reviewed case by case. Or use a different data structure.

erifan commented 3 years ago

I think it is okay to change the printing or adjusting tests, if the instructions have the same semantics.

I will prepare two versions, one with Prog split, and the other without Prog split.

Rewriting a Prog to other Prog (or Progs) is more of a concern for me. Things like rewriting SUB $const to ADD $-const are okay, which is pretty local. Rewriting one Prog to multiple Progs, especially with REGTMP live across Progs, is more concerning. This may need to be reviewed case by case. Or use a different data structure.

Yes, this is also the most troublesome part. From what I have observed so far, splitting a Prog into multiple Progs only affects whether an instruction is restartable. If a Prog does not use REGTMP, the split has no effect on it. If a Prog uses REGTMP, the small Progs after splitting become unsafe points due to the use of REGTMP and become non-preemptible: the sequence was restartable before, but is not anymore. But this does not affect correctness, and the performance impact should be relatively small.

gopherbot commented 3 years ago

Change https://golang.org/cl/347490 mentions this issue: cmd/internal/obj/arm64: refactor the assembler for arm64 v1

gopherbot commented 3 years ago

Change https://golang.org/cl/297776 mentions this issue: cmd/internal/obj/arm64: refactor the assembler for arm64 v2

gopherbot commented 3 years ago

Change https://golang.org/cl/347531 mentions this issue: cmd/internal/obj/arm64: refactor the assembler for arm64 v3

erifan commented 3 years ago

Hi, I uploaded three implementation versions, of which the second and third are based on the first, because I'm not sure which one is better. Their implementation ideas are basically the same; the differences are whether to modify the original Prog when unfolding, and whether to use a new data structure.

  1. The first version basically does not change the original Prog when unfolding. If a Prog corresponds to multiple machine instructions, it creates new Progs and saves the unfolded result in the Rel field of the original Prog. For example, unfolding p1, where q1~q3 are also Progs:
             ...  -> p0->p1->p2->p3...
                               |
                               | Rel
                              \|/
                               q1-> q2->q3
  2. The second version expands the Prog in place, so it modifies the original Prog. This changes the printed form of some assembly instructions; for example MOVD may be printed as MOVZ + MOVK.
            ...  -> p0->p1->p2->p3...        will be unfolded as
            ...  -> p0->q1->q2->q3->p2->p3...
  3. The third version does not modify the original Prog; it defines a new data structure Inst to represent machine instructions (see the sketch after this list). The advantage of this is that it is convenient to convert Go assembly instructions into GNU instructions and cross-test with GNU tools.
             ...  -> p0->p1->p2->p3...
                               |
                               | Inst
                              \|/
                               inst1-> inst2->inst3

    Generally speaking, the instructions generated by the three versions are the same, so there is no difference in speed. However, there is a small difference in memory usage: the second version causes the smallest memory allocation regression, the first version is second, and the third version is the worst.
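For concreteness, a rough sketch of what the version-3 side structure could look like (the field names here are hypothetical; see CL 347531 for the real definition):

package sketch

// Inst represents one arm64 machine instruction, kept separate from Prog
// so the original Prog list stays intact for printing, and so a single
// Inst can be mapped to GNU syntax for cross-checking.
type Inst struct {
	op   int16   // index into the arm64 instruction table
	args []int64 // operand values, in machine-instruction order
	enc  uint32  // final 32-bit encoding
}

// Each original Prog would point at its expansion (inst1 -> inst2 -> inst3
// in the diagram above) instead of being rewritten in place.
type progInsts struct {
	insts []Inst
}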

https://go-review.googlesource.com/c/go/+/347532 is an example of adding instructions, based on the first version. This example is relatively simple, I will add a more representative example later.

Finally we'll only select one of them, so could you help review these patches and help me figure out which one is what we want, or suggest a better idea? I know these patches are quite large and hard to review, but I haven't found a good way to split them into smaller ones, because then bootstrapping would fail. Thanks.

erifan commented 2 years ago

Based on some early feedback from @cherrymui, we now keep only one version, which is basically version 3 mentioned above, with a few differences. Due to the large amount of change, it is not easy to review, so the code has been under review for several release cycles. I wonder if it's possible to merge this into the master branch early, when the 1.20 tree opens? That way, if there are issues, we'll have plenty of time to deal with them before the 1.20 release. In fact, since this implementation only affects arm64, and the logic is relatively simple, I am very confident in maintaining the code.

erifan commented 2 years ago

I've rebased the code many times since the first push, and the implementation compiles successfully and passes all tests, which also proves that my implementation has no correctness issues. So what are our concerns? The design or code quality or just no time to review?

gopherbot commented 2 years ago

Change https://go.dev/cl/424137 mentions this issue: cmd/internal/obj/arm64: new path for adding new instructions

mmcloughlin commented 1 year ago

@erifan Thank you for all your contributions. Excited to see your work land in an upcoming release!

Please forgive me if I'm joining this discussion too late or raising topics that have already been covered.

I'm the author of avo, a high-level assembly generator for Go that aims to make assembly code easier to write and review. Thus far, it's been amd64 only, but as I'm sure you understand, I'm getting a lot of interest in arm64 support https://github.com/mmcloughlin/avo/issues/189.

One of the major inputs to this work is the instruction database. I have spent a lot of time looking at the ARM Machine Readable Specification, but I've found it to be frustrating in the sense that it's both extraordinarily detailed but also fails to provide certain useful data, at least without substantial effort to derive it. I've got the impression from Alastair Reid's posts (such as Bidirectional ARM Assembly Syntax Specifications), as well as Wei Xiao's arm64 disassembler CL, that ARM may have internal data sources that are more convenient to use. Beyond this, there are also various ways in which the Go assembler syntax differs from the standard specification, which have to be captured somehow.

Therefore, I would like to ask whether, alongside this work to extend and improve arm64 support in the Go assembler, you might also be able to provide a structured representation of the instruction set? The goal would be a JSON file or any structured file format that other tooling developers could use. This would be immensely valuable as an input to code generation in avo.

If this sounds like something you might be open to, I'd be happy to get into the nitty-gritty details.

Thanks again for all your work!

erifan commented 1 year ago

Hi @mmcloughlin, thanks for your support of arm64. There are indeed many arm64 instructions missing from the Go backend, especially vector instructions; this is why this issue exists. As for a formatted instruction set, we don't have one either. Here are two files that we obtained by parsing the PDF document when we were working on the arm64 disassembler; hope they are helpful to you. https://github.com/golang/arch/blob/master/arm64/arm64asm/inst.json , https://github.com/golang/arch/blob/1bb480fc256aacee6555e668dedebd1f8225c946/arm64/arm64asm/tables.go#L949

mmcloughlin commented 1 year ago

@erifan Thanks for your quick reply.

I should say I'm not referring to missing instructions. All of the instructions are in the XML files distributed by ARM. My concern is about getting the details required about the instructions and their operands, and some of the details are not easy to derive.

For comparison, avo x86 instruction database is derived from a combination of Maratyszcza/Opcodes and the go/arch x86.v0.2.csv. These are combined together with various fixes and custom logic into avo's own database: see types and data. I will need essentially the same kind of data for arm64 support. For each instruction form we'd need to know:

I have already seen the files you referenced, but I don't think they provide the above information? Specifically, although the operands might be there, they're not in a form that can be easily used. I don't think implicit operands are represented there, or the operand actions?

You also said you derived these from a PDF, but from the code review comments I thought there was an internal tool that could not be open-sourced? Here @cherrymui asks "Is it possible to check in the generator somewhere?" and Wei Xiao replies:

The generator is a tool written with C++ for reading ARM processor description files to generate back-end utility functions for various languages and JIT projects. But it's only for internal use and not open-sourced so far.

See: https://go-review.googlesource.com/c/arch/+/43651/3..11/arm64/arm64asm/condition.go#b83

In another comment, Wei Xiao acknowledges that the JSON representation does not contain enough information about operands and that he had to use the internal tool instead:

But we soon find that there are a lot of manual work since JSON file doesn't classify operands of each instruction, which is actually the major work for generating the table. Classifying operands (i.e. decoder arguments) manually is also error-prone and unsystematic since there are +1k instructions. So we switch to our internal tool which has already classified all instruction operands and just reuse the JSON for test case generation.

See: https://go-review.googlesource.com/c/arch/+/43651/comment/ac7615dd_07db74f9/

So the question is whether you can provide a database in go/arch which would be like the inst.json you shared but with more complete information about the instruction operands and actions, which would be enough to provide the same kind of data that avo uses for x86?

Thanks again!

erifan commented 1 year ago

So the question is whether you can provide a database in go/arch which would be like the inst.json you shared but with more complete information about the instruction operands and actions, which would be enough to provide the same kind of data that avo uses for x86?

Sorry @mmcloughlin, I don't have such a document. There are some XML documents inside Arm, but the format is not exactly in line with our needs; it needs some processing before it can be used, and I can't share it with you.

FiloSottile commented 1 year ago

Hi all, thank you for working on this. Just wanted to mention that Avo support is pretty critical for us on the cryptography side. We've found Avo generators much easier to produce and review than raw assembly, and I'd be much more likely to review new Avo than new assembly for arm64 (and amd64).

cherrymui commented 1 year ago

Thanks all, especially @mmcloughlin, for the interesting discussion about avo and machine readable instruction data!

As @erifan mentioned that currently there isn't such data, I wonder if a better option would be creating such data (even if it is hand-written), which could be beneficial for both the Go assembler (for this issue) and tools (including e.g. avo, and Go's disassembler)?

The current CL writes operand processing and instruction encoding functions by hand. What about write the same information as a table instead? I'd think they are roughly the same amount of effort? For the assembler, we could then use the table, either generating code from the table, or using some table-driven technique to process the instructions.

Thanks.

erifan commented 1 year ago

I wonder if a better option would be creating such data (even if it is hand-written),

I think our current practice is exactly like this, but inst.go only contains the fields required for encoding, and avo needs more information.

By the way, this table is not readily available; we need some manual work to get it. But once we get it, it doesn't cost much to maintain, because it doesn't need to be updated very often.

The current CL writes operand processing and instruction encoding functions by hand.

The current CL does not need to process operands; it only needs to match the operands of Go assembly instructions to the operands of machine instructions. This mapping is essential.

The current CL turns the encoding of an instruction into the encoding of its operands, and this part of the work is manual. I also said that we can make it automatic, but I think the effort of hand-writing the generator may be higher than or equal to the effort of manual encoding, because the encoding of these operands is actually very simple.

I don't know if we can get enough information to get rid of the encoding of the operand, that is, refine the operand into 'elements', and then only need a few auxiliary functions to encode the elements. Maybe this is what you mean? @cherrymui

mmcloughlin commented 1 year ago

Thanks for the thoughtful discussion. Sorry for my slow reply, I was partly busy at work and feeling a little under the weather, but I also had a lot of thoughts on this topic, as you'll see.

I think there are two related aspects to this:

Machine-Readable Instruction Data

I think the ideal way to structure assembler/disassembler internals that rely on instruction tables is something like the following.

  1. Instruction database is the source of truth for instructions supported by the toolchain and their properties
  2. Instruction database is in some structured language-agnostic format (JSON, XML, CSV, ...)
  3. As much as possible about instruction properties is represented in the database
  4. All files that rely on the instruction database are code generated from it
  5. All code generators are checked into the main Go repository or x/arch
  6. Code generators are run by TryBots. Out-of-sync generated files fail the build.

I think this would not only make the codebase easier to develop and more robust, but it also critically promotes interoperability with other tools. Third-party tools can also consume the same instruction database and it's enforced by automated systems that the data they're consuming is in sync with the Go toolchain. I believe this would be valuable for literally any assembler but it's doubly true for Go, since it made the decision to invent a third assembly syntax that nobody else uses. Without automated translation tools Go's idiosyncratic syntax serves to maroon itself from the broader assembly and binary analysis ecosystem.

amd64

It's not just interoperability with third-party tools. For amd64 Go is not compatible with itself. For example, the Go assembler supports AVX-512 but go tool objdump does not. The largest issue on the amd64 side was due to the way AVX-512 support was added, but also because hand edits to generated files have been allowed over the years. The amd64 Go toolchain at this point breaks a few of the rules I proposed above:

The situation for the amd64 architecture led Rob Pike to declare it an "inconsistent mess". Russ Cox also said he would "like to see someone regenerate the csv" as it "serves as a kind of Rosetta stone between GNU, Intel, and Go syntax".

I have been meaning to spend time helping improve this situation, but I've not yet found the time. Sorry about that. I'm still intending to work on deriving the avo instruction database from Intel XED, in which case some of that work might end up being applicable to Go as well.

The main reason I bring this up right now is I think we have a golden opportunity together with @erifan and ARM to ensure this doesn't happen for the arm64 architecture.

arm64

I don't have deep knowledge of the arm64 codebase, so please correct me if I'm wrong in any of the following.

In the current state, I don't think we have an instruction database "source of truth" for ARM either. As @erifan pointed out, in x/arch we have inst.json, which is created by the arm64spec tool and used to drive test cases in arm64asm. This is great. However, it's not a source-of-truth file. The Go assembler optabs were probably code generated at one time, but have been hand-edited many times since then. The arm64asm optabs were also not generated from inst.json but instead were "Generated by ARM internal tool" and cannot be reproduced from publicly available information. Therefore the only way for anyone to get this information out is to reverse engineer it from arm64/arm64asm/tables.go.

With the code and design currently under review, what state would that get us to? I'm looking at cmd/internal/obj/arm64: new path for adding new instructions and it looks like we could be close but not quite there. Specifically, cmd/internal/obj/arm64/inst.go and cmd/internal/obj/arm64/args.go could be code generated from external instruction data files. There's a comment in the file saying:

// The file contains the arm64 instruction table, which is derived from instFormats
// of https://github.com/golang/arch/blob/master/arm64/arm64asm/tables.go. In the
// future, we'd better make these two tables consistent.

I'd argue we should ideally do this now rather than in the future. How is it derived? If it's derived automatically, is that code shared somewhere? Even better, could we create this source-of-truth file (probably it has to be derived from ARM internal data), and then code generate both arm64/arm64asm/tables.go and cmd/internal/obj/arm64/inst.go from it? It would be great if these code generators could also be run by TryBots.

Database Curation

I think by far the most important aspect of this is centralizing on a source of truth, and code-generating internals from that. As @cherrymui points out, there's value in this "even if it is hand-written". Yes, even if the database is partially hand edited, at least it's all in one place that we agree on and changes are propagated by code generation to the tools that need them. However, I will say that I think the "holy grail" here is if the instruction database is rigorously curated too:

  1. The instruction database is primarily derived from one or more external sources. This could be the Intel SDM, Intel XED, or ARM's Machine Readable Specification. In the case of avo for example, it's a combination of Maratyszcza/Opcodes and the go/arch x86.v0.2.csv. To some degree, it doesn't matter as long as they are external sources that can be downloaded, but the more official the better.
  2. Patches or hand-edits to external sources are all recorded and can be repeated. This part is typically unavoidably disgusting given the state of instruction set databases these days. It's going to be a mess of special cases. Anyway, as far as I'm concerned, this part can be some of the worst code you've ever seen as long as it runs and successfully takes your external sources into your "source of truth" file.

This exists to some extent already in x/arch, with supporting packages for parsing PDF specs and XED, but it's not yet automated and enforced end-to-end. As an example of what I'm talking about, in avo:

The goal here is that the instruction database can be regenerated when the external sources change. It's going to be ugly, but at least the ugliness is all recorded and repeatable.

Of course, the final bonus here is:

  1. Instruction database testing. Given the database you can now use it to define various consistency and correctness checks on the database itself. x/arch already has these, where it compares the Go assembler to other external assemblers etc. This is awesome, but the value of these tests is massively increased if we're applying them to a source of truth file, rather than to an out-of-date copy.

Support for avo and Related Ecosystem Tooling

The next question is whether ARM might be willing to help provide a more extensive instruction database than is necessary to only implement assembly/disassembly, specifically including the operand data a tool like avo would need to do liveness analysis and register allocation. It seems from @erifan's comment that the answer is probably no, but I'm curious if you can elaborate on the reasons why.

To clarify, the sort of information that would really help is a full specification of operands and actions on them. For example, Intel XED provides lines of the form:

OPERANDS:    REG0=XMM_R3():w:dq:u8 REG1=MASK1():r:mskw:TXT=ZEROSTR REG2=XMM_N3():r:dq:u8 REG3=XMM_B3():r:dq:u64 IMM0:r:b

The Maratyszcza/Opcodes x86_64.xml exposes this with just input/output flags, for example:

    <InstructionForm gas-name="divw" go-name="DIVW" nacl-version="33">
      <Operand type="r16" input="true" output="false"/>
      <ImplicitOperand id="ax" input="true" output="true"/>
      <ImplicitOperand id="dx" input="true" output="true"/>
      <Encoding>
        <Prefix byte="66" mandatory="false"/>
        <REX mandatory="false" W="0" B="#0"/>
        <Opcode byte="F7"/>
        <ModRM mode="11" reg="6" rm="#0"/>
      </Encoding>
    </InstructionForm>

In contrast, the ARM Machine Readable Specification is remarkable in that it provides full instruction semantics, but also frustrating since a simple operand list and the operand actions cannot be obtained without some effort. For a random example, see the specification of the ADDS (extended register) instruction. Correct me if I'm wrong, but I think to get operand actions out of the specification you need to write an ARM Specification Language parser and interpreter? (As an aside, I've actually gone a long way down this road and I have various components written in Go for working with the ARM specification and parsing ASL, but it's very much unfinished and I'm nervous to share it at this point.) My takeaway was just that surely ARM must have internal data that makes this much easier for external tool developers?

Alastair Reid pretty much says so in the conclusion to his post on Bidirectional ARM Assembly Syntax Specifications:

But, there was no demand within the company for this work because ARM already had hand-written assemblers and disassemblers and because the GNU assembler contribution rules would have required contribution of the source files that we use to generate the XML and all the scripts that generate the XML and lots and lots of other parts of the sausage factory. So, with no potential consumer of my improved specification, I was not able to persuade the team maintaining the assembly specification to change to my ASL-based syntax instead of English prose.

Anyway, I'm probably rambling. I also don't know how much influence you can wield within the company. But I think the ARM Machine Readable Specification is actually quite amazing. I would just love to see some additional data in there to make it more ergonomic for ARM ecosystem developers, such as the Go project, avo and beyond.

Some more direct questions on this topic:

Thanks again @erifan for all your work on this!

erifan commented 1 year ago

@mmcloughlin I agree with your thoughts on the assembler and disassembler. To be honest, I actually think that the arch repo should be merged into the golang/go repo, or they should at least share some instruction data and processing code, which would prevent the assembler and disassembler from getting out of sync. Furthermore, when we designed the first version of the new assembler, we considered that Go assembly and architectural instructions should be decoupled, so that the encoding and decoding of machine instructions would be more independent and could be interoperated or verified with third-party tools. But I don't think Go should routinely run a program that parses external architecture material to regenerate the instruction table; that is only necessary when adding new instructions.

Regarding machine-readable Arm instruction documents, we really have no better documents than the ARM Machine Readable Specification. The previous arm64asm/tables.go file was obtained exactly by parsing these XML files. We thought this XML documentation was not visible to the outside, but it seems that it is now publicly available. I know it's hard, but this really is the "best machine readable documentation" we've seen. Thank you for your interest and perspective on this issue.

mmcloughlin commented 1 year ago

@mmcloughlin I agree with your thoughts on the assembler and disassembler. To be honest, I actually think that the arch repo should be merged into the golang/go repo, or they should at least share some instruction data and processing code, which would prevent the assembler and disassembler from getting out of sync. Furthermore, when we designed the first version of the new assembler, we considered that Go assembly and architectural instructions should be decoupled, so that the encoding and decoding of machine instructions would be more independent and could be interoperated or verified with third-party tools. But I don't think Go should routinely run a program that parses external architecture material to regenerate the instruction table; that is only necessary when adding new instructions.

Glad we're in agreement :) I don't have strong opinions about x/arch vs the main repository. If there's some centralized instruction database with code generation, and some automatic enforcement, I think that's great. Exactly where and how that happens seems less important. I'm also not sure how often it would be necessary to regenerate the instruction table, but I'm proposing some level of enforcement in TryBots or otherwise so that hand-edits are strictly disallowed.

Regarding machine-readable Arm instruction documents, we really have no better documents than the ARM Machine Readable Specification. The previous arm64asm/tables.go file was obtained exactly by parsing these XML files. We thought this XML documentation was not visible to the outside, but it seems that it is now publicly available. I know it's hard, but this really is the "best machine readable documentation" we've seen. Thank you for your interest and perspective on this issue.

Oh! Well, this is both good and bad news 😅 On one hand, I was hoping you had more information internally because of the frustrations I identified with the XML database. On the other hand, if the raw data is public does that mean you might be able to share the code you used internally to derive those tables? Or is that code proprietary? It would be awesome if the pipeline from the publicly available XML spec all the way to the assembler/disassembler tables was all available to be reproduced. Then in the future, the Go team and other Go contributors could do ISA updates without needing help from ARM. It would also help other ecosystem developers build on the work.

erifan commented 1 year ago

Oh! Well, this is both good and bad news 😅 On one hand, I was hoping you had more information internally because of the frustrations I identified with the XML database. On the other hand, if the raw data is public does that mean you might be able to share the code you used internally to derive those tables? Or is that code proprietary? It would be awesome if the pipeline from the publicly available XML spec all the way to the assembler/disassembler tables was all available to be reproduced. Then in the future, the Go team and other Go contributors could do ISA updates without needing help from ARM. It would also help other ecosystem developers build on the work.

The code is not private, but it is lost. This work was done by someone else before I joined Arm; I can't find the code now. Later, if this CL goes well, we will write a parser for these XML files again to get more instructions into the table, such as NEON and SVE instructions. I believe we will submit this parser to the community together for future updates. But at the moment we haven't started, because we don't seem to have agreed on the design of the new assembler yet.

mmcloughlin commented 1 year ago

The code is not private, but it is lost. This work was done by someone else before I joined Arm; I can't find the code now.

That's a shame. I guess the XML specs were made public after that was written? It seems @cherrymui was right to ask for the generator to be checked in.

Later, if this CL goes well, we will write a parser for these xml files again to get more instructions into the table, such as NEON and SVE instructions. I believe we will submit this parser to the community together for future updates.

That would be great!

cherrymui commented 1 year ago

Sorry for the late reply.

I actually think that the arch repo should be merged into the golang/go repo

They don't necessarily have to be merged. But they can share data. The x/arch repo is vendored into the main repo, and technically the assembler could import package from x/arch. Also, the assembler could contain code generated from x/arch, e.g. for x86, https://cs.opensource.google/go/go/+/master:src/cmd/internal/obj/x86/avx_optabs.go is generated from https://cs.opensource.google/go/x/arch/+/master:x86/x86avxgen/ . We could do something similar for ARM64.

cherrymui commented 1 year ago

I don't know if we can get enough information to get rid of the encoding of the operand, that is, refine the operand into 'elements', and then only need a few auxiliary functions to encode the elements. Maybe this is what you mean? @cherrymui

Sorry, I'm not sure I understand this. I don't know what elements are...

The current CL turns the encoding of an instruction into the encoding of its operands, and this part of the work is manual. I also said that we can make it automatic, but I think the effort of hand-writing the generator may be higher than or equal to the effort of manual encoding, because the encoding of these operands is actually very simple.

Yeah, it's all about operand encoding. As I said earlier, it may be fine to have a table even if it is hand-written. To handle an operand, I'd expect we need information such as

You can express this information as code, with control flow etc. You can also express the same information as a data table and use a more generic function to process the table and input. E.g. the code at https://go-review.git.corp.google.com/c/go/+/424138/2/src/cmd/internal/obj/arm64/encoding.go#33 could, conceptually, be described as a table entry like type: Reg|RSP; encoding: bit 5-10.
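To illustrate, a minimal self-contained sketch of that table-driven direction (all names are hypothetical; the Reg|RSP operand and its bit field follow the table entry quoted above):

package main

import "fmt"

// Operand kinds, combinable as a bitmask.
const (
	Reg uint8 = 1 << iota
	RSP
)

// operandClass is one table entry: which operand kinds it accepts and
// where the value is encoded.
type operandClass struct {
	accepts uint8 // e.g. Reg | RSP
	shift   uint  // low bit of the encoding field
	width   uint  // field width in bits
}

// "type: Reg|RSP; encoding: bit 5-10" expressed as data.
var rnOrSP = operandClass{accepts: Reg | RSP, shift: 5, width: 5}

// encodeOperand is the single generic routine that would replace
// per-operand hand-written encoding functions.
func encodeOperand(c operandClass, kind uint8, val uint32) (uint32, error) {
	if c.accepts&kind == 0 {
		return 0, fmt.Errorf("operand kind %#x not accepted", kind)
	}
	mask := uint32(1)<<c.width - 1
	return (val & mask) << c.shift, nil
}

func main() {
	bits, err := encodeOperand(rnOrSP, Reg, 7) // R7 into the Rn field
	fmt.Println(bits, err)                     // 224 <nil>
}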

Thanks.

erifan commented 1 year ago

I don't know what elements are...

I meant the factors that affect encoding, like those you listed: integer register, FP register, bit width, encoding location, etc. So I think I got your point. It would be nice if there were a machine-readable instruction table that contained this information, but unfortunately there isn't. Let me see if we can get it by parsing the XML documents provided by Arm; if not, then we have to do it manually. I will modify this CL according to this idea shortly. If we think this design is OK, I will start to parse the XML document.

erifan commented 1 year ago

Hi @cherrymui, I have been thinking about how to represent instruction arguments in the way you mentioned above, but I don't know how to do it in a simple way. It's feasible for the simple cases, such as <Rm> or <Rm|RSP>, but the complex cases become complicated. The main difficulty is this:

An argument may contain multiple symbols; for example, in <Vn>.<T> in ADDV, <Vn> is a symbol and <T> is a symbol. Different symbols are encoded in different positions, with different widths. The same symbol may also be encoded in different fields, with different widths; for example <T> is encoded in size:Q. And the same symbol with the same encoding fields may have different valid values in different instructions; for example, <T> in ADDV can be 8B, 16B, 4H, 8H and 4S, but in ADDP it can also be 2S and 2D. Therefore, to characterize an argument we need all its symbols, the valid values of each symbol, and how each symbol is encoded. It is not enough to know only the encoding position and width of a symbol, because different symbols may have different encoding methods. For example, the same immediate value may have different scale values for different instructions; this is why there are so many different argument types for immediate values.

The current method records the encoding fields of all symbols of an argument and all possible values of the symbols, so some argument type names are a bit long. The encoding method of each symbol is then expressed by hand-written functions. This approach may seem inelegant at first glance, but it is viable. These argument types can be obtained by parsing the XML document, which is now public. So the only manual work is encoding these arguments, namely encodeArg, and I can't think of an easier way.

cherrymui commented 1 year ago

Thanks for looking into this. How does the XML store that information? For things like <T>, I'd guess we could just list all the acceptable values, something like 8B|16B|4H|8H|4S. Would it be possible to extract a (hopefully not too large) number of patterns for encoding each "symbol"?

erifan commented 1 year ago

How does the XML store that information?

Some are stored in fields, some are stored in tables, and the most troublesome thing is that some information is stored only in text descriptions (an example). Parsing this XML document is not an easy job.

For things like <T>, I'd guess we could just list all the acceptable values, something like 8B|16B|4H|8H|4S.

We also need to record where these values are encoded, so it may look like: Size:Q; 8B|16B|4H|8H|4S. If we add the encoded value corresponding to each value, it may look like: Size:Q; 8B_00|16B_01|4H_10|8H_11|4S_21. So now it looks a lot like the current argument type for <Vm>.<T>: arg_Vm_arrangement_size_Q___4H_10__8H_11__2S_20__4S_21

Maybe a symbol can be uniquely identified without the encoding value; I'm not sure.

The current argument type describes all the symbols of an argument. I was wondering if we could separate it into the symbol types Vm and arrangement_size_Q___4H_10__8H_11__2S_20__4S_21. Then an argument can be represented as:

type arg struct {
  typ     int      // argument type, such as REG, RSP, IMMEDIATE, PCREL, etc.; describes a combination of symbols
  symbols []symbol // set of symbol types
}

So for <Vm>.<T>, it is:

{
  typ: ARRANGEMENT,
  symbols: [
    {Vm}, {arrangement_size_Q___4H_10__8H_11__2S_20__4S_21},
  ],
}

I expected that there should be fewer symbol types than the current argument types.

Would it be possible to extract a (hopefully not too large) number of patterns for encoding each "symbol"?

Do you mean some kind of mapping, getting the encoding directly from an input symbol? Such as a map, map[symbol]uint32, so we could directly get the encoding of 4H through map["4H"]. I think it's a good idea, but it is a bit difficult to implement. It depends on how we represent a symbol's acceptable values, its encoding, and the encoded positions. I think it needs some logic and is difficult to fully automate. There are many different encoding schemes: some values are continuous, some are non-continuous, some are fixed values, some are signed numbers, some are unsigned numbers, some have shifts, some are required to be continuous with the previous argument, and some require the arrangement of all arguments to be consistent. Too many cases.
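Still, for the simple cases, a small sketch of what such a lookup could look like for one symbol (the values follow the arg_Vm_arrangement_size_Q___4H_10__8H_11__2S_20__4S_21 name above; the table layout and field positions are assumptions based on the usual SIMD size/Q fields):

package main

import "fmt"

// size:Q encodings accepted for <T> in this particular argument type,
// e.g. "4H" encodes as size=1, Q=0 (the "4H_10" in the name above).
var arrangementSizeQ = map[string]struct{ size, q uint32 }{
	"4H": {1, 0},
	"8H": {1, 1},
	"2S": {2, 0},
	"4S": {2, 1},
}

func encodeArrangement(t string) (uint32, bool) {
	e, ok := arrangementSizeQ[t]
	if !ok {
		return 0, false // not an acceptable value for this argument type
	}
	return e.size<<22 | e.q<<30, true // size in bits 22-23, Q in bit 30
}

func main() {
	fmt.Println(encodeArrangement("4H")) // 4194304 true
	fmt.Println(encodeArrangement("2D")) // 0 false: rejected here
}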

erifan commented 1 year ago
type arg struct {
  typ     int      // argument type, such as REG, RSP, IMMEDIATE, PCREL, etc.; describes a combination of symbols
  symbols []symbol // set of symbol types
}

I think this method is feasible. What needs to be done first is to uniquely represent a symbol. As for how to encode and decode a symbol, that requires some hand-written code.

mmcloughlin commented 1 year ago

Sorry for my late reply, I haven't been able to focus as much on stuff outside of work/personal life for a bit...

They don't necessarily have to be merged. But they can share data. The x/arch repo is vendored into the main repo, and technically the assembler could import package from x/arch. Also, the assembler could contain code generated from x/arch, e.g. for x86, https://cs.opensource.google/go/go/+/master:src/cmd/internal/obj/x86/avx_optabs.go is generated from https://cs.opensource.google/go/x/arch/+/master:x86/x86avxgen/ . We could do something similar for ARM64.

@cherrymui Agree here! I think the ideal is that Go has a development-time dependency on x/arch but not a build dependency on it. That is, files are code generated using data or tools in x/arch. As I've said elsewhere I think code generation should be enforced in trybots as well, so hand edits are strictly forbidden later.

Some are stored in fields, some are stored in tables, and the most troublesome thing is that some information is stored only in text descriptions (an example). Parsing this XML document is not an easy job.

@erifan I agree with you 1000% here, having spent a long time working with this XML document and lost the will to live. It seems to me that what we're trying to do here for the Go project should be one of the primary use cases of a machine readable specification. If we have to tie ourselves in knots to get this to work, even when you're literally an employee of ARM, that's surely a sign there's a problem? How does ARM write its own toolchains, for example? Are they parsing out text descriptions from XML as well?

I still think this effort is worthwhile, however ugly it ends up being, so that we end up with a reproducible process from the public XML to the Go assembler.

However, I wonder if you are able to feedback this experience to the team at ARM who owns the machine readable specification? Please tell me if this seems excessive, but could we attempt to raise this with someone like Cheryl Hung, Director of Ecosystem at ARM? I'm not sure how to go about this, perhaps it's futile 😅

erifan commented 1 year ago

It seems to me that what we're trying to do here for the Go project should be one of the primary use cases of a machine readable specification. If we have to tie ourselves in knots to get this to work, even when you're literally an employee of ARM, that's surely a sign there's a problem? How does ARM write its own toolchains, for example? Are they parsing out text descriptions from XML as well?

Hi @mmcloughlin, sorry for the late reply. Different toolchains in Arm are maintained by different teams and have different implementations; some are purely handwritten, some are semi-automatic. I don't have a full view.

I still think this effort is worthwhile, however ugly it ends up being, so that we end up with a reproducible process from the public XML to the Go assembler.

Yeah, if we don't want to add instructions purely manually, this seems to be the only way currently available.

However, I wonder if you are able to feedback this experience to the team at ARM who owns the machine readable specification? Please tell me if this seems excessive, but could we attempt to raise this with someone like Cheryl Hung, Director of Ecosystem at ARM? I'm not sure how to go about this, perhaps it's futile 😅

The Arm spec documentation is maintained by a separate group, and we gave them feedback early on, but we cannot make any commitments on their behalf. You can ask; it's always good to give their work extra input. Thanks.

cherrymui commented 1 year ago

type arg struct {
  typ     int      // argument type, such as REG, RSP, IMMEDIATE, PCREL, etc.; describes a combination of symbols
  symbols []symbol // set of symbol types
}

I think this method is feasible. What needs to be done first is to uniquely represent a symbol. As for how to encode and decode a symbol, that requires some hand-written code.

Yeah, this sounds reasonable. And some sort of mapping from "symbol" (not really sure we want to overload this word as there are already symbols in the assembler context, e.g. obj.LSym) to encoding details sounds also the right direction. As you mentioned, it probably does need to include encoding details such as which bits are used, maybe something like 4H_10 is fine.

Thanks.

erifan commented 1 year ago

not really sure we want to overload this word as there are already symbols in the assembler context, e.g. obj.LSym

This is a small matter; the Arm document calls it "symbol", but we could also call it "element" or whatever. I will write a demo based on this to show you.

Thanks.

erifan commented 1 year ago

Hi @cherrymui, I have updated the draft implementation based on the above discussion. Would you mind taking a look at these two changes: https://go-review.googlesource.com/c/go/+/424137 and https://go-review.googlesource.com/c/go/+/424138? I'll try to attend the next Arm-Google sync meeting on February 15th, and it would be great if you could take a look at the code before then and give me some feedback, thanks.

mmcloughlin commented 1 year ago

@erifan I'm wondering about the status of your XML parsing work? I'm thinking about picking up this project for avo again, and I'm curious if you reached a point I could potentially build on top of? Thanks!

erifan commented 1 year ago

@mmcloughlin I have finished the parsing work and got an instruction table, but I'm still dealing with some issues around opcode naming and operand classification. Encoding and decoding of the operand rules has not been done yet. The instruction table I generated is the same as inst.go; I'm not sure if that will work for you. If it is useful, I can generate such a table and send it to you. If you want to make some custom changes to the parser, you may have to wait a while; I will submit the code to the community soon and discuss some problems with @cherrymui. Thanks.

mmcloughlin commented 1 year ago

@erifan Thanks for the quick reply!

The inst.go output might be useful, but I think I will need more information for avo than the assembler would need. Specifically I'll need operand actions (read, write, ...) so that I can do liveness analysis and register allocation. I'm wondering whether I can achieve this by extending your parser work? Do you have any thoughts about how best to automatically derive this information from the XML spec, if that's feasible at all?

I did spend some time working on parsing the specs myself, but I stalled on it. I could revive that work, but I thought there might be some benefit to reusing your approach, which will presumably land in x/arch.