asm: add support for ISA v4

There has been a new addition to the eBPF ISA v4. This update adds 7 new instructions. We should add support for these to our asm package so we can handle programs created with this new ISA.

References:

The original proposal https://lore.kernel.org/bpf/4bfe98be-5333-1c7e-2f6d-42486c8ec039@meta.com/
The LLVM change https://reviews.llvm.org/D144829
Kernel changes https://lore.kernel.org/all/20230728011156.3711870-1-yonghong.song@linux.dev/

The LLVM changes will be part of v18 which will be a few month yet, and the linux kernel changes are part of v6.6.

I took and assembled the example instructions:

// CHECK: d7 01 00 00 10 00 00 00   r1 = bswap16 r1
// CHECK: d7 02 00 00 20 00 00 00   r2 = bswap32 r2
// CHECK: d7 03 00 00 40 00 00 00   r3 = bswap64 r3
r1 = bswap16 r1
r2 = bswap32 r2
r3 = bswap64 r3

// CHECK: 91 41 00 00 00 00 00 00   r1 = *(s8 *)(r4 + 0x0)
// CHECK: 89 52 04 00 00 00 00 00   r2 = *(s16 *)(r5 + 0x4)
// CHECK: 81 63 08 00 00 00 00 00   r3 = *(s32 *)(r6 + 0x8)
r1 = *(s8 *)(r4 + 0)
r2 = *(s16 *)(r5 + 4)
r3 = *(s32 *)(r6 + 8)

// CHECK: 91 41 00 00 00 00 00 00   r1 = *(s8 *)(r4 + 0x0)
// CHECK: 89 52 04 00 00 00 00 00   r2 = *(s16 *)(r5 + 0x4)
r1 = *(s8 *)(r4 + 0)
r2 = *(s16 *)(r5 + 4)

// CHECK: bf 41 08 00 00 00 00 00   r1 = (s8)r4
// CHECK: bf 52 10 00 00 00 00 00   r2 = (s16)r5
// CHECK: bf 63 20 00 00 00 00 00   r3 = (s32)r6
r1 = (s8)r4
r2 = (s16)r5
r3 = (s32)r6

// CHECK: bc 31 08 00 00 00 00 00   w1 = (s8)w3
// CHECK: bc 42 10 00 00 00 00 00   w2 = (s16)w4
w1 = (s8)w3
w2 = (s16)w4

// CHECK: 3f 31 01 00 00 00 00 00     r1 s/= r3
// CHECK: 9f 42 01 00 00 00 00 00     r2 s%= r4
r1 s/= r3
r2 s%= r4

// CHECK: 3c 31 01 00 00 00 00 00     w1 s/= w3
// CHECK: 9c 42 01 00 00 00 00 00     w2 s%= w4
w1 s/= w3
w2 s%= w4

Which currently render as

         0: SwapLE dst: r1 imm: 16
         1: SwapLE dst: r2 imm: 32
         2: SwapLE dst: r3 imm: 64
         3: LdXMode(128)B 
         4: LdXMode(128)H 
         5: LdXMode(128)W 
         6: LdXMode(128)B 
         7: LdXMode(128)H 
         8: MovReg dst: r1 src: r4
         9: MovReg dst: r2 src: r5
        10: MovReg dst: r3 src: r6
        11: Mov32Reg dst: r1 src: r3
        12: Mov32Reg dst: r2 src: r4
        13: DivReg dst: r1 src: r3
        14: ModReg dst: r2 src: r4
        15: Div32Reg dst: r1 src: r3
        16: Mod32Reg dst: r2 src: r4

No parsing issues, its just that we don't dump the instructions correctly and we can't generate them.

On that note, I have been looking at implementing this since it seems simple. However, with this ISA they seem to have run out of bits in the opcode byte, so they have now resorted to using the offset in non-jump instructions as part of the effective opcode. This doesn't match with the way we have been architecting the asm package.

We currently define the possible instruction in terms of just a uint8 which we get by masking off the opcode. But this no longer works if the offset also determines the operation.

// ALUOp are ALU / ALU64 operations
//
//  msb      lsb
//  +----+-+---+
//  |OP  |s|cls|
//  +----+-+---+
type ALUOp uint8

const aluMask OpCode = 0xf0

const (
    // InvalidALUOp is returned by getters when invoked
    // on non ALU OpCodes
    InvalidALUOp ALUOp = 0xff
    // Add - addition
    Add ALUOp = 0x00
    // Sub - subtraction
    Sub ALUOp = 0x10
    // Mul - multiplication
    Mul ALUOp = 0x20
    // Div - division
    Div ALUOp = 0x30
    // [...]
)

I see a few ways to go about this.

Making Ops wider and moving some logic to instructions

Not that offset is technically part of the op, we can change the definition of in this case ALUOp to include it. We would then also have to move the ALUOp related function from OpCode to Instruction since OpCode doesn't know about the offset.

type ALUOp struct {
  op uint8
  offset int16
}

var (
    // InvalidALUOp is returned by getters when invoked
    // on non ALU OpCodes
    InvalidALUOp = ALUOp{0xff, 0}
    // Add - addition
    Add = ALUOp{0x0, 0}
    // Sub - subtraction
    Sub = ALUOp{0x10, 0}
    // Mul - multiplication
    Mul = ALUOp{0x20, 0}
    // SMul - signed multiplication
    SMul = ALUOp{0x20, 1}
    // Div - division
    Div = ALUOp{0x30, 0}
    // SDiv - signed multiplication
    SDiv = ALUOp{0x30, 1}
    // [...]
)

func (op OpCode) ALUOp()  -> func (ins *Instruction) ALUOp()
func (op OpCode) SetALUOp(alu ALUOp) OpCode  -> func (ins *Instruction) SetALUOp(alu ALUOp)

New op types

In the end, the asm package is used for 2 things, decompiling + printing and generating bytecode. So as long as we provide both, we don't have to be internally consistant about it.

We could introduce an additional type which would appear usable to the user as asm.SMul.Imm(asm.R1, 123) for example for the encoding side. Then when printing we can make special cases in Instruction.Format where we take the offset into account.

type SignedALUOp uint8

const (
    SMul SignedALUOp = 0x20
    SDiv SignedALUOp = 0x30
)

// Op returns the OpCode for an ALU operation with a given source.
func (op SignedALUOp) Op(source Source) OpCode {
    return OpCode(ALU64Class).SetALUOp(ALUOp(op)).SetSource(source)
}

// Reg emits `dst (op) src`.
func (op SignedALUOp) Reg(dst, src Register) Instruction {
    return Instruction{
        OpCode: op.Op(RegSource),
        Dst:    dst,
        Src:    src,
        Offset: 1,
    }
}

// Imm emits `dst (op) value`.
func (op SignedALUOp) Imm(dst Register, value int32) Instruction {
    return Instruction{
        OpCode:   op.Op(ImmSource),
        Dst:      dst,
        Constant: int64(value),
        Offset:   1,
    }
}

Special functions

In the same spirit as the above, we could just add signed versions of the current methods we have. They will only work with very specific instructions and generate bytecode the kernel will not accept otherwise. We still have to do the string special cases.

// SignedReg emits `dst (signed op) src`.
func (op ALUOp) SignedReg(dst, src Register) Instruction {
    return Instruction{
        OpCode: op.Op(RegSource),
        Dst:    dst,
        Src:    src,
        Offset: 1,
    }
}

// SignedImm emits `dst (signed op) value`.
func (op ALUOp) SignedImm(dst Register, value int32) Instruction {
    return Instruction{
        OpCode:   op.Op(ImmSource),
        Dst:      dst,
        Constant: int64(value),
        Offset: 1,
    }
}

You read my mind with this issue xD Couple of questions:

Are SMod, SDiv the only problematic extensions? Seems like we also need to account for MovSX ALUOp. (SMov?)
Are there many users out there which manually craft ALUOp instead of using the constants we provide? Your proposal to widen the ALUOp would break them.
Does it matter that we go from const to var in your first proposal?

For SignedALUOp: as far as I understand most ALUOp are actually signed (addition, subtraction), it's mod and div that are the exception? If that is true only making SMod, SDiv a SignedALUOp is kind of misleading.

For SignedReg, we'd have code like this asm.Mov.SignedReg(R0, R2) to mean "move while sign extending" and asm.Div.SignedReg(R0, R2) for signed division. But asm.Add.SignedReg(R0, R2) wouldn't do what you expect it to?

Finally, not related to your proposal. We use 0xff to denote an invalid OpCode and refuse to marshal instructions containing that value: https://github.com/cilium/ebpf/blob/572c585db9947b8fe9fbee39092c2a5bf5ccf9e6/asm/opcode.go#L80 Seems to me like that might actually be a valid OpCode now?

Are SMod, SDiv the only problematic extensions? Seems like we also need to account for MovSX ALUOp. (SMov?)

No, SMov uses 8, 16 and 32 in the offset, it will need a Method to specify the destination size. Same with the new bswap, all are in the ALUOp though. For the sign extended load they did add a new bit in the op field, and the new jump always is class dependent, so no issues there.

I mainly focused on SMul and SDiv to keep things concise.

Are there many users out there which manually craft ALUOp instead of using the constants we provide? Your proposal to widen the ALUOp would break them.

I do not know, we might have to take a peek in some projects. Or get people to tell us somehow. Agreed, that chancing the type might be breaking if not used as expected.

Does it matter that we go from const to var in your first proposal?

Go doesn't allow you to define constant structs for some reason, perhaps another reason to not use them.

For SignedALUOp: as far as I understand most ALUOp are actually signed (addition, subtraction), it's mod and div that are the exception? If that is true only making SMol, SDiv a SignedALUOp is kind of misleading.

I believe in the case of add and sub it doesn't matter due to the way 2's complement works. But yea, perhaps we can call it ExtendedALUOp or something like that since we then also include SMov and bswap16/32/64 which are not signed but sign-extending

For SignedReg, we'd have code like this asm.Mov.SignedReg(R0, R2) to mean "move while sign extending" and asm.Div.SignedReg(R0, R2) for signed division.

They would have to be separate, since with the Mov you need to specify a size like asm.Mov.SignExtendedReg(R0, R2, Word)

This is where I like having the ExtendedALUOp types myself since we don't have to expose these extra functions for the rest of the ops.

But asm.Add.SignedReg(R0, R2) wouldn't do what you expect it to?

I would have to double check the code, but the versifier will likely reject that. So if we can avoid that mistake via API then I like that better.

I believe in the case of add and sub it doesn't matter due to the way 2's complement works.

Ah, good point. But for example we have both LSh, RSh (ignore sign) and ArSh (for arithmetic shift) in ALUOp, which convinces me that distinguishing based on signedness is not the way to go.

with the Mov you need to specify a size like asm.Mov.SignExtendedReg(R0, R2, Word)

That is nice API wise, but note that Word, etc. don't have the "correct" encoding to be put into offset directly (hey, it's a mess :D) I don't think this is a deal breaker, see HostTo.

Do we also need SignExtendImm?

SMov and bswap16/32/64

I just realised: SMov uses offset to store size, bswap uses imm. :cry:

I'm still not sure what the best option is. It's already possible (and fairly easy) to generate bogus asm using the package so adding ALUOp.SignExtendReg or whatever wouldn't be the end of the world. But the simplest thing is probably ExtendedALUOp to start with: it's a purely additive change. Then we can iterate.

If we decide that we want to add the new insns to ALUOp we can just repurpose unused bits and don't make it a struct. The current format is 1:1 with wire format for convenience, but we can break that assumption (but of course it'll have ripple effects on API and such).

Do we also need SignExtendImm?

I don't think so, the SMov only works with a Register source.

But the simplest thing is probably ExtendedALUOp to start with: it's a purely additive change. Then we can iterate.

I think so as well, I will play around with this for now.

cilium / ebpf