Allow specifying constraints on encoding fields/operands

Minres / CoreDSL

Xtext project to parse CoreDSL files

Apache License 2.0

16 stars 3 forks source link

Allow specifying constraints on encoding fields/operands #96

Open PhilippvK opened 1 year ago

PhilippvK commented 1 year ago

Currently, constraints on operands/fields in the encoding are evaluated in the behavior block by explicitly raising an Illegal Instruction. It would be great if we could find a way to define those conditions somewhere else.

Examples:

RV32E: Hardwire the MSB of register operands to zero. (See #?)
Zpfsoperand: 64bit register pair operands made up of an even and odd register (rd & rd+1 where rd%2==0 and (optionally )non-overlapping constraints between source and dest registers)
Excluding reserved values in encoding fields
RVP: Implement SUNPKD810,SUNPKD820,SUNPKD830,SUNPKD831,SUNPKD832 in a single instruction block (see #80)
RVV: implement Vector Load/Stores in a more compact fashion (See #80, Allowing us to cut down >150 instruction blocks to less than 20)

eyck commented 1 year ago

Any proposals on some syntax? One option I can think of would be to add constraints to an instruction definition like:

        ADD {
            encoding: 7'b0000000 :: rs2[4:0] :: rs1[4:0] :: 3'b000 :: rd[4:0] :: 7'b0110011;
            constraints: rs2<RFS,rs1<RFS,rd<RFS
            assembly:"{name(rd)}, {name(rs1)}, {name(rs2)}";
            behavior: if (rd != 0) X[rd] = X[rs1] + X[rs2];
        }

In relation to #94 this would be more universal but would create more effort on the generation of e.g. decoders from the grammar.

PhilippvK commented 1 year ago

@eyck I had something like this in mind. However here are some thoughts about the approach. We still would need to add this line for every single instruction. It would be great to specify this only once for a complete InstructionSet to make use of inheritance.

This on the other hand would add more complexity and we would need to define how the constraints should be merged (combine or overwrite). Since we do not have something like super() this might not be a very good idea. BTW we have a similar situation with the [enable=…] Attribute right now. (Undefined behavior if specified at different levels in the hierarchy. Please correct me if I am wrong)

Should we limit the constraints to encoding fields only or also allow static architectural state variables? This would add an alternative way to deal with different/optional implementations for XLEN=32/64 without the need to define two separate Instruction sets:


ADD64 {
    constraints: XLEN==32;
    …
}

ADD32 {
    constraints: XLEN==64;
    ….
}

FOOBAR {
    constraints: XLEN==32;
    …
}

FOOBAR {
    constraints: XLEN==64;
    …
}

wysiwyng commented 1 year ago

this could possibly be handled quite elegantly with #69, the operands section could also support constraints.

AtomCrafty commented 1 year ago

@wysiwyng Could you elaborate on that? The operands section from #69 is a list of parameter declarations. I don't immediately see how constraints could work there syntactically.

@PhilippvK The main issue I see with pulling common constraints out of the individual instructions is that the operands in question would not be declared in the same lexical scope. That feels very unintuitive to me and also poses a bunch of non-trivial implementation issues. An identifier in this constraint block would suddenly no longer be referencing a single declaration, but refer to all operands with the same name declared in multiple instructions, which wouldn't even necessarily have the same type.

I'm not a user of CoreDSL, so I might be misjudging things here, but it feels to me like most of these issues could easily be solved by pulling out the common checks to a function, without the need for any new language features.

And as a final note: In case you decide on the separate section syntax, I'd suggest changing the keyword to requires, just for aesthetical reasons, as it has the same number of characters as encoding, assembly, behavior (and operands).

eyck commented 1 year ago

@AtomCrafty from a language perspective I agree that the name based mapping comes with consistency issues.

But your function proposal comes also with overhead since you need at least 3 versions of the rfs_check function. Moreover we mix in here exception behavior by using the raise() function to indicate that the instruction is illegal. For these things constraints are much more appropriate. I had this morning some discussions with @wysiwyng and we came up with an idea as follows:

InstructionSet RV32I extends RISCVBase {
    architectural_state {
        XLEN = 32;
    }
    default_operands {
    operand unsigned<5> rs1 {{rs1<RFS}};
    operand unsigned<5> rs2 {{rs2<RFS}};
    operand unsigned<5> rd {{rd<RFS}};
    }
    instructions {
        LUI { // an instruction having an additional operand
        operands: unsigned<XLEN> imm;
            encoding: imm[31:12] :: rd[4:0] :: 7'b0110111;
            assembly: "{name(rd)}, {imm:#05x}";
            behavior: if(rd!=0) X[rd] = imm;
        }
        SW {
        operands: signed<XLEN> imm;
            encoding: imm[11:5] :: rs2[4:0] :: rs1[4:0] :: 3'b010 :: imm[4:0] :: 7'b0100011;
            assembly:"{name(rs2)}, {imm}({name(rs1)})";
            behavior: {
                unsigned<XLEN> store_address = X[rs1] + imm;
                MEM[store_address] = X[rs2];
            }
        }
        ADD { // an instruction only using default operands 
            encoding: 7'b0000000 :: rs2[4:0] :: rs1[4:0] :: 3'b000 :: rd[4:0] :: 7'b0110011;
            assembly:"{name(rd)}, {name(rs1)}, {name(rs2)}";
            behavior: if (rd != 0) X[rd] = X[rs1] + X[rs2];
        }
    }
}

AtomCrafty commented 1 year ago

@eyck I like your solution. It sidesteps the scoping issue by pulling out the declarations themselves. And it also seems to address the other issues I noted in #69. I believe the syntax can be improved, but conceptually this should work out nicely.

The way you wrote it above, this would introduce three new keywords and a completely new syntax construct with the double curly braces. I think we can largely re-use existing syntax the users are already familiar with.

Instead of default_operands I would just call the section operands and re-use the same keyword as within the instruction.
The operand keyword is redundant since all declarations in this section are operands, so it can be removed without replacement.
And finally, the brace syntax doesn't make much sense to me. Where would we for example put constraints that use multiple operands in their expression? I would just pull out the constraint expressions to the top level. This is syntactically consistent with how both declarations and assignments are allowed in the architectural_state section.

operands {
    unsigned<5> rs1;
    unsigned<5> rs2;
    unsigned<5> rd;

    rs1 < RFS && rs2 < RFS && rd < RFS;
}

The implementor can of course still choose to format it in a way that keeps declarations and related constraints close together.

operands {
    unsigned<5> rs1;    rs1 < RFS;
    unsigned<5> rs2;    rs2 < RFS;
    unsigned<5> rd;     rd < RFS;
}

PhilippvK commented 1 year ago

@AtomCrafty

The main issue I see with pulling common constraints out of the individual instructions is that the operands in question would not be declared in the same lexical scope. That feels very unintuitive to me and also poses a bunch of non-trivial implementation issues.

I totally agree that this approach is probably not a good idea.

I like @eyck’s proposal (plus your comments) a lot as I am also a fan of #69 and I think this would be a great addition to the CoreDSL syntax. But we would need to define what happens if such Operands+Constraints are defined on different hierarchy levels. Would the complete operands {…} block be replaced or merged (conjunction vs. disjunction of constraints)?

AtomCrafty commented 1 year ago

@PhilippvK From a language design point of view my approach would be to conceptually concatenate the blocks, so that all constraints defined on any level in the hierarchy must hold. As an implementor of the more abstract instruction set I should be able to rely on the constraints I defined myself. An instruction set B deriving from A should not be able to break assumptions made in A (a similar notion to the Liskov substitution principle).

Then again, I'm not familiar with the subject matter, so please let me know if there are valid use cases for breaking these assumptions.