Minres / CoreDSL

Xtext project to parse CoreDSL files
Apache License 2.0
16 stars 3 forks source link

Explicit instruction operands #69

Open AtomCrafty opened 2 years ago

AtomCrafty commented 2 years ago

Here is the proposal promised in #62 to get rid of all the casts whenever instruction operands are accessed.

The entire reason these casts are necessary in the first place is that there is no way to actually declare the type of operands, so they all have to be treated as bits. This is actually a design flaw that has bothered me ever since I saw how instructions are declared, so this proposal aims to change that.

So what are the issues?

Related:

Proposal

Include a new operands section with each instruction, in which all operands have to be declared. This section is only present if the instruction has at least one operand.

image

This very simple and obvious change singlehandedly solves every last problem described above, without even touching the existing encoding syntax. And in addition, from an implementation perspective it means that BitField no longer needs to be a NamedEntity. This might seem insignificant, but it really is a huge deal. It means that - with a slight tweak to how enums are parsed - we can eliminate a large number of case distinctions from both the frontend and all backends, because we will only need to work with Declarators instead of having all the logic triplicated for Declarators, BitFields and EnumMemberDeclarations.

AtomCrafty commented 2 years ago

It's even better than I thought. In my prototype implementation I managed to completely eliminate the need for NamedEntity. EntityReference is now called DeclaratorReference and declarations, enum members, bit fields and function definitions all use declarators. It simplifies a lot of the implementation because everything now uses a unified way of declaring things.

eyck commented 2 years ago

I'm against this extension as for the user of the language it enforces additional things to describe which are implicitly contained. The encoding syntax as it is right now is sufficient and easy to read and to map to e.g. ISA descriptions. Esp. the last is important.

jopperm commented 2 years ago

I'm still torn on this. I suppose you could go even further and elide the range on fields if you want to use all bits:

operands: signed<32> imm, unsigned<5> rd;
encoding: imm[31:12] :: rd :: 7'b0110111;

But still, it feels a bit redundant.

The main problem: all operands are untyped. Not even the size is known. I assume it was intended that each operand's size would be the highest bit index mentioned in the encoding plus one? The specification doesn't explain this at all.

Fair enough, that's a specification issue. In our bitvector-less present, the fields are unsigned<highest bit plus one>, as you say.

Even so, this system makes it impossible to declare operands with leading zeroes.

Why would you want to do that?

As a direct result of the unspecified type, a cast is required every time any operand is read. This causes a flood of (unsigned) that makes the code virtually impossible to reasonably parse.

I think this problem is significantly exacerbated by the introduction of the bits<> type as a third option, but otherwise, this is of course a valid point (cf. the removal of the beloved s suffix, #31).

There is no readily available list of operands. One always has to scan the entire encoding string, which contains unrelated and duplicate information.

True, but as Eyck mentioned, this represents the conventions used in (textual) ISA specs.

The operands are declared by using them. This is extremely unintuitive to anyone who is used to C and therefore goes against a core design principle of CoreDSL.

Correct, but I'd argue that we're already out of C land with the encoding tag. It also borrows our expression syntax with a non-C operator. Still, IMHO, it immediately makes sense when you look at it.

And in addition, from an implementation perspective it means that BitField no longer needs to be a NamedEntity. This might seem insignificant, but it really is a huge deal.

Is there a way to reap these benefits with the existing syntax? Something like, in the context of the encoding token, ID '[' INT ':' INT ']' is a declaration?

AtomCrafty commented 2 years ago

Even so, this system makes it impossible to declare operands with leading zeroes.

Why would you want to do that?

It's possible to declare operands with trailing zeroes, by simply not including the lower bits in any ranges. The same is not possible for high bits, because there is no way to declare their existence. It doesn't usually matter in a practical sense, because the operands will probably be casted to a larger type, but it is still a conceptual asymmetry.


The operands are declared by using them. This is extremely unintuitive to anyone who is used to C and therefore goes against a core design principle of CoreDSL.

Correct, but I'd argue that we're already out of C land with the encoding tag. It also borrows our expression syntax with a non-C operator. Still, IMHO, it immediately makes sense when you look at it.

I'm not talking about the syntax, which of course has to deviate from C. I'm talking about the very concept of implicit declarations. The way I see the encoding syntax is as "if you evaluate this concatenation expression, you will get the same instruction bit pattern that was matched to arrive here". So conceptually, the entire encoding is an expression to me and hence should not contain any implicit declarations. So yes, you're right, it does make sense as an expression.

Another thing to consider: with fields split into multiple bit ranges, there does not exist one canonical declaration for the operand. There are instead multiple "declarations", but to even determine the size of an operand you have to take all of them into account. Just looking at the first one (which is all XText resolves references to) will not necessarily tell you what the highest bit index is. The current system is akin to omitting the type specifier of variable declarations and statically deriving their type based on how they are used. "Oh, you're passing this variable to a function that expects an int? Then it must be an int."

I simply think that all of this ends up unnecessarily increasing the cognitive load, and as an implementor I'd much rather type the few characters to make things explicit instead of juggling all of these considerations in my head all the time. A very real possibility I see with the current situation is that implementors will start adding comments describing the width of operands like this, because it is entirely unreasonable to expect someone looking at the code to immediately grasp how large an operand is. And at this point they might as well just write a proper declaration.

image

image


And in addition, from an implementation perspective it means that BitField no longer needs to be a NamedEntity. This might seem insignificant, but it really is a huge deal.

Is there a way to reap these benefits with the existing syntax? Something like, in the context of the encoding token, ID '[' INT ':' INT ']' is a declaration?

There might be, I'll have to look into that.

jopperm commented 1 year ago

Insights from offline discussion:

Nevertheless, we decided to postpone this addition to the next language version.