Minres / CoreDSL

Xtext project to parse CoreDSL files
Apache License 2.0
14 stars 2 forks source link

Elaboration #46

Closed AtomCrafty closed 1 year ago

AtomCrafty commented 2 years ago

I'll open this as a collection thread for questions and discussions related to the elaboration phase. I have started work on the elaboration implementation and will update this as I encounter new issues.

1. Duplicate type declarations

Are type declarations within a super type visible in the derived type? If yes, how do we handle duplicate type declarations? Are they always forbidden? Do we perform a memberwise comparison to test if they are identical?

2. Cyclic type dependencies

My current implementation reports cycles in the inheritance hierarchy like this. What do you think, should the marker be on the name before or after the extends keyword? image

3. Constant evaluation procedure

My current idea for the evaluation of constants is the following five step algorithm:

  1. Collect all declarators and group them by name
  2. Collect all assignments and make sure they assign to one of the declarators
  3. Calculate the assigned values a) Evaluate the right-hand side of each assignment b) Make sure all assignment to the same constant result in the same value
  4. Merge declarations a) Evaluate the widths of integer types b) Make sure the types of all declarations of a constant are equal c) (?) Make sure no two declarations for the same constant are in the same ISA scope d) (?) Make sure no two assignments the same constant are in the same ISA scope
  5. Validate types a) Make sure each calculated value actually fits in the respective constant's type

The basic idea is to calculate the values first and care about types later, because the constant back and forth between determining types and values is what's currently causing us so many headaches.

My main question is how step 3a should deal with dependencies between constants.

Here is a simple example (A). y needs to be evaluated first, because x depends on y:

int x = y:
int y = 1:

Here is an example (B) of an impossible constellation. Neither constant can be evaluated because there is a cyclic dependency. I would propose to generate a similar error message to the one for cyclic type dependencies.

int x = y:
int y = x:

The most interesting example (C) is this: We again have a cyclic dependency, but a third assignment can break the cycle. This would be resolvable with a fixpoint solver. Do we want that?

int x = y:
int y = x:
int y = 1:

Example C also has another issue: suppose the first two assignments were in an instruction set and the third in a core definition providing said instruction set. That would mean the instruction set produces an error, but the core compiles without issues. I'm curious about your input here.

Also, are 4c and 4d allowed? The spec isn't explicit on this.

AtomCrafty commented 2 years ago

Addendum to 3:

Another issue that came up: sizeof can complicate this further, since expressions could depend on the size of other constants, which in turn can depend on the value of other expressions, possibly creating circular dependencies:

unsigned<XLEN> XLEN = bitsizeof(XLEN);

I would propose to simply disallow sizeof and bitsizeof of constant definitions during constant evaluation. sizeof(int) and other types should be valid though.

4. Attribute for default values used during analysis

Julian and I were talking about how type validation of instruction sets should handle incomplete types like unsigned<XLEN>. One idea was to introduce an attribute that can be applied to ISA level constants to specify a default value used during analysis. Any suggestions what we should name this attribute?

eyck commented 2 years ago

Another issue that came up: sizeof can complicate this further, since expressions could depend on the size of other constants, which in turn can depend on the value of other expressions, possibly creating circular dependencies: unsigned<XLEN> XLEN = bitsizeof(XLEN); I would propose to simply disallow sizeof and bitsizeof of constant definitions during constant evaluation. sizeof(int) and other types should be valid though.

The example is illegal anyways as you reference a variable before declaring it. I think disallowing sizeof and bitsizeof is problematic since the following would not be possible:

InstructionSet A {
    architectural_state {
        unsigned XLEN;
        register<XLEN> R;
    }
}

InstructionSet B extends A {
    architectural_state {
        unsigned ACC_LEN=2*sizeof(R);
        register<ACC_LEN> ACC;
    }
}

Core C provides B {
    architectural_state {
        XLEN=20;
    }
}

Wrt. 4: To my understanding you cannot do type validation on InstructionSet because of the parameterization (as in the example above). The validation can only be done at the Core definition level. Therefore I use in our generator class a GenerationContext which holds the Core definition to derive the values of constants and sizes of type. This context the also allows to cache the values so that the AST only needs to be traversed once for a particular value within a particular context

AtomCrafty commented 2 years ago

Oh, okay. The specification doesn't mention that expressions can only contain constants declared earlier. Adding that as a rule would unfortunately not prevent cyclic dependencies, as you can see here:

int XLEN;
int<XLEN> reg;
XLEN = bitsizeof(reg);

My current implementation is basically an iterative solver. It starts off knowing none of the values and in each iteration it calculates new constant values determinable only from the ones known so far. If none can be determined anymore, the remaining must have a circular dependency.

If we need sizeof, I can probably treat each constant's size as another implicit constant that can be depended on and depend on other constants.

jopperm commented 2 years ago

1. Duplicate type declarations Are type declarations within a super type visible in the derived type?

Yes.

If yes, how do we handle duplicate type declarations? Are they always forbidden? Do we perform a memberwise comparison to test if they are identical?

Let's keep it simple and simply forbid redeclaring user types.

jopperm commented 2 years ago

2. Cyclic type dependencies What do you think, should the marker be on the name before or after the extends keyword?

After the extends makes more sense to me, because the extends-clause is what constitutes the error.

jopperm commented 2 years ago

3. Constant evaluation procedure

The procedure you propose sounds right to my. Yes, you definitely will have to iterate the evaluation until a fix point is reached (or detect cycles and error out). My mental model for the implementation parameters is that they behave like variables, syntactically. So they must be declared before use (either in the current architectural_state section, or in one of the parent ISAs); multiple declarations per ISA are prohibited; multiple assignments are ok. Assignments are evaluated in "program" order.

+1 on allow sizeof and friends; Eyck's example unsigned ACC_LEN=2*sizeof(R); is a good one I think.

Should we restrict the types that can be used on implementation parameters to unsigned<...literal...>/signed<...literal...>, to mitigate the back-and-forth between constant evaluator and type checker? We could also add a new alias param for a "big enough" integer type, as in param XLEN = 32;.

jopperm commented 2 years ago

4. Attribute for default values used during analysis Eyck: To my understanding you cannot do type validation on InstructionSet because of the parameterization (as in the example above). The validation can only be done at the Core definition level.

@eyck: Agreed, writing an InstructionSet means flying blind, as in C++ templates. Doesn't matter for non-interactive use. The question is, can/should we improve the editor experience by assuming sane default values, specified via an attribute? Mario means something like

unsigned<32> XLEN [[assume=32]];
eyck commented 2 years ago

As a language user I would expect some influence in the implementation if writing some attribute. Therefore to me this would be missleading. Just for type checking in the editor I think it is safe to implicitly assume a value without writing it explicitly. This value could be set in the editor preferences but should not be part of the code.

Agreed, writing an InstructionSet means flying blind, as in C++ templates. Doesn't matter for non-interactive use. The question is, can/should we improve the editor experience by assuming sane default values, specified via an attribute? Mario means something like [...]

I would not spend to much effort on the editor experience. Even the Eclipse Development Tools (CDT) cannot flag problems if the template parameters are not known. We should follow the same approach. If we cannot check we cannot flag... And eventually once the CoreDef is given the value can be evaluated.

eyck commented 2 years ago

BTW,

unsigned<32> XLEN [[assume=32]];

means to me basically the same than

unsigned<32> XLEN =32;

since I can assign the value later on....

AtomCrafty commented 2 years ago

Let's keep it simple and simply forbid redeclaring user types.

What about core definitions? Multiple inheritance means a core can see multiple distinct types of the same name.


Yes, you definitely will have to iterate the evaluation until a fix point is reached

Assignments are evaluated in "program" order.

These two statements directly contradict each other. Having a fixed evaluation order is impossible.


Should we restrict the types that can be used on implementation parameters [...], to mitigate the back-and-forth between constant evaluator and type checker?

There is no point in doing that now since I already implemented the more powerful variant that allows arbitrary (compile-time evaluable) expressions (#49).


[W]riting an InstructionSet means flying blind, as in C++ templates.

I'm not sure I understand you right... are you saying we don't want any validation for instruction sets? C++ templates don't let you "fly blind", any decent IDE will report most errors you make in a template, even without knowing the exact instantiation parameters.

When we last spoke about this, you said we should validate as much as possible without the full picture. The "default value" attributes would work like IntelliSense annotations in C++, providing example values with which the analysis would check program validity. Without default values, every expression that accesses unassigned state variables would be impossible to validate.


As a language user I would expect some influence in the implementation if writing some attribute. Therefore to me this would be missleading.

Using attributes and annotations for analysis only is common practice across pretty much all languages I have worked with. For example [[nodiscard]] in C++, [DebuggerStepThrough] in C#, @SuppressWarnings() in Java, etc.


unsigned<32> XLEN [[assume=32]];

means to me basically the same than

unsigned<32> XLEN =32;

since I can assign the value later on....

That is not how constant assignments work. If you assign 32 when declaring the constant, it can never be re-assigned to a different value (only re-assignments to 32 are valid), effectively making the parametrization useless.

eyck commented 2 years ago

In the statement

unsigned<32> XLEN =32;

is XLEN not a constant, it is just an unsigned variable. But the value XLEN is fixed in the instructions section. At least this is as we (@jopperm and me) discussed and defined it in the language. Constants in the architectural_state section are declared using const .

AtomCrafty commented 2 years ago

For each parameter/reset value in the combined architectural state of a Core, a CoreDSL 2 frontend must be able to evaluate all assignments to a unique value at compile time. In other words, there must either be a single assignment in the combined architectural state, or in case multiple assignments exist, the expressions on their right-hand sides must evaluate to the same constant value after a fix point is reached.

Example

InstructionSet B extends A { architectural_state { XLEN = 32; } }
InstructionSet C extends A { architectural_state { XLEN = 16; } }

Core R provides A    { }                                    // ERROR: missing definition
Core S provides A    { architectural_state { XLEN = 17; } } // OK
Core T provides B    { }                                    // OK
Core U provides B    { architectural_state { XLEN = 32; } } // OK
Core V provides C    { architectural_state { XLEN = 32; } } // ERROR: conflicting definitions
Core W provides B, C { }                                    // ERROR: conflicting definitions

Why do I even bother implementing what the specification says if the specification is apparently completely wrong? Those are 10 hours of my life I'm not going to get back.

jopperm commented 2 years ago

I'll setup an offline meeting to discuss this.

jopperm commented 2 years ago

We decided to change the parameter elaboration to make the last assignment effective.

@eyck, @AtomCrafty: Please review my changes to the spec. I also added a paragraph to the rationale page to document our motivation for this change.

eyck commented 2 years ago

I would change the wording Cyclic dependencies between parameters must be detected and reported as errors. (elaboration.md) to simply Cyclic dependencies are not allowed. The first statement refers to implementation details....

AtomCrafty commented 2 years ago

It seems that the changes reflect what we discussed, however the ArchStateItem grammar on this page doesn't make any sense. It allows several nonsensical statements:

eyck commented 2 years ago

The first is legal althoug results in a NOP. The last one might make sense if the assignment shall be qualified. The second one is definitly garbage. So I guess it should read ArchStateItem ::= ('const' TypeSpecifier)? ID '=' ConstantExpression Attribute* ';'

jopperm commented 2 years ago

[...] the ArchStateItem grammar on this page doesn't make any sense [...]

Ah thanks for spotting this, this has to be a relic from an early version. At some point, I introduced separate subsections for ASI declarations and assignments, but have forgotten to update the pseudo-grammar.

Fix: I made the TypeSpecifier parts in the declarations-subsection non-optional.

AtomCrafty commented 2 years ago

@eyck The first is not legal, though admittedly it's syntactically valid. A program containing this statement should be rejected by the validator because a; is not a valid expression statement. Only assignment expressions, pre/post-increment and decrement, as well as function call expressions are permitted inside of an expression statement.

eyck commented 2 years ago

I don't know where you take this statement from. But as a user and with the premise 'CoreDSL is C-like' I would expect a; to be valid statement.

jopperm commented 2 years ago

I suppose you are both right: a; should be a legal statement in an instruction's behavior, but this form is not on the list of sanctioned statements for the architectural_state section.

AtomCrafty commented 2 years ago

Well I stand corrected. I'm surprised C allows this, as most languages disallow expression statements without side effects. Though I'm certain 100% of occurences will be unintentional, so we should at least emit a warning if it ever comes up.

eyck commented 2 years ago

I agree, this should yield a warning as most C/C++ compilers do.

AtomCrafty commented 2 years ago

Issue: because of the way the architectural_state section is parsed, we can't know whether a declaration or an assignment comes first in the code. So the following two snippets result in the same AST, even though one is invalid and the other isn't. I do not want to manually retrieve the corresponding CST nodes to compare line numbers, so I guess we need to change the grammar again.

architectural_state {
   int x;
   x = 5;
}
architectural_state {
   x = 5;
   int x;
}
eyck commented 2 years ago

I do not understand the problem. Both examples are a sequence of statements. From the point of assignment only statments before the assigment are considered. Even if I collect all architectural states accoding to the traversal order I get a sequence where the same approach applies. So what's the issue here?

Actually this was the way the scoping was implemented (at least in the initial version). So in the latter case the Scping would not return any candidate (since x has not been declared before the assignment) so the framework would flag the assignment as having an error.

AtomCrafty commented 2 years ago

The issue is that with the current implementation it is impossible for the scope provider to know which declarations come before a given assignment, because they are parsed into two separate lists. It is not known to the scope provider how these two lists were interleaved in the original code. You are right that this used to be solved differently, but I changed how the section was parsed in order to simplify the grammar a while back, which wouldn't have been a problem had we kept the old specification. I now have to revert said grammar change because the order of statements is relevant with the new semantics.

jopperm commented 2 years ago

Just for reference, you mean the separate lists here, correct?

https://github.com/Minres/CoreDSL/blob/30f384a0a7741002dd45f6c04c25d43fa32b51ee/com.minres.coredsl/src/com/minres/coredsl/CoreDsl.xtext#L18

I agree, seems that a rollback is necessary here. Maybe you can offer getDeclarations() and getAssignments() helpers somewhere?

jopperm commented 1 year ago

Closing this, at it has been addressed recently in #59.