Closed jopperm closed 1 year ago
The parser already understands (bit)sizeof and friends, so this issue is mainly a documentation task.
@AtomCrafty points out that the actual values returned by these operators have to be discussed and specified. In particular, do we pad structs, and how do we handle alignment in CoreDSL?
Padding is a compiler optimization, and alignment implements constraints of the processor. Therefore I strongly suggest doing neither alignment nor padding.
I agree that from a CoreDSL point of view it makes sense to remove all unnecessary padding, as that would reduce the amount of generated circuitry. However, from a C standpoint we should keep in mind that structures might not only exist in dedicated hardware, but also in regular memory. Suppose some instructions would have to read data structures from memory, like an interrupt vector, paging tables or structured exception information. Those structures might include padding bytes. I believe a good approach would be to make structure layouts configurable via attributes. For example [[struct_layout("pack")]] in front of the type declaration to pack it as tightly as possible and [[struct_layout("align")]] or something similar to align primitives to multiples of their own size. Alternatively we could introduce attributes to explicitly specify the offset of a field: [[field_offset(32)]].
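To make the difference between the two proposed struct_layout modes concrete, here is a rough Python sketch. It is not CoreDSL frontend code: the layout function and the (name, bit width) field model are hypothetical, and it assumes unsigned char is 8 bits and unsigned int is 32 bits. Trailing padding is deliberately not modeled.

```python
# Hypothetical model: a struct is a list of (name, bit_width) pairs.
def layout(fields, mode="pack"):
    """Return ({name: bit_offset}, total_bits) for the given layout mode."""
    offsets, cursor = {}, 0
    for name, width in fields:
        if mode == "align":
            # Round the cursor up to the next multiple of the field's own width.
            cursor = -(-cursor // width) * width
        offsets[name] = cursor
        cursor += width
    return offsets, cursor

# unsigned char field1; unsigned int field2; unsigned int field3;
T = [("field1", 8), ("field2", 32), ("field3", 32)]

packed = layout(T, "pack")    # field2 starts right after field1, at bit 8
aligned = layout(T, "align")  # field2 is pushed to bit 32
print(packed)
print(aligned)
```

With "pack" the struct occupies 72 bits, with "align" it occupies 96 bits, which is the extra circuitry the packed variant avoids.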
Actually we describe hardware within a processor. Layout in memory (which is basically an extern array) has to be described explicitly and will not depend on the processor-internal representation. Therefore I still opt for no padding and no alignment.
That still means we need to provide the facilities to explicitly describe the layout. And I would rather see that done with attributes than padding fields. Then again, that's just personal preference, so I won't fight you over it ^^
Well, you'd first need to define a mapping between structs/unions and a bitvector, e.g. a uint representation of a struct. I'm not aware of such a mapping in current CoreDSL. I suppose casting a struct to anything else isn't defined either. Assuming a struct is represented by a ("packed") concatenation of its members' uint or bitvec representations, you can easily add arbitrary padding via dummy members (which some also do in software; you often see such dummy members in the Linux uapi). If you want to avoid members such as "dummy1" and "dummy2", we could introduce unnamed members.
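As an illustration of what such a mapping could look like, the following Python sketch concatenates member values into a single integer and splits it again. This is only one possible definition: the pack and unpack helpers are hypothetical, and placing the first member in the most significant bits is an assumption, since the thread does not fix a bit order.

```python
# Hypothetical model: a struct is a list of (name, bit_width) pairs.
def pack(fields, values):
    """Concatenate member values; the first member lands in the top bits (assumed order)."""
    result = 0
    for name, width in fields:
        value = values[name]
        assert 0 <= value < (1 << width), f"{name} does not fit in {width} bits"
        result = (result << width) | value
    return result

def unpack(fields, bits):
    """Inverse of pack: peel members off the low end, last member first."""
    values = {}
    for name, width in reversed(fields):
        values[name] = bits & ((1 << width) - 1)
        bits >>= width
    return values

# A dummy member provides 24 bits of explicit padding.
T = [("field1", 8), ("dummy", 24), ("field2", 32)]
word = pack(T, {"field1": 0xAB, "dummy": 0, "field2": 0x12345678})
print(hex(word))
```

Because the dummy member is an ordinary field in this model, padding needs no special treatment in the mapping itself.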
Those dummy members are exactly what I meant by "padding fields". The issue I see with those is that you will always have to manually calculate how large the padding has to be, and update the padding when the bit width of another field changes. It's error-prone and doesn't clearly communicate the intent. That's why I would instead suggest having the frontend automatically generate these padding fields in a way controlled by attributes.
// Manual padding via a dummy field
struct T {
    unsigned char field1;
    unsigned<24> dummy;
    unsigned int field2;
    unsigned int field3;
}

// Padding generated from an alignment attribute
struct T {
    unsigned char field1;
    unsigned int field2 [[align(32)]];
    unsigned int field3;
}

// Padding generated from explicit field offsets
struct T {
    unsigned char field1;
    unsigned int field2 [[field_offset(32)]];
    unsigned int field3 [[field_offset(64)]];
}
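A frontend could resolve the attribute variants roughly as sketched below in Python. The resolve function and the dictionary encoding of attributes are hypothetical stand-ins, and the field widths assume an 8-bit unsigned char and a 32-bit unsigned int. The sketch only checks that the three declarations above describe the same layout.

```python
def resolve(fields):
    """fields: list of (name, bit_width, attrs); attrs may hold 'align' or 'field_offset' in bits."""
    offsets, cursor = {}, 0
    for name, width, attrs in fields:
        if "field_offset" in attrs:
            if attrs["field_offset"] < cursor:
                raise ValueError(f"{name} overlaps the previous field")
            cursor = attrs["field_offset"]
        elif "align" in attrs:
            a = attrs["align"]
            cursor = -(-cursor // a) * a  # round up to a multiple of a
        offsets[name] = cursor
        cursor += width
    return offsets, cursor

manual = resolve([("field1", 8, {}), ("dummy", 24, {}),
                  ("field2", 32, {}), ("field3", 32, {})])
with_align = resolve([("field1", 8, {}), ("field2", 32, {"align": 32}),
                      ("field3", 32, {})])
with_offset = resolve([("field1", 8, {}), ("field2", 32, {"field_offset": 32}),
                       ("field3", 32, {"field_offset": 64})])

# Ignore the dummy member when comparing against the attribute variants.
manual_named = {k: v for k, v in manual[0].items() if k != "dummy"}
print(manual_named, with_align, with_offset)
```

Changing the width of field1 in the attribute variants would not require touching any other line, which is the maintenance argument made above.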
Let's put the ifs, maybes, and potentiallys aside: is there any concrete example or need to specify padding and alignment when describing the inner workings of a processor? If not, we can stop the discussion and define the bitsizeof operator, with the sizeof operator given by sizeof(T) = (bitsizeof(T)+7)/8, i.e. the bit size rounded up to whole bytes.
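For reference, the proposed definition worked through in Python for fully packed structs. The (name, bit width) field model and the example structs are illustrative assumptions, not part of CoreDSL.

```python
def bitsizeof(fields):
    """Packed struct: the bit size is the sum of the member widths."""
    return sum(width for _, width in fields)

def sizeof(fields):
    """sizeof(T) = (bitsizeof(T) + 7) / 8, using integer division."""
    return (bitsizeof(fields) + 7) // 8

flags = [("enable", 1), ("mode", 4)]                   # 5 bits
T = [("field1", 8), ("field2", 32), ("field3", 32)]    # 72 bits
odd = [("value", 9)]                                   # 9 bits

for s in (flags, T, odd):
    print(bitsizeof(s), sizeof(s))
```

A 5-bit struct reports a size of 1 byte, the 72-bit struct 9 bytes, and a 9-bit struct 2 bytes.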
Is there any concrete example or need to specify padding and alignment when describing the inner workings of a processor?
Among the things we currently dabble with, I don't think so. To that end, I'd clarify in the spec that all structs are packed as an immediate course of action. OK?
In the (far?) future, I can see use cases for "adding structure" to address spaces, maybe to alias a specific memory range known to contain records, or to resemble the status registers of a memory-mapped peripheral device. Then, I think the attributes proposed by Mario are way nicer than dummy fields.
Idea: Define sizeof(T) as the number of bytes required to represent T (as in C). An additional bitsizeof operator may make sense in the future.
The expression's type is currently hard-wired as a 32-bit integer (think: size_t in C).
Alternatively, as the value is basically a compile-time constant, @atomcrafty proposes to make it subject to the integer literal type rules, i.e. choose the minimum bit width to represent the size.
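To compare the two typing options, here is a small Python sketch. It assumes that a non-negative literal gets the narrowest unsigned type that can hold it and that the value 0 still needs one bit; literal_width is a hypothetical helper, not a CoreDSL function.

```python
def literal_width(value):
    """Minimum number of bits for a non-negative constant (assumed literal rule)."""
    return max(1, value.bit_length())

HARD_WIRED_WIDTH = 32  # current behaviour described above, like size_t

for size in (1, 9, 12, 16):
    print(size, literal_width(size), HARD_WIRED_WIDTH)
```

Under the literal rule, sizeof results of 9 and 12 would both be typed as 4-bit values, while 16 would need 5 bits; the hard-wired variant reports 32 bits in every case.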