POETSII / Orchestrator

The Orchestrator is the configuration and run-time management system for POETS platforms.
1 stars 1 forks source link

Generated types (e.g. state, messages) don't have packing specified #240

Open m8pple opened 3 years ago

m8pple commented 3 years ago

The types generated by the orchestrator do not have any explicit packing specification.

This is required by the v4 (and earlier, I think) specs, as otherwise it is impossible to do cross-platform movement of data-types, or to reliably map between raw bytes and structs:

https://github.com/POETSII/poets_improvement_proposals/blob/2a355282f3aa3912bdacf4ec50bf25199bdf310d/proposed/PIP-0020/virtual-graph-schema-v4.rnc#L768

I know that the orchestrator relies on gcc to handle this, but other implementations emit structures directly as binary (as it is much faster), and so need to have exact mappings. (To be fair, they also control the risc-v, GPU, or HLS tool, so they can work around it, but it still makes it easier).

Given the need to move structs between architectures within a system, and between systems with different architectures, there is no expectation that the C compilers will choose to lay structures out with the same alignment -

Implementation of externals (in the IO sense) is also impossible without known packing, as otherwise you have no idea how the device compiler has aligned things.

You also can't reliably move structs through bytes, as you have no idea how the compiler has decided to layout the structs. Your only choice is to move member by member, or copy each member into bytes, which is much slower.

History

At the v3 and v4 discussions I think this came up, and the solutions were:

  1. Use pack(1) and get users to pack.
    • Pain, particularly due to Tinsel lack of aligned loads, but workable.
  2. Pick a packing method that is implemented by every single implementation, based on the types and structs.
    • This requires parsing of types, and it was a goal of v4 that the orchestrator would not need to parse types.
  3. Ban 64-bit types
    • 64-bit types are critical in certain cases, so cannot be banned.
  4. Use plain bytes
    • This makes the orchestrator simpler, but prohibits huge numbers of other useful tools that already existed, such as debugging tools, model checking tools, formal verification tools.
    • Plain bytes only is also a performance inhibitor on code generation, as it means the C optimiser may make much more conservative decisions about aliasing. There have been cases in POETS Ecosystem where type-punning through bytes meant that code was slower.

So we are left with option 1.

Problem

Compile the following code on different systems:

#include <stdint.h>
#include <math.h>

struct {
    int16_t a;
    int32_t b;
    int64_t c;
}x;

int main()
{
    static_assert(sizeof(x)==16, "Size is incorrect");
}
heliosfa commented 3 years ago

Looking back at my ToDo list for the Composer, this is on there but I missed it. Good spot.

My notes say that I was going to put it around all of the message types and the global properties. After the earlier discussions, I will add it around all of the device/pin properties and state as well.

heliosfa commented 3 years ago

Fixing this, I have run into an instruction space explosion when packing everything to the point that the canonical heated plate overflows the available instruction space when compiled with default options.

Size of Text for different packing added (O2): Existing packing (sporadic, only in the poets_pht.h) 6980 Packing Message structs in MessageFormats.h 7580 Packing Global Properties 7052 Packing device properties/state Overflow by 612 bytes All Overflow by 1208 bytes

Size of Text for different packing added (Os): Existing packing (sporadic, only in the poets_pht.h) 6584 Packing Message structs in MessageFormats.h 7036 Packing Global Properties 6656 Packing device properties/state Overflow by 96 bytes All Overflow by 616 bytes

Making without the Trivial log handler, I can get it to compile with Os for a total text size of 8264. O2 still overflows by 36 bytes.

Stripping the trivial log handler for Os takes us down to a text size of 5592 with no changes to packing.

Blowing 2672 bytes (about a third of the instruction space) for unaligned reads, especially when the example uses nothing but 32-bit wide variables, seems a bit off. I will have to dig further.

heliosfa commented 3 years ago

After chatting with @DrongoTheDog and @mvousden yesterday, I have been digging into the instruction space explosion a little more today.

I have a fairly simple struct of 4-byte wide values:

typedef struct cell_state_t 
{
// Line 82

        float t = 0.0;// Our temperature
        uint32_t updated = 0;
        uint32_t fin = 0;// Finished Flag
        uint32_t finSent = 0;// Finished Sent Flag
        uint32_t finIdx = 0;// Finished Index

        float tFixed = 0.0;
        float tData[4][2] = {{0.0,0.0},{0.0,0.0},{0.0,0.0},{0.0,0.0}};

        uint32_t stepRX[2] = {0,0};// Bitmask for the inputs we have received each step
        uint32_t connected = 0;// Bitmask of connected pins, populated in Init
        uint32_t activeCount = 4;// Convenience value for number of connected devices to save recalcs

        uint32_t iter = 0;// Our iteration count
        uint32_t idleCnt = 0;

} devtyp_cell_state_t;

and a simple copier:

void Update(devtyp_cell_state_t* a, devtyp_cell_state_t* b)
{
    a->t = b->t;
    a->updated = b->updated;
    a->fin = b->fin;
    a->finSent = b->finSent;
    a->finIdx = b->finIdx;
    a->tFixed = b->tFixed;
    a->connected = b->connected;
    a->activeCount = b->activeCount;
    a->iter = b->iter;
    a->idleCnt = b->idleCnt;
    a->stepRX[0] = b->stepRX[0];
    a->stepRX[1] = b->stepRX[1];

    for(int i = 0; i<4; i++)
    {
        a->tData[i][0] = b->tData[i][0];
        a->tData[i][1] = b->tData[i][1];
    }

}

unpacked.cpp uses the default packing and packed.cpp uses `#pragma pack(push,1) ... #pragma pack(pop)" around the struct.

When built with no optimisations with RISCV GCC 8.2.0 (the version we use by default), the resulting unpacked binary ends up with a text section of 1132 bytes and the packed binary ends up with a text section of 3332 bytes. Assembly dumps from GCC: unpacked_8.2.0.s and packed_8.2.0.s. Compiling with RISCV GCC 10.2.0 ends up with the same pattern, but 28 bytes smaller text size.

Compiling with O2 (our default Softswitch optimisation) drops these sizes to 212 bytes and 1072 bytes respectively: unpacked_8.2.0_O2.s and packed_8.2.0_O2.s

DrongoTheDog commented 3 years ago

Well, I don't think this is actually the problem you're after, but in this particular case you could replace the whole Update thing with

inline void Update(devtyp_cell_state_t a, devtyp_cell_state_t b) { a=b; }

which will save a whole ton of unpacking, copying and re-packing. (The compiler will generate the copier, based probably on a memcopy of some sort.)

ADB

On 18/06/2021 21:54, Graeme Bragg wrote:

CAUTION: This e-mail originated outside the University of Southampton.

After chatting with @DrongoTheDog https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FDrongoTheDog&data=04%7C01%7Cadb%40soton.ac.uk%7C954c279d37e4484f6fd808d9329b3477%7C4a5378f929f44d3ebe89669d03ada9d8%7C0%7C0%7C637596464441125142%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=BKHDbrjAM4lv5cnF1uHQuSIXzvt7AXLx0roBitiEoeY%3D&reserved=0 and @mvousden https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fmvousden&data=04%7C01%7Cadb%40soton.ac.uk%7C954c279d37e4484f6fd808d9329b3477%7C4a5378f929f44d3ebe89669d03ada9d8%7C0%7C0%7C637596464441125142%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=WOjkc4%2B5iqrxzjSPC2LCl2gLXZAZQyhCiVX2dXV7x4o%3D&reserved=0 yesterday, I have been digging into the instruction space explosion a little more today.

I have a fairly simple struct of 4-byte wide values:

|typedef struct cell_state_t { // Line 82 float t = 0.0;// Our temperature uint32_t updated = 0; uint32_t fin = 0;// Finished Flag uint32_t finSent = 0;// Finished Sent Flag uint32_t finIdx = 0;// Finished Index float tFixed = 0.0; float tData[4][2] = {{0.0,0.0},{0.0,0.0},{0.0,0.0},{0.0,0.0}}; uint32_t stepRX[2] = {0,0};// Bitmask for the inputs we have received each step uint32_t connected = 0;// Bitmask of connected pins, populated in Init uint32_t activeCount = 4;// Convenience value for number of connected devices to save recalcs uint32_t iter = 0;// Our iteration count uint32_t idleCnt = 0; } devtyp_cell_state_t; |

and a simple copier:

|void Update(devtyp_cell_state_t a, devtyp_cell_state_t b) { a->t = b->t; a->updated = b->updated; a->fin = b->fin; a->finSent = b->finSent; a->finIdx = b->finIdx; a->tFixed = b->tFixed; a->connected = b->connected; a->activeCount = b->activeCount; a->iter = b->iter; a->idleCnt = b->idleCnt; a->stepRX[0] = b->stepRX[0]; a->stepRX[1] = b->stepRX[1]; for(int i = 0; i<4; i++) { a->tData[i][0] = b->tData[i][0]; a->tData[i][1] = b->tData[i][1]; } } |

unpacked.cpp https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FPOETSII%2FOrchestrator%2Ffiles%2F6679142%2Funpacked.cpp.txt&data=04%7C01%7Cadb%40soton.ac.uk%7C954c279d37e4484f6fd808d9329b3477%7C4a5378f929f44d3ebe89669d03ada9d8%7C0%7C0%7C637596464441135141%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=nedwwaWvL18RMxTqkXpwp%2BhYsNBUBNg3pAZCB5LRUeA%3D&reserved=0 uses the default packing and packed.cpp https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FPOETSII%2FOrchestrator%2Ffiles%2F6679139%2Fpacked.cpp.txt&data=04%7C01%7Cadb%40soton.ac.uk%7C954c279d37e4484f6fd808d9329b3477%7C4a5378f929f44d3ebe89669d03ada9d8%7C0%7C0%7C637596464441135141%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=PUnLnz8nur39I3kqGLWOqFMzcSw2P6GbbP%2BB2xwTbcg%3D&reserved=0 uses `#pragma pack(push,1) ... #pragma pack(pop)" around the struct.

When built with no optimisations with RISCV GCC 8.2.0 (the version we use by default), the resulting unpacked binary ends up with a text section of 1132 bytes and the packed binary ends up with a text section of 3332 bytes. Assembly dumps from GCC: unpacked_8.2.0.s https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FPOETSII%2FOrchestrator%2Ffiles%2F6679264%2Funpacked_8.2.0.s.txt&data=04%7C01%7Cadb%40soton.ac.uk%7C954c279d37e4484f6fd808d9329b3477%7C4a5378f929f44d3ebe89669d03ada9d8%7C0%7C0%7C637596464441145134%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=64P%2F3YQOxP%2BFrfZbghM994dOhFO15jQxtAzJMaSyyk8%3D&reserved=0 and packed_8.2.0.s https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FPOETSII%2FOrchestrator%2Ffiles%2F6679267%2Fpacked_8.2.0.s.txt&data=04%7C01%7Cadb%40soton.ac.uk%7C954c279d37e4484f6fd808d9329b3477%7C4a5378f929f44d3ebe89669d03ada9d8%7C0%7C0%7C637596464441145134%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=OfvteqR0lgyCyBPGHtsQHVUsuuhbZ5W0XIUxIro%2BNfQ%3D&reserved=0. Compiling with RISCV GCC 10.2.0 ends up with the same pattern, but 28 bytes smaller text size.

Compiling with O2 (our default Softswitch optimisation) drops these sizes to 212 bytes and 1072 bytes respectively: unpacked_8.2.0_O2.s https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FPOETSII%2FOrchestrator%2Ffiles%2F6679313%2Funpacked_8.2.0_O2.s.txt&data=04%7C01%7Cadb%40soton.ac.uk%7C954c279d37e4484f6fd808d9329b3477%7C4a5378f929f44d3ebe89669d03ada9d8%7C0%7C0%7C637596464441155122%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=5n4fhFW5RyoUEa%2FH3D1clb3tnQLYSYZfMfZCyNEJmUE%3D&reserved=0 packed_8.2.0_O2.s https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FPOETSII%2FOrchestrator%2Ffiles%2F6679314%2Fpacked_8.2.0_O2.s.txt&data=04%7C01%7Cadb%40soton.ac.uk%7C954c279d37e4484f6fd808d9329b3477%7C4a5378f929f44d3ebe89669d03ada9d8%7C0%7C0%7C637596464441155122%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=6Gjf5YGAxQEzNzEeIfE0tCLFfBPxkNIJqBgEUzADG%2BQ%3D&reserved=0

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FPOETSII%2FOrchestrator%2Fissues%2F240%23issuecomment-864268638&data=04%7C01%7Cadb%40soton.ac.uk%7C954c279d37e4484f6fd808d9329b3477%7C4a5378f929f44d3ebe89669d03ada9d8%7C0%7C0%7C637596464441155122%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=TFhjZUe7F%2FokVZ9J0zA94vK8ubzoZ5ockjCZKd%2FYzf0%3D&reserved=0, or unsubscribe https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAE7EYI6SER3YDTUYSPJWWM3TTOW6TANCNFSM46VENHYQ&data=04%7C01%7Cadb%40soton.ac.uk%7C954c279d37e4484f6fd808d9329b3477%7C4a5378f929f44d3ebe89669d03ada9d8%7C0%7C0%7C637596464441165115%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=fyxyC4QboQFQzNPhMbAaKB0hdK0wyZknK%2BFkBnsp%2Bsg%3D&reserved=0.

--

Andrew Brown Professor of Electronics University of Southampton Hampshire SO17 1BJ UK

T: 02380 593374 M: 07464 981171

heliosfa commented 3 years ago

which will save a whole ton of unpacking, copying and re-packing. (The compiler will generate the copier, based probably on a memcopy of some sort.)

while that may help with this toy example (which is just trying to show the explosion), it won't really help in the general case for device code.

Devices generally don't copy large swathes of state around - individual members are accessed piecemeal for a variety of different handlers.

heliosfa commented 3 years ago

Way forward - pack 1 the message formats. Do not pack 1 the rest of it.