agg23 / openfpga-litex

A RISC-V software platform, exposing Analogue Pocket capabilities in a simple way
MIT License
36 stars 2 forks source link

C programs fail in various ways if more than ~8k of RAM used #3

Closed mcclure closed 9 months ago

mcclure commented 9 months ago

I have been writing apps against the Rust instructions. They have been working great, I can use the heap, I can do anything I want.

I have now tried the C instructions just to see if I can make a good sample program. They work until I attempt to use "too much" memory, where "too much" is somewhere in the neighborhood of 8144 bytes. What happens next varies. I have two minimal examples and one ungainly one. Based on Discord discussion agg recommends

replace the sram references with main_ram in c-linker.ld. For the .data segment, just drop > sram AT > main_ram and replace with > main_ram

I can paste my exact running instructions but they're the ones from the README.md. You can test any of these by pasting them into helloworld/hello.c.

Case 1: The data is stored on the stack.

Okay
Minimal repro cases

**C Stack failure**

Example is simple:

```C
// JUST CRASHES ON STARTUP; LITERALLY NEVER PRINTS ANYTHING. NO MESSAGES
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    printf("C: Hello, world!\n");

    while (1)
    {
        #define SHADOW_FRAMEBUFFER_SIZE 7980
        uint8_t shadow_framebuffer[SHADOW_FRAMEBUFFER_SIZE]; // Too big for stack
        uint8_t test = shadow_framebuffer[3]; // IF I COMMENT OUT THIS LINE, NO FAILURE
        printf("LOOP %d\n", (int)test);
    }

    return 0;
}

By the way, imagine I add this before line 49 of the helloworld Makefile.

        -Wl,-z -Wl,stack-size=0x1000000 \

It has no effect.

Case 2: The data is stored in a global (big version)

// WON'T COMPILE; "SECTION .bss WILL NOT FIT IN REGION `sram`"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    printf("C: Hello, world!\n");

    #define SHADOW_FRAMEBUFFER_SIZE (7980*2)
    static uint8_t shadow_framebuffer[SHADOW_FRAMEBUFFER_SIZE]; // Too big for stack

    while (1)
    {
        // COMMENT OUT THESE NEXT TWO LINES AND NO FAILURE
        for(int c = 0; c < SHADOW_FRAMEBUFFER_SIZE; c++)
            shadow_framebuffer[c] = 0;
        uint8_t test = shadow_framebuffer[3];
        printf("LOOP NONZERO? %s\n", test?"Y":"N");
    }

    return 0;
}

Case 3: The data is stored in a global (small version)

This is the case where I originally saw the problem. I cannot get this variant to reproduce with a minimal example. This one is a little hard to describe so I will post the full example if the c-linker.ld change doesn't help.

mcclure commented 9 months ago

Okay. A bit confused.

I put this file in .

c-linker.ld.txt

As agg recommended, this removes or replaces with main_ram all uses of sram.

I then set it as the base linker script with

diff --git a/lang/c/examples/fungus/Makefile b/lang/c/examples/fungus/Makefile
index 33a10ea..f783269 100644
--- a/lang/c/examples/fungus/Makefile
+++ b/lang/c/examples/fungus/Makefile
@@ -14,7 +14,7 @@ OBJECTS   = crt0.o fungus.o
 BUILD_DIR = $(LITEX_ROOT_DIRECTORY)/build/litex
 SOC_DIRECTORY = $(LITEX_ROOT_DIRECTORY)/vendor/litex/litex/soc
 LINKER_DIRECTORY = ../../../linker
-BASE_LINKER_FILE = $(LINKER_DIRECTORY)/c-linker.ld
+BASE_LINKER_FILE = ./c-linker.ld

 # This uses LiteX generated variables, relative to the build directory, which will not be correct if you are using the committed artifacts

Now the program is working with more or less correct logic (the "static" version, the large one— have not yet tried with other test programs), BUT I no longer receive printfs on litex_term.py. So that is one step forward one step back since printf is very helpful for debugging.

Will attempt some more tests.

mcclure commented 9 months ago

Case 3: The data is stored in a global (small version)

Okay, here is the previously promised "small" static test. I apologize for the length but no shorter versions exhibit the bug so clearly.

// Fluidly expanding colors

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// #include <irq.h>
// #include <libbase/uart.h>
// #include <libbase/console.h>
#include <generated/csr.h>

// Turn on to print warnings when frame fails to draw within vblank
#define SPEED_DEBUG 0

#define DISPLAY_WIDTH 266
#define DISPLAY_HEIGHT 240

#define BITS5 ((1<<5)-1)
#define BITS6 ((1<<6)-1)
// Takes 3 numbers in range 0..64. Lowest bit on R and B will be discarded.
#define COLOR(r,g,b) ( ( (((r)>>1)&BITS5)<<11 ) | ( ((g)&BITS6)<<5 ) | ( (((b)>>1)&BITS5) ))

#define PILLAR_COUNT 3
#define PILLAR_SIZE 10
#define PILLAR_GAP 60
#define PILLAR_COLOR 0x
#define PILLARS_SIZE (PILLAR_GAP*(PILLAR_COUNT-1)+PILLAR_SIZE*PILLAR_COUNT)
// Given an axis of size n, what offset is needed to center the group of pillars?
#define PILLARS_BASE(n) (((n)-PILLARS_SIZE)/2)

#define CANDIDATE_COUNT 100
#define WINNER_COUNT 100

// What index within the framebuffer is this pixel at?
#define AT(x,y) (((y)*DISPLAY_WIDTH)+(x))

typedef struct {
    uint16_t x;
    uint16_t y;
} Candidate;

inline Candidate make_candidate(int x, int y) {
    Candidate candidate = {x,y};
    return candidate;
}

int main(void)
{
    printf("-- Fungus --\n");

    uint16_t *fb = (uint16_t *)(uintptr_t)video_framebuffer_dma_base_read();

    // Fill screen with black
    for(int c = 0; c < DISPLAY_WIDTH*DISPLAY_HEIGHT; c++)
        fb[c] = 0;

    // Draw a 3x3 grid of rectangles with a hole in the middle, to break up the field
    int y_root = PILLARS_BASE(DISPLAY_HEIGHT), x_root = PILLARS_BASE(DISPLAY_WIDTH);
    {
        for(int by = 0; by < 3; by++) {
            for(int bx = 0; bx < 3; bx++) {
                if (bx == (PILLAR_COUNT/2) && by == (PILLAR_COUNT/2)) // Hole
                    continue;
                int y_block = y_root+by*(PILLAR_SIZE+PILLAR_GAP), x_block = x_root+bx*(PILLAR_SIZE+PILLAR_GAP);
                for(int y = 0; y < PILLAR_SIZE; y++) {
                    for(int x = 0; x < PILLAR_SIZE; x++) {
                        fb[AT(x_block+x, y_block+y)] = COLOR(32, 0, 0);
                    }
                }
            }
        }
    }

    // Who needs a heap anyway
    Candidate candidates[2][CANDIDATE_COUNT];
    int candidates_len[2] = {0,0};
    int current = 0;
    int color = COLOR(0,32,0);
    #define SHADOW_FRAMEBUFFER_SIZE 7980
    //((DISPLAY_WIDTH*DISPLAY_HEIGHT)/8)
    static uint8_t shadow_framebuffer[SHADOW_FRAMEBUFFER_SIZE]; // Too big for stack

    while (1)
    {
        while (1) {
            uint32_t video = apf_video_video_read();
            if (apf_video_video_vblank_triggered_extract(video))
                break;
        }

        // At any one time we have two lists of points we can expand into;
        // one for the current frame, and one for the next frame.
        int next = (current+1)%2;
        Candidate *candidates_current = &candidates[current][0];
        Candidate *candidates_next = &candidates[next][0];

        #define CANDIDATE_PUSH(candidate) { candidates_next[candidates_len[next]] = candidate; candidates_len[next]++; }

        if (candidates_len[current] == 0) {
            candidates_current[0] = make_candidate(DISPLAY_WIDTH/2, DISPLAY_HEIGHT/2);
            candidates_len[current] = 1;
        }

        // Draw the current list (but only the lucky first handful)
        for(int idx = 0; idx < WINNER_COUNT && idx < candidates_len[current]; idx++) {
            Candidate winner = candidates_current[idx];
            fb[AT(winner.x, winner.y)] = color;
        }

        // Prepare for next frame
        // Since we don't have to worry about vblank finishing, we can take our time now.
        for(int idx = 0; idx < SHADOW_FRAMEBUFFER_SIZE; idx++)
            shadow_framebuffer[idx] = 0;
        int idx = 0;
        // Uncomment this next line and the Y/N problem fixes
        //printf("Shadow %p\n", &shadow_framebuffer[0]);
        printf("Start? %s\n", (idx < candidates_len[current] && candidates_len[next] < CANDIDATE_COUNT)?"Y":"N");
        for(; idx < candidates_len[current] && candidates_len[next] < CANDIDATE_COUNT; idx++) {
            printf("f??\n");
            Candidate check = candidates_current[idx];
            Candidate neighbors[4] = {
                {check.x, (check.y+1)%DISPLAY_HEIGHT},
                {(check.x+1)%DISPLAY_WIDTH, check.y},
                {check.x, (check.y+DISPLAY_HEIGHT-1)%DISPLAY_HEIGHT},
                {(check.x+DISPLAY_WIDTH-1)%DISPLAY_WIDTH, check.y}
            };
            for(int n = 0; n < 4; n++) {
                printf("idx %d n %d\n", idx, n);
                if (candidates_len[next] >= CANDIDATE_COUNT)
                    break;
                size_t neighbor_at = AT(neighbors[n].x, neighbors[n].y);
                // Use the shadow framebuffer to check for pixels we've checked this frame
                size_t neighbor_shadow_idx = neighbor_at/8;
                uint8_t neighbor_shadow_bit = 1<<(neighbor_at%8); 
                if (shadow_framebuffer[neighbor_shadow_idx] & neighbor_shadow_bit)
                    continue;
                shadow_framebuffer[neighbor_shadow_idx] |= neighbor_shadow_bit;
                uint16_t color_minus_current = color-fb[neighbor_at];
                int16_t color_minus_current_diff = color_minus_current;
                if (color_minus_current_diff <= 0)
                    continue;
                printf("keep\n");
                CANDIDATE_PUSH(neighbors[n]);
            }
        }
        printf("DONE\n");

        candidates_len[current] = 0;
        current = next;
        color++; // Treat the 565 bit-packed color as a single integer. Normally, you don't want this.
    }

    return 0;
}

I tested this with the ORIGINAL c-linker and with my MODIFIED c-linker. With my modified c-linker it is a diamond shape that continually scrolls downward while changing colors. This is 100% the expected behavoir of the code. On the other hand with the original C linker, it prints:

--============= Liftoff! ===============--
-- Fungus --
Start? (null)
DONE
Start? (null)
DONE
Start? (null)
DONE
Start? (null)
DONE

and then the final two lines repeat forever. These two lines are impossible, take note of the code that prints them:

        // Uncomment this next line and the Y/N problem fixes
        //printf("Shadow %p\n", &shadow_framebuffer[0]);
        printf("Start? %s\n", (idx < candidates_len[current] && candidates_len[next] < CANDIDATE_COUNT)?"Y":"N");
        for(; idx < candidates_len[current] && candidates_len[next] < CANDIDATE_COUNT; idx++) {
            printf("f??\n");

Start? should only be able to print "Y" or "N". It appears that memory and/or the actual program code are getting corrupted. If you uncomment the "Shadow" line as described you get a DIFFERENT "impossible" output, these lines repeated forever:

Shadow 0x10000024
Start? Y
DONE

"Start?" being "Y" means that the loop should begin at least once. But "f?" is never printed.

mcclure commented 9 months ago

Repeated "Case 2" test with "fixed" linker script. It DOES compile, but prints nothing after "Liftoff". Since unlike "case 3" it does not draw I don't know how to tell if it's frozen or just unable to printf.

agg23 commented 9 months ago

This may be fixed with the current linker file. The change was to leave the data segment as:

    .data :
    {
        . = ALIGN(8);
        _fdata = .;
        *(.data .data.* .gnu.linkonce.d.*)
        *(.data1)
        _gp = ALIGN(16);
        *(.sdata .sdata.* .gnu.linkonce.s.*)
        . = ALIGN(8);
        _edata = .;
    } > sram AT > main_ram

while all of the other segments are only sent to main_ram.

mcclure commented 9 months ago

In my testing it's fixed now