boochow / pico_test_projects

Some projects to test Raspberry Pi Pico unique functionalities, such as interpolators or scanvideo library.
MIT License

How to do USB stdio output in vga-test3 dma example? #2

Closed sandric closed 1 year ago

sandric commented 1 year ago

Hi, I wonder whether it's possible to use USB printf (or anything heavier than LED blinking, for that matter) in the vga-test3 DMA example? I tried USB printf in the simple vga-test3 vgaimage.c example and everything works fine, but I can't make it work with vgaimage-dma.c. I tried using the second core both ways: printing a tick string on it with a 1-second delay, and vice versa, printing the tick on the main core and moving the VGA routine to the second core:

#include <stdio.h>
#include "pico/stdlib.h"
#include "pico/multicore.h"
#include "pico.h"
#include "pico/scanvideo.h"
#include "pico/scanvideo/composable_scanline.h"
#include "hardware/structs/dma.h"
#include "hardware/structs/ssi.h"

#include "image-vga.h"
//#define VGA_MODE vga_mode_320x240_60
#define VGA_MODE vga_mode_640x480_60
#define MIN_RUN 3

// flash_bulk_read() is from pico-playground/scanvideo/flash_stream

// Use direct SSI DMA for maximum transfer speed (but cannot execute from
// flash at the same time)
void __no_inline_not_in_flash_func(flash_bulk_read)(uint32_t *rxbuf, uint32_t flash_offs, size_t len, uint dma_chan) {
    ssi_hw->ssienr = 0;
    ssi_hw->ctrlr1 = len - 1; // NDF, number of data frames (32b each)
    ssi_hw->dmacr = SSI_DMACR_TDMAE_BITS | SSI_DMACR_RDMAE_BITS;
    ssi_hw->ssienr = 1;

    dma_hw->ch[dma_chan].read_addr = (uint32_t) &ssi_hw->dr0;
    dma_hw->ch[dma_chan].write_addr = (uint32_t) rxbuf;
    dma_hw->ch[dma_chan].transfer_count = len;
    // Must enable DMA byteswap because non-XIP 32-bit flash transfers are
    // big-endian on SSI (we added a hardware tweak to make XIP sensible)
    dma_hw->ch[dma_chan].ctrl_trig =
            DMA_CH0_CTRL_TRIG_BSWAP_BITS |
            DREQ_XIP_SSIRX << DMA_CH0_CTRL_TRIG_TREQ_SEL_LSB |
            dma_chan << DMA_CH0_CTRL_TRIG_CHAIN_TO_LSB |
            DMA_CH0_CTRL_TRIG_INCR_WRITE_BITS |
            DMA_CH0_CTRL_TRIG_DATA_SIZE_VALUE_SIZE_WORD << DMA_CH0_CTRL_TRIG_DATA_SIZE_LSB |
            DMA_CH0_CTRL_TRIG_EN_BITS;

    // Now DMA is waiting, kick off the SSI transfer (mode continuation bits in LSBs)
    ssi_hw->dr0 = (flash_offs << 8) | 0xa0;

    while (dma_hw->ch[dma_chan].ctrl_trig & DMA_CH0_CTRL_TRIG_BUSY_BITS)
        tight_loop_contents();

    ssi_hw->ssienr = 0;
    ssi_hw->ctrlr1 = 0;
    ssi_hw->dmacr = 0;
    ssi_hw->ssienr = 1;
}

// RAW_RUN |color 1| n-3 |..|color n| 0 | EOL
int num_token = 2 + image_width + 1 + 1;

// render pixels
int32_t __time_critical_func(single_scanline)(uint32_t *buf, size_t buf_length, const uint16_t *data) {
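    // num_words is not defined anywhere; check against the word count this function returns
    assert(buf_length >= (size_t) (num_token + 1) / 2);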

    uint16_t *p16 = (uint16_t *) buf;

    flash_bulk_read(&buf[1], (uint32_t) data, image_width / 2, 11);
    p16[0] = COMPOSABLE_RAW_RUN;
    p16[1] = p16[2];
    //    p16[2] = (num_token - 3) - MIN_RUN;
    p16[2] = (image_width + 1) - MIN_RUN;
    buf[num_token / 2 - 1] = 0 | COMPOSABLE_EOL_ALIGN << 16;

    return (num_token + 1) / 2;
}

static void inline render_scanline(struct scanvideo_scanline_buffer *dest) {
    uint32_t *buf = dest->data;
    size_t buf_length = dest->data_max;
    int line_num = scanvideo_scanline_number(dest->scanline_id);

    dest->data_used = single_scanline(buf, buf_length, &image[line_num * image_width]);

    dest->status = SCANLINE_OK;
}

// main loop
void __time_critical_func(render_loop)() {
    static uint32_t last_frame_num = 0;

    while (true) {
        struct scanvideo_scanline_buffer *scanline_buffer = scanvideo_begin_scanline_generation(true);

        render_scanline(scanline_buffer);

        scanvideo_end_scanline_generation(scanline_buffer);
    }
}

void core1_entry() {
  while(true) {
    printf("tick\n");
    sleep_ms(1000);
  }
}

void render_core1() {
  scanvideo_setup(&VGA_MODE);
  scanvideo_timing_enable(true);
  render_loop();
}

Second core vga routine main:

int main(void) {
    set_sys_clock_khz(200000, true);
    stdio_init_all();

    multicore_launch_core1(render_core1);

    while(true) {
      printf("tick\n");
      sleep_ms(1000);
    }

    return 0;
}

Second core ticking main:

int main(void) {
    set_sys_clock_khz(200000, true);
    stdio_init_all();

    multicore_launch_core1(core1_entry);

    scanvideo_setup(&VGA_MODE);
    scanvideo_timing_enable(true);
    render_loop();

    return 0;
}

If I move the VGA routine to the second core, it draws without a problem, but the port is not opened and no communication appears. If I move the tick printing to the second core, nothing is drawn at all and no port is opened; it looks like it just halts.

I debugged a bit, and it looks like everything works fine up until the __no_inline_not_in_flash_func(flash_bulk_read) function gets executed. I tried removing everything but the first line, ssi_hw->ssienr = 0;, and it immediately breaks. I'm a newbie to the Pico and not quite sure what this code does.

I also tried the same with the flash-stream DMA example in pico-playground, and created an issue in its repository: https://github.com/raspberrypi/pico-playground/issues/36

So my question is: have you tried USB printf output, and what am I doing wrong here? I want to drive a 640x320 display while simultaneously polling a capacitive touch panel every ~5 ms via I2C. Do you know whether that's possible at all, or does the Pico need all its resources just to draw the screen, so that it will halt? Thx
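
For readers following the listing above, the in-place fix-up done by single_scanline can be reproduced on a host machine. This is only a sketch under stated assumptions: the token constants are placeholders (the real COMPOSABLE_* values come from the scanvideo headers), the width is shrunk to 8 pixels, memcpy stands in for flash_bulk_read, and build_scanline is a hypothetical name. It shows how the pixel data lands one word into the buffer and how the first three tokens are then patched around it.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder token values, NOT the real COMPOSABLE_* constants. */
enum { FAKE_RAW_RUN = 0xAAAA, FAKE_EOL_ALIGN = 0xBBBB };
enum { WIDTH = 8, MIN_RUN = 3 };
enum { NUM_TOKEN = 2 + WIDTH + 1 + 1 };  /* same formula as num_token */

/* Builds one scanline as 16-bit tokens and returns the number of
   32-bit words used. tok must hold at least NUM_TOKEN entries. */
static int build_scanline(uint16_t *tok, const uint16_t *pixels) {
    /* Stand-in for flash_bulk_read(&buf[1], ...): pixel data starts
       at the second 32-bit word, i.e. at token index 2. */
    memcpy(&tok[2], pixels, WIDTH * sizeof(uint16_t));

    tok[0] = FAKE_RAW_RUN;           /* run command */
    tok[1] = tok[2];                 /* first pixel moves down one slot */
    tok[2] = (WIDTH + 1) - MIN_RUN;  /* run length field, n - 3 */

    tok[NUM_TOKEN - 2] = 0;              /* trailing black pixel */
    tok[NUM_TOKEN - 1] = FAKE_EOL_ALIGN; /* end-of-line token */

    return (NUM_TOKEN + 1) / 2;
}
```

With pixels 1..8 as input, the token stream becomes RAW_RUN, 1, 6, 2, 3, 4, 5, 6, 7, 8, 0, EOL, which matches the `RAW_RUN |color 1| n-3 |..|color n| 0 | EOL` comment in the listing.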

sandric commented 1 year ago

I'm also not sure how to use even UART, though, since VGA uses GPIO0-GPIO17, which includes all the pins with UART buses. Here's the schematic I took from the Pico documentation:

[Screenshot, 2023-01-01: schematic from "Hardware design with RP2040" (hardware-design-with-rp2040 pdf)]

Can someone share a schematic for using UART communication with the VGA 5-5-5 DAC?

sandric commented 1 year ago

I managed to use UART on pins 20 and 21, but it turns out that even something like an LED-blinking loop simply hangs, so I think it's just not possible to do anything else while rendering.

Andy2No commented 1 year ago

@sandric Have you tried SoftwareSerial?

sandric commented 1 year ago

@Andy2No no, but as I said, GPIO 20 and 21 work fine for UART until I start the VGA routine; then it stops working. So the problem isn't USB or UART: I can't even make a simple LED-blinking loop work on the second core, it hangs. You can try this code, commenting the VGA code in main in and out, to see:

void core1_blink() {
  while(true) {
    gpio_put(LED_PIN, 1);
    sleep_ms(1000);
    gpio_put(LED_PIN, 0);
    sleep_ms(1000);
  }
}

int main() {
  ...
  multicore_launch_core1(core1_blink);
  ...
}

boochow commented 1 year ago

@sandric vgaimage-dma.c employs DMA (direct memory access), a dedicated hardware mechanism for transferring data between devices, such as RAM and flash memory. The image data is stored in the flash memory and needs to be transferred to RAM. The DMA hardware performs this task because using the CPU would require two data transfer sessions: reading to the CPU and writing to RAM. The issue is that the CPU requires access to flash memory where the code and data are located, but the flash memory is frequently occupied because the DMA hardware locks it. This is why printf() in your code does not work. printf() and other library code/data are located in the flash memory, and the CPU cannot access them. To solve this problem, you should place your entire code in RAM by specifying the following option:

cmake -DPICO_NO_FLASH=1 -B build

However, this does not work with my code because the image data is part of the code, and the linker will try to place it in RAM, but there is insufficient space. A possible solution is to write the image data directly to the flash memory and access it from the code using a pointer that points to the image data's address in the flash memory. Please refer to https://github.com/raspberrypi/pico-playground/tree/master/scanvideo/flash_stream as this code does exactly that.
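
The "insufficient space" point can be checked with a little arithmetic. The sketch below assumes the image is a full 640x480 frame of 16-bit pixels (the actual dimensions in image-vga.h are not shown in this thread) and uses the RP2040's 264 KB of on-chip SRAM; frame_bytes and frame_fits_in_sram are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* RP2040 on-chip SRAM: 264 KB in total. */
#define RP2040_SRAM_BYTES (264u * 1024u)

/* Bytes needed for one full frame of 16-bit pixels. */
static size_t frame_bytes(unsigned width, unsigned height) {
    return (size_t) width * height * sizeof(uint16_t);
}

/* Returns 1 if the frame would fit in SRAM even with nothing else
   (code, stacks, scanline buffers) stored there, 0 otherwise. */
static int frame_fits_in_sram(unsigned width, unsigned height) {
    return frame_bytes(width, height) <= RP2040_SRAM_BYTES;
}
```

Under these assumptions a 640x480 frame needs 614,400 bytes against 270,336 bytes of SRAM, so it cannot be linked into RAM, whereas a 320x240 frame (153,600 bytes) would leave room for code.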

sandric commented 1 year ago

@boochow Thanks a lot for the great explanation. I came to the same conclusion by trying to read the Pico datasheet, but it's above my expertise level, unfortunately. I tried the same thing, but setting this variable in CMake itself:

set(PICO_NO_FLASH 1)

but that didn't resolve the issue. Thanks for the clue about flash_stream, though: I know how to build it too, but didn't think of picotool as a way around this. I will try it and write back.

sandric commented 1 year ago

@boochow OK, I tried, but still no success. I tried both starting the blinking on the secondary core and starting the VGA routine on the secondary core. Both times something different happens when PICO_NO_FLASH=1 is used: the display is split in half with blue/black colors on the sides, and the blinking/printf CDC routine actually works, but VGA does not. I also found that for some reason the flashed code works only until the Pico is restarted, after which it falls back to the code previously loaded without PICO_NO_FLASH=1. I'm not sure, but maybe that means you need to load it each time you power the device if you want to use RAM-loaded code?

I also played a bit with the DMA bulk loading. I found experimentally that if I reduce the buffer read from SSI/DMA to 32 bytes, it actually works; it just renders about 1/20 of the screen width.

I wonder whether that means it's possible to reduce the bulk loading time by reading the picture only once, instead of reading it from RAM 60 times a second? The thing is that I need to display only one picture, not video etc. I mean, maybe there is a 1 Hz timing among the VGA modes, so that for the rest of the second I could do anything else apart from rendering? Or does VGA require you to deliver 60 Hz all the time?
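
As a rough aid for the refresh question, the numbers can be worked out on a host. This sketch assumes the standard 640x480@60 VGA timing (25.175 MHz pixel clock, 800 pixel clocks per line) and 16-bit pixels; the timing used by vga_mode_640x480_60 may differ slightly, and the function names are hypothetical. An analog VGA signal has to be generated continuously, so if the frame is not held in RAM, every visible line is fetched again on every frame.

```c
#include <stdint.h>

/* Assumed standard 640x480@60 timing and 16-bit pixels. */
enum {
    PIXEL_CLOCK_KHZ = 25175,
    CLOCKS_PER_LINE = 800,
    VISIBLE_WIDTH = 640,
    VISIBLE_HEIGHT = 480,
    FRAMES_PER_SECOND = 60,
    BYTES_PER_PIXEL = 2
};

/* Time available to prepare one scanline, in nanoseconds (rounded down). */
static uint32_t line_period_ns(void) {
    return (uint32_t) ((uint64_t) CLOCKS_PER_LINE * 1000000u / PIXEL_CLOCK_KHZ);
}

/* Pixel bytes that must be fetched per second when every frame is re-read. */
static uint32_t stream_bytes_per_second(void) {
    return (uint32_t) VISIBLE_WIDTH * VISIBLE_HEIGHT
           * BYTES_PER_PIXEL * FRAMES_PER_SECOND;
}
```

Under these assumptions each line has to be ready in under about 31.8 µs, and roughly 36.9 MB of pixel data is streamed per second, which gives a feel for how much of the flash interface's time the renderer takes.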

boochow commented 1 year ago

@sandric thank you for sharing this; it seems something still conflicts with the DMA... but I'm afraid I can't handle this issue with my knowledge. How about posting your question at https://forums.raspberrypi.com/viewforum.php?f=145 ? kilograham may answer it.

sandric commented 1 year ago

@boochow sure, I didn't have much luck with the RP forum before, but I'll try there. Thanks for your help.