MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0
16.18k stars 19.22k forks source link

[BUG] I2S expander latency causes sync issues with non-expander hardware #24096

Closed HoverClub closed 2 years ago

HoverClub commented 2 years ago

Did you test the latest bugfix-2.0.x code?

Yes, and the problem still exists.

Bug Description

The I2S expander implementation has a latency of 32mSec max. (time from changing a stepper signal level to it appearing at the stepper driver pin. This causes problems for non-expander hardware that needs to be accurately sync'ed to stepper movement (eg. lasers, etc.).

Bug Timeline

New

Expected behavior

Laser should be sync'ed to stepper moves.

Actual behavior

Laser tuns on early (before cut move has started) and turns off early before cut has completed. see this simple Z pattern See https://github.com/MarlinFirmware/Marlin/pull/22690#issuecomment-1107895830

Steps to Reproduce

Use hardware that has an I2S expander (pins_ file has #define I2S_STEPPER_STREAM).)
Run the Gcode from here https://github.com/MarlinFirmware/Marlin/pull/22690#issuecomment-1107895830

Version of Marlin Firmware

2.0.9.2

Printer model

MarkForged Two

Electronics

MKS TinyBee

Add-ons

Laser

Bed Leveling

No response

Your Slicer

No response

Host Software

No response

Additional information & file uploads

I have a fix for this bug but would like some comment (maybe from @simon-jouet the author of the current code?) before adding and testing in Marlin. Working testbench code is attached.

Only issue I've got so far is the non-definition of the FIFO write register in the ESP32 sdk.

/*                        I2S Low-latency Expander
 *                        ------------------------
 *
 * Fixes issues for hardware that needs to be synchronised to stepper movement (laser, 
 * spindle, etc.)
 *
 *
 * NOTES:
 *
 * I2S update to remove the DMA buffers and reduce the latency to one I2S word time.  Data 
 * is now writeen directly into the Tx FIFO as and when required.  The hardware resends the 
 * last data continuously if the FIFO becomes empty.  Only a simple i2s_init function is now
 * required, a new expander data word is written directly to the fifo register greatly 
 * simplifying the code.
 *
 * 
 * A 32bit word, written into the FIFO is sent as two 16bit words one for each channel (left/write)
 * with MSB first.
 *
 * When the I2S FIFO is empty, the hardware re-sends the last 32 bit word packet continuously.
 * A new data value can be written into the FIFO directly. The FIFO will always be empty 
 * when data is written provided the write interrupt loop is > 4usecs (and, even if not
 * empty, the word will get queued anyway).  Latency is determined by bit rate which is limited
 * to 10MHz by the 74HC595 expander (10Mhz / 32bits = 250KHz).  in theory, the chip could be 
 * clk'ed at 20Mhz (datasheet @4.5v Vcc) but (unknown) system noise would limit it to less than 
 * this. 
 * 
 * This reduces the max. latency (time taken for the expander output to change state) 
 * of the stepper drive to one I2S WR clock period (4usecs) max.  Current I2S implementation 
 * with DMA and FIFO buffers is around 33mSecs max. which wodoesn't work when using GPIO 
 * controlled hardware that needs to be sync'ed to stepper movement (laser, etc.).  This code 
 * also saves 32K of memory by removing DMA buffers!
 * 
 * 
 * TODO:
 * 1. Replace I2S library with direct register code.
 * 
 * 2. Modify Marlin code to implement the FIFO write into the stepper ISR (no longer any 
 *    need for the StepperTask function).  Check if babystepping and linear advance can now be
 *    re-enabled.
 * 
 * 3. Make sure MINIMUM_STEP_PULSE is >4usecs when using I2S code.  In theory, the step data 
 *    can be updated faster and be queued in the FIFO but that would require a FIFO full check
 *    before any write (and what to do for 4uSecs if the FIFO is full?).  Queueing isn't a good
 *    idea anyway as it de-syncs the stepper pules from other hardware again (the initial issue!)
 * 
 * 4. check if increasing bit rate would work reliably with HC595 (reducing latency more) by
 *    checking ESP32 IO spec. with HC595?
 * 
 * 
 * ISSUES:
 * 1. FIFO_WR_REG isn't defined in the ESP32 SDK - it's defined in this code but probably worth
 *    checking there are no issues with Espressif?
 * 
 * 
*/

#include <Arduino.h>
#include "driver/i2s.h" // get rid of this!

// fifo_wr_reg is not defined in esp32 sdk so need to do it here!!!
// - it's documented in ESP32 tech ref manual so should be safe.
volatile unsigned int& I2S0_fifo_wr_reg = *((volatile unsigned int*)0x3FF4F000);

#define I2S_STEPPER_SPLIT_STREAM // for testing

inline static void gpio_matrix_out_check(uint32_t gpio, uint32_t signal_idx, bool out_inv, bool oen_inv)
{
    //if pin = -1, do not need to configure
    if (gpio != -1) {
        PIN_FUNC_SELECT(GPIO_PIN_MUX_REG[gpio], PIN_FUNC_GPIO);
        gpio_set_direction((gpio_num_t)gpio, (gpio_mode_t)GPIO_MODE_DEF_OUTPUT);
        gpio_matrix_out(gpio, signal_idx, out_inv, oen_inv);
    }
}

void setup() {
  Serial.begin(500000);

  static const i2s_config_t i2s_config = {
     .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_TX),
     .sample_rate = 44100,
     .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
     .channel_format = I2S_CHANNEL_FMT_RIGHT_LEFT,  // dual channel data!
     .communication_format = (i2s_comm_format_t)(I2S_COMM_FORMAT_STAND_MSB), // I2S_COMM_FORMAT_STAND_I2S),
     .intr_alloc_flags = 0, // default interrupt priority
     .dma_buf_count = 2,        // must be >= 2 for lib
     .dma_buf_len = 8,          // >= 8 for lib
     .use_apll = false
  };

  // install and start i2s driver
  i2s_driver_install(I2S_NUM_0, &i2s_config, 0, NULL);
  // i2s_set_pin
  gpio_matrix_out_check(22, I2S0O_DATA_OUT23_IDX, 0, 0);
  gpio_matrix_out_check(25, I2S0O_WS_OUT_IDX, 0, 0);  // invert write so rising edge is after left channel data
  gpio_matrix_out_check(26, I2S0O_BCK_OUT_IDX, 0, 0);

  i2s_stop(I2S_NUM_0);

  // change some of the stuff setup by the i2s lib
  // specifically, disable DMA and IRQs (we don't need them!)

  // Stop FIFO DMA
  I2S0.out_link.stop = 1;

    I2S0.conf.tx_start = 0; // stop any XMT

  // Disconnect DMA from FIFO
  I2S0.fifo_conf.dscr_en = 0;  // clr this bit to disable I2S DMA mode. (R/W)

  // clock configuration
  // fi2s = fpll / (N + b/a)
  //   N >= 2  REG_CLKM_DIV_NUM
  //   b I2S_CLKM_DIV_B
  //   a I2S_CLKM_DIV_A

  // fbck = fi2s/M
  //   M >= 2 I2S_TX_BCK_DIV_NUM
  // bit rate = 10MHz, word rate = 256KHz (4uSec)
  // should be poss (with 74HC595) to operate at 20MHz/500KHz?
  I2S0.clkm_conf.clka_en = 0; // Use PLL/2 as reference
  I2S0.clkm_conf.clkm_div_num = 10; // min 2, max 255, reset value 4, 10 = 250KHz, 5 = 500Khz, etc.
  I2S0.clkm_conf.clkm_div_a = 0; // 0 at reset
  I2S0.clkm_conf.clkm_div_b = 0; // 0 at reset

    // fbck = fi2s / tx_bck_div_num
  I2S0.sample_rate_conf.tx_bck_div_num = 2; // minimum value of 2 defaults to 6

  // stop TX module
  I2S0.conf.tx_start = 0;

  // set MASTER mode
  I2S0.conf.tx_slave_mod = 0; // set master mode

  I2S0.fifo_conf.tx_fifo_mod = 0; // fifo mode 0 - 16bit dual channel
  I2S0.conf_chan.tx_chan_mod = 
  #ifdef I2S_STEPPER_SPLIT_STREAM
    4
  #else
    0
  #endif
  ;
  //I2S0.conf_chan.tx_chan_mod = 0; // dual channel mode
  I2S0.sample_rate_conf.tx_bits_mod = 16; // bits per sample

  I2S0.conf.tx_msb_right = 1;
  I2S0.conf.tx_right_first = 1;

  I2S0.conf_single_data = 0; // init all to zero on I2S

  // Use normal clock format, (WS is aligned with the last bit)
  I2S0.conf.tx_msb_shift = 0;
  I2S0.conf.rx_msb_shift = 0;

  // Disable TX interrupts
  I2S0.int_ena.out_eof = 0;
  I2S0.int_ena.val = 0; // no tinterrupts allowed

    I2S0.fifo_conf.tx_fifo_mod_force_en = 1;  // should always be set to 1 to enable Tx fifo.
    I2S0.conf1.tx_stop_en = 0;  // set to keep xmting clks when fifo empty
  I2S0_fifo_wr_reg = 0; // seed the FIFO - sets all expanders o/puts low 
  #ifdef I2S_STEPPER_SPLIT_STREAM
    // I2S0.conf_single_data = 0x0; // init data
    I2S0.conf_single_data = 0x44993388; // init some data for testing
  #endif

    I2S0.conf.tx_start = 1;         // start tx and loop last packet if fifo empty
}

uint32_t data = 0;

void loop() {
  I2S0_fifo_wr_reg = 0x5566aabb; 
  delayMicroseconds(9); //wait until fifo is empty again
  I2S0_fifo_wr_reg = 0xdeadbeef;  
  delayMicroseconds(9); //wait until fifo is empty again
}
simon-jouet commented 2 years ago

Hi HoverClub,

Thanks for the input on this, I haven't had a look at the I2S peripherals in a while so my memory is a bit fuzzy on the details.

So I think your suggestion might be correct and it does indeed solves the delay issue you are mentioning, but I also think this is a bit of a trade-off in term of implementation. An alternative could be to delay the pins triggering by the delay of the DMA buffers to keep things in sync (or have the laser triggered by one of the i2s pins so the delay is the same for steps and laser)

The idea behind using the I2S peripheral with the DMA buffer like this was to have an extremely precise (4us accuracy) stepping that can be scheduled with very few interrupts and be able to have a very high step rate (speed or microstepping) at no cost.

With this suggestion, the stepping would be tied to the StepperISR triggering and setting the register to toggle the pins. This works fine but requires quite a large number of interrupts, which can be quite jittery and can impact the WiFi operation in the ESP. Related to this is that for a while (I don't know if this is now resolve or not) the ESP32 did not support the FPU in interrupts, which required software arithmetic in the interrupt which isn't particularly fast (force to double as float are hardware accelerated but doubles aren't)

(IIRC) In the current mainline Marlin, the ESP HAL for i2s bitstream uses 2 interrupts. The first one is the interrupt triggered every time a DMA buffer is empty, and the second interrupt is for the TemperatureISR (as other HALs). The only thing the DMA buffer interrupt does is queue the DMA buffer to be populated by the stepperTask. Then the stepperTask is scheduled as a high priority task to populate the buffer. The advantage of this is that there is plenty of time for the buffer to be populated with the steps without an interrupt. So you get very high precision in timing while not being impacted by the jitter/delay/wifi.

In the ESP32-S2 code (on my github - warning it's a bit dirty) I went with a similar approach for the temperatureISR too. With the single core the wifi didn't like all the interrupts, so the temperature ISR just does a TaskNotify and a tasks performs the temperature logic, which has been working absolutely fine (except the esp32 ADC is garbage)

Anyway, I hope this gives a bit of context on why I decided to go with the DMA buffer way, I am not sure it's the best way to go about it, but above should provide some of my reasoning.

    1. FIFO_WR_REG isn't defined in the ESP32 SDK - it's defined in this code but probably worth
      • checking there are no issues with Espressif?

Might be worth checking in more details, but the address of the FIFO is at the base address of the I2S peripheral (in esp32.peripherals.ld PROVIDE ( I2S0 = 0x3ff4F000 );) so you could just write a uint16 to that.

HoverClub commented 2 years ago

Thanks for that background Simon - it helps!

I agree the timing scheme is a great improvement over the AVR int model but, unfortunately, unless ALL the hardware is controlled/synced using the expander "timer" (not possible as some, like a PWM laser, use on-chip peripherals) there will always be synchronisation issues with it. Other than reverting to the stepper ISR model I can't think of any solution.

One possibility that would reduce latency considerably would be to dispense with the DMA buffers and just use the I2S FIFO. That would reduce max. latency to 256usecs (64 x 4usecs) instead of the current 32msecs? You can also adjust the part-full raw irq threshold to reduce the effective FIFO size down from 64 (part full default is 32) if required (could be just one packet deep in theory and still preserve the 4usec timer). Underflow isn't an issue as it just resends the last packet if empty. It would also remove the DMA buffer space and the interrupt.

Something like this:

void stepperTask(void *parameter) {
  uint32_t remaining = 0;

  while (1) {
      while (!I2S[i2s_num]->int_raw.tx_wfull) { // space in fifo?
      // Fill with the port data post pulse_phase until the next step
      // make sure same data is sent until next stepper calc
      if (remaining) {
        I2S[i2s_num]->fifo_wr_reg = port_data;
        remaining--;
      }
      else {
        Stepper::pulse_phase_isr();             // port_data->I2S0_fifo_wr_reg
        remaining = Stepper::block_phase_isr(); // remaining # of stepper 4usec ISRs to wait before next step
      }
    }
    I2S[i2s_num]->int_clr.tx_wfull = 1; // clear the part-full interrupt bit
  }
}

DMA wouldn't be enabled at all. The only potential issue might be if stepperTask doesn't get enough cycles to keep some data in the FIFO (I'm not familiar with this aspect of RTOS) - worst case it'll resend the last packet one or more times each adding a 4us delay into the stream. Is that why the DMA buffers are large? Still got to be better than int's though!

And yeah - the ESP32 ADC is rubbish!

HoverClub commented 2 years ago

OK - DMA buffers look to be sized to cope with the, relatively slow, RTOS task switching (not sure how that can be determined accurately though?). The code suggestion above won't work as the FIFO buffer is way too small.

Implementing some kind of laser power buffer to save the power state at the same time as the expander value is added to the DMA buffer would suffer from the same sync problem - in that you can't determine where in the time stream the laser/spindle power should be set because the DMA buffer fill state will vary with Task switching, etc.

In short, I've no idea how to fix this issue other than reverting to the Marlin stepper ISR option (the 1st suggestion). Which, with an ESP32 probably isn't too bad as the relatively high speed CPU and high ISR priority should avoid excessive jitter.

HoverClub commented 2 years ago

After some experimenting and testing I've concluded that the original code is best! Anything else suffers from excessive jitter and stepper pulse update gaps.

I've implemented a ring buffer to save the laser/spindle power, ena/dir states at the point a port sample is pushed into the DMA buffer. The problem is then timing - i.e. at what later point do I activate the saved outputs to sync best with the I2S expander outputs (FIFO and output delay is variable). A fixed delay (the DMA buffer total length) doesn't work reliably because the task switching sometimes stops the push code for a fairly long time (and data is still being output by DMA/I2S).

The solution (I think) is to use the DMA read pointer as the laser buffer read pointer. That way, it remains sync'ed to the I2S output (+/- the I2S FIFO size * 4usecs - this can be reduced if I use the FIFO threshold bit to tell me whether the FIFO is half full or not!). I can't see how to access the DMA read pointer though - maybe @simon-jouet can help?

For reference, this is the test i2s_push_sample() code I'm using (which uses the DMA buffer length as the "delay":

void i2s_push_sample() {
  dma.current[dma.rw_pos++] = i2s_port_data;

  #if ANY(SPINDLE_FEATURE, LASER_FEATURE)
    #define LSR_ENA_BIT 15  // laser/spindle ena state (power level is in LSB)
    #define LSR_DIR_BIT 14  // spindle DIR status
    // ...other 6 bits could be used to sync any other hardware required!
    static const uint16_t LSR_BUF_LEN = (DMA_BUF_COUNT * DMA_SAMPLE_COUNT); 
    static uint16_t i2s_lsr_buffer[LSR_BUF_LEN] = {0};  //laser power buffer init
    static uint16_t i2s_lsr_rd_ptr = 0;
    static uint16_t i2s_lsr_last_status = 0;

    i2s_lsr_buffer[(i2s_lsr_rd_ptr - 1) % LSR_BUF_LEN] = cutter.power + (i2s_laser_ena << 8); // save current laser power, enable and direction bits
    uint16_t lsr_status = i2s_lsr_buffer[i2s_lsr_rd_ptr++]; // get laser power from one buffer ago
    i2s_lsr_rd_ptr %= LSR_BUF_LEN; // wrap ptr

    if (i2s_lsr_last_status != lsr_status) { // something changed?
      i2s_lsr_last_status = lsr_status;
      #if ENABLED(SPINDLE_CHANGE_DIR)
        WRITE(SPINDLE_DIR_PIN, (TEST(lsr_status, LSR_DIR_BIT) ? true : false))
      #endif
      #if EITHER(SPINDLE_FEATURE, LASER_FEATURE)
        WRITE(SPINDLE_LASER_ENA_PIN, (TEST(lsr_status, LSR_ENA_BIT) ? SPINDLE_LASER_ACTIVE_STATE : !SPINDLE_LASER_ACTIVE_STATE));
      #endif 
      #if ENABLED(SPINDLE_LASER_USE_PWM)
        cutter.set_ocr(lsr_status & 0xff, true); // set the power from LSB
      #endif
    }
  #endif
}
github-actions[bot] commented 2 years ago

This issue has had no activity in the last 60 days. Please add a reply if you want to keep this issue active, otherwise it will be automatically closed within 10 days.

thisiskeithb commented 2 years ago

https://github.com/MarlinFirmware/Marlin/pull/24193 has been merged

github-actions[bot] commented 2 years ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.