bdring / FluidNC

The next generation of motion control firmware

Problem: FluidNC Crashing Randomly. #1308

Open StephenWillis2 opened 2 months ago

StephenWillis2 commented 2 months ago

Wiki Search Terms

FluidNC lock up. FluidNC crash. FluidNC panic. controller lock up.

Controller Board

MKS Tinybee

Machine Description

A DIY basic 3axis router. Nothing special.

Input Circuits

No response

Configuration file

board: MKS TinyBee V1.0 XYYZ
name:

kinematics:
  Cartesian:

i2so:
  bck_pin: gpio.25
  data_pin: gpio.27
  ws_pin: gpio.26

spi:
  miso_pin: gpio.19
  mosi_pin: gpio.23
  sck_pin: gpio.18

sdcard:
  cs_pin: gpio.5
  # uses TH2 IO34 active low - MAKE SURE jumper J2 is set to SDDET!!! gpio.34:low
  card_detect_pin: NO_PIN
  frequency_hz: 400000

stepping:
  engine: I2S_STATIC
  idle_ms: 255
  pulse_us: 4
  dir_delay_us: 1
  disable_delay_us: 2

axes:
  x:
    steps_per_mm: 400
    max_rate_mm_per_min: 1000.000
    acceleration_mm_per_sec2: 80.000
    max_travel_mm: 2500.000
    soft_limits: false
    homing:
      cycle: 2
      positive_direction: true
      mpos_mm: 0.000
      feed_mm_per_min: 300.000
      seek_mm_per_min: 1500.000
      settle_ms: 500
      seek_scaler: 1.100
      feed_scaler: 1.100

    motor0:
      limit_neg_pin: gpio.33:low:pu
      hard_limits: true
      pulloff_mm: 4.000
      stepstick:
        step_pin: I2SO.1
        direction_pin: I2SO.2
        disable_pin: I2SO.0

  y:
    steps_per_mm: 400
    max_rate_mm_per_min: 1000.000
    acceleration_mm_per_sec2: 70.000
    max_travel_mm: 1250.000
    soft_limits: false
    homing:
      cycle: 3
      positive_direction: true
      mpos_mm: 0.000
      feed_mm_per_min: 300.000
      seek_mm_per_min: 2000.000
      settle_ms: 500
      seek_scaler: 1.100
      feed_scaler: 1.100

    motor0:
      limit_neg_pin: gpio.32:low:pu
      hard_limits: false
      pulloff_mm: 4.000
      stepstick:
        step_pin: I2SO.4
        direction_pin: I2SO.5
        disable_pin: I2SO.3

    # use E1 driver for 2nd Y axis motor
    motor1:
      limit_neg_pin: NO_PIN
      hard_limits: false
      pulloff_mm: 4.000
      stepstick:
        step_pin: I2SO.13
        direction_pin: I2SO.14
        disable_pin: I2SO.12

  z:
    steps_per_mm: 2600.000
    max_rate_mm_per_min: 200.000
    acceleration_mm_per_sec2: 5.000
    max_travel_mm: 80.000
    soft_limits: false
    homing:
      cycle: 1
      positive_direction: true
      mpos_mm: 0.000
      feed_mm_per_min: 300.000
      seek_mm_per_min: 500.000
      settle_ms: 500
      seek_scaler: 1.100
      feed_scaler: 1.100

    motor0:
      limit_pos_pin: gpio.22:low:pu
      hard_limits: true
      pulloff_mm: 3.000
      stepstick:
        step_pin: I2SO.7
        direction_pin: I2SO.8
        disable_pin: I2SO.6

control:
  safety_door_pin: NO_PIN
  # on MT_DET connector
  reset_pin: NO_PIN
  # on TH1 connector
  feed_hold_pin: gpio.36:low
  # on TB connector
  cycle_start_pin: gpio.39:low
  macro0_pin: NO_PIN
  macro1_pin: NO_PIN
  macro2_pin: NO_PIN
  macro3_pin: NO_PIN

coolant:
  # Heated Bed Terminal Block
  flood_pin: i2so.16
  # HE0 Terminal Block
  mist_pin: i2so.17
  delay_ms: 0

relay:
  output_pin: i2so.19
  #off_on_alarm: true

#probe:
  #pin: gpio.35:low
  #check_mode_start: true

start:
  must_home: false

Startup Messages

$ss
[MSG:INFO: FluidNC v3.8.0 https://github.com/bdring/FluidNC]
[MSG:INFO: Compiled with ESP32 SDK:v4.4.7-dirty]
[MSG:INFO: Local filesystem type is littlefs]
[MSG:INFO: Configuration file:config.yaml]
[MSG:INFO: Machine ]
[MSG:INFO: Board MKS TinyBee V1.0 XYYZ]
[MSG:INFO: I2SO BCK:gpio.25 WS:gpio.26 DATA:gpio.27]
[MSG:INFO: SPI SCK:gpio.18 MOSI:gpio.23 MISO:gpio.19]
[MSG:INFO: SD Card cs_pin:gpio.5 detect:NO_PIN freq:400000]
[MSG:INFO: Stepping:I2S_static Pulse:4us Dsbl Delay:2us Dir Delay:1us Idle Delay:255ms]
[MSG:INFO: Axis count 3]
[MSG:INFO: Axis X (-2500.000,0.000)]
[MSG:INFO:   Motor0]
[MSG:INFO:     stepstick Step:I2SO.1 Dir:I2SO.2 Disable:I2SO.0]
[MSG:INFO:  X Neg Limit gpio.33:low:pu]
[MSG:INFO: Axis Y (-1250.000,0.000)]
[MSG:INFO:   Motor0]
[MSG:INFO:     stepstick Step:I2SO.4 Dir:I2SO.5 Disable:I2SO.3]
[MSG:INFO:  Y Neg Limit gpio.32:low:pu]
[MSG:INFO:   Motor1]
[MSG:INFO:     stepstick Step:I2SO.13 Dir:I2SO.14 Disable:I2SO.12]
[MSG:INFO: Axis Z (-80.000,0.000)]
[MSG:INFO:   Motor0]
[MSG:INFO:     stepstick Step:I2SO.7 Dir:I2SO.8 Disable:I2SO.6]
[MSG:INFO:  Z Pos Limit gpio.22:low:pu]
[MSG:INFO: feed_hold_pin gpio.36:low]
[MSG:INFO: cycle_start_pin gpio.39:low]
[MSG:INFO: Kinematic system: Cartesian]
[MSG:INFO: Relay Spindle Ena:NO_PIN Out:I2SO.19 Dir:NO_PIN]
[MSG:INFO: Using spindle Relay]
[MSG:INFO: Flood coolant I2SO.16]
[MSG:INFO: Mist coolant I2SO.17]
[MSG:INFO: Connecting to STA SSID:HomeWIFI]
[MSG:INFO: Connecting.]
[MSG:INFO: Connecting..]
[MSG:INFO: Connected - IP is 192.168.1.113]
[MSG:INFO: WiFi on]
[MSG:INFO: Start mDNS with hostname:http://fluidnc.local/]
[MSG:INFO: SSDP Started]
[MSG:INFO: HTTP started on port 80]
[MSG:INFO: Telnet started on port 23]
ok

User Interface Software

WebUI and Serial/BT Versions

What happened?

When I run an especially long job, such as parallel passes, FluidNC crashes at random and then reboots into a safe mode. This has happened about 7 times, each time at a different place in the G-code, while running parallel-pass tests.

GCode File

No response

Other Information

FluidTerm crash dump:

<Run|MPos:-183.342,-253.610,-24.922|FS:136,9000|SD:38.61,/sd/parallel.nc>
<Run|MPos:-183.822,-253.610,-24.691|FS:533,9000|Ov:100,100,100|A:S|SD:38.61,/sd/parallel.nc>
<Run|MPos:-185.268,-253.610,-24.513|FS:154,9000|WCO:0.000,0.000,-20.000|SD:38.61,/sd/parallel.nc>
Guru Meditation Error: Core 1 panic'ed (Cache disabled but cached memory region accessed).

Core 1 register dump:
PC      : 0x4008abe2  PS      : 0x00060035  A0      : 0x80081bad  A1      : 0x3ffbf83c
A2      : 0x3ffb58e8  A3      : 0x00000001  A4      : 0x00060023  A5      : 0x80000000
A6      : 0x00000000  A7      : 0x003fffff  A8      : 0xbad00bad  A9      : 0x3ffb1f30
A10     : 0x3ffb5958  A11     : 0x3ffb7720  A12     : 0x00000020  A13     : 0x80000000
A14     : 0x00000000  A15     : 0x3ffb1ff8  SAR     : 0x00000002  EXCCAUSE: 0x00000007
EXCVADDR: 0x00000000  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x00000000

Backtrace: 0x4008abdf:0x3ffbf83c |<-CORRUPTED

ELF file SHA256: e6173bfb5d10076b

Rebooting...
ets Jun 8 2016 00:22:57

rst:0x3 (SW_RESET),boot:0x1b (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0030,len:1184
load:0x40078000,len:13260
load:0x40080400,len:3028
entry 0x400805e4
[MSG:INFO: uart_channel0 created]
[MSG:RST]
[MSG:INFO: FluidNC v3.8.0 https://github.com/bdring/FluidNC]
[MSG:INFO: Compiled with ESP32 SDK:v4.4.7-dirty]
[MSG:INFO: Local filesystem type is littlefs]
[MSG:ERR: Skipping configuration file due to panic]
[MSG:INFO: Using default configuration]
[MSG:INFO: Axes: using defaults]
[MSG:INFO: Machine Default (Test Drive)]
[MSG:INFO: Board None]
[MSG:INFO: Stepping:RMT Pulse:4us Dsbl Delay:0us Dir Delay:0us Idle Delay:255ms]
[MSG:INFO: Axis count 3]
[MSG:INFO: Axis X (-1000.000,0.000)]
[MSG:INFO: Motor0]
[MSG:INFO: Axis Y (-1000.000,0.000)]
[MSG:INFO: Motor0]
[MSG:INFO: Axis Z (-1000.000,0.000)]
[MSG:INFO: Motor0]
[MSG:INFO: Kinematic system: Cartesian]
[MSG:INFO: Connecting to STA SSID:HomeWIFI]
[MSG:INFO: Connecting.]
[MSG:INFO: Connecting..]
[MSG:INFO: Connected - IP is 192.168.1.113]
[MSG:INFO: WiFi on]
[MSG:INFO: Start mDNS with hostname:http://fluidnc.local/]
[MSG:INFO: SSDP Started]
[MSG:INFO: HTTP started on port 80]
[MSG:INFO: Telnet started on port 23]

Grbl 3.8 [FluidNC v3.8.0 (wifi) '$' for help]
[MSG:ERR: Configuration is invalid. Check boot messages for ERR's.]

MitchBradley commented 2 months ago

See also #1322. In both cases, the crash occurs inside an interrupt handler. Here it happens in Machine::Motor::step() at the line _driver->step(); in #1322 it happens in Machine::Axes::unstep() at the line m->_driver->unstep();. Both are Cache disabled errors in which register A8, which holds the target address of a call instruction, contains 0xbad00bad. We expect the cache to be disabled during interrupt handlers, but we do not expect an attempt to access FLASH at that point. The question is: how did the value 0xbad00bad get into register A8? That address is supposed to be fetched from the _driver structure. Perhaps memory was overwritten somehow.
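For anyone who wants to check a dump like this themselves, here is a minimal Python sketch that parses a few registers from the crash report and says which memory region each address falls in. The helper names (parse_registers, region_of) are made up for this example and are not part of FluidNC or ESP-IDF. The region boundaries are approximate values taken from the ESP32 address map, so treat them as assumptions and verify them against the technical reference manual.

```python
import re

# Subset of the register dump, copied from the crash report in this issue.
DUMP = """
PC : 0x4008abe2 PS : 0x00060035 A0 : 0x80081bad A1 : 0x3ffbf83c
A8 : 0xbad00bad A9 : 0x3ffb1f30 EXCCAUSE: 0x00000007 EXCVADDR: 0x00000000
"""

# Approximate ESP32 address ranges (assumption; end address is exclusive).
REGIONS = [
    ("flash data (cached)", 0x3F400000, 0x3F800000),
    ("DRAM", 0x3FFAE000, 0x40000000),
    ("IRAM", 0x40080000, 0x400A0000),
    ("flash code (cached)", 0x400C2000, 0x40C00000),
]


def parse_registers(text):
    """Return {register name: integer value} for every 'NAME : 0x........' pair."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]*)\s*:\s*(0x[0-9a-fA-F]{8})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def region_of(addr):
    """Name the first region containing addr, or 'unmapped' if none does."""
    for name, start, end in REGIONS:
        if start <= addr < end:
            return name
    return "unmapped"


regs = parse_registers(DUMP)
for name in ("PC", "A1", "A8"):
    print(f"{name} = {regs[name]:#010x} -> {region_of(regs[name])}")
```

Under these assumed ranges, PC lands in IRAM and A1 (the stack pointer) in DRAM, which is what you would expect for code running in an interrupt handler. A8 lands in none of the listed regions, which fits the observation that 0xbad00bad is not a plausible function address.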