bdring / FluidNC

The next generation of motion control firmware

Mid-program crash - "Guru Meditation Error" #472

Closed daxliniere closed 2 years ago

daxliniere commented 2 years ago

Controller Board

DLC32

Help From Board Vendor

Machine Description

Mill

Configuration file

board: MKS-DLC32 V2.1
name: SWOLE-CNC
meta: (23.04.2022) Dax Liniere, originally by Skorpi

kinematics:
  Cartesian:

stepping:
  engine: I2S_STATIC
  idle_ms: 0
  pulse_us: 4
  dir_delay_us: 1
  disable_delay_us: 0

axes:
  shared_stepper_disable_pin: I2SO.0
  x:
    steps_per_mm: 802.341
    max_rate_mm_per_min: 1500.000
# was 18000.000
    acceleration_mm_per_sec2: 80.000
# was 1500.000
    max_travel_mm: 301.000
    soft_limits: false
    homing:
      cycle: 2
      positive_direction: false
      mpos_mm: 0.000
      feed_mm_per_min: 20.000
      seek_mm_per_min: 200.000
# was 100 in first SWOLE-CNC iteration running MKS firmware
      settle_ms: 500
      seek_scaler: 1.100
      feed_scaler: 1.100

    motor0:
      limit_neg_pin: NO_PIN
      limit_pos_pin: NO_PIN
      limit_all_pin: gpio.36:high
      hard_limits: true
      pulloff_mm: 0.500
      stepstick:
        step_pin: I2SO.1
        direction_pin: I2SO.2:low
        disable_pin: NO_PIN
        ms1_pin: NO_PIN
        ms2_pin: NO_PIN
        ms3_pin: NO_PIN
        reset_pin: NO_PIN

  y:
    steps_per_mm: 802.322
    max_rate_mm_per_min: 1500.000
# was 18000.000
    acceleration_mm_per_sec2: 80.000
# was 300.000
    max_travel_mm: 187.000
    soft_limits: false
    homing:
      cycle: 2
      positive_direction: false
      mpos_mm: 0.000
      feed_mm_per_min: 20.000
# was 300.000
      seek_mm_per_min: 200.000
# was 5000.000 (was 100 with MKS firmware)
      settle_ms: 500
      seek_scaler: 1.100
      feed_scaler: 1.100

    motor0:
      limit_neg_pin: NO_PIN
      limit_pos_pin: NO_PIN
      limit_all_pin: gpio.35:high
      hard_limits: true
      pulloff_mm: 0.500
      stepstick:
        step_pin: I2SO.5
        direction_pin: I2SO.6:low
        disable_pin: NO_PIN
        ms1_pin: NO_PIN
        ms2_pin: NO_PIN
        ms3_pin: NO_PIN
        reset_pin: NO_PIN
  z:
    steps_per_mm: 802.497
    max_rate_mm_per_min: 1000.000
# was 18000.000
    acceleration_mm_per_sec2: 50.000
# was 500.000
    max_travel_mm: 46.000
    soft_limits: false
    homing:
      cycle: 1
      positive_direction: true
      mpos_mm: 0.000
      feed_mm_per_min: 20.000
# was 300.000
      seek_mm_per_min: 100.000
# was 1000.000
      settle_ms: 500
      seek_scaler: 1.100
      feed_scaler: 1.100

    motor0:
      limit_neg_pin: gpio.34:high
      hard_limits: true
      pulloff_mm: 0.500
      stepstick:
        step_pin: I2SO.3
        direction_pin: I2SO.4:low
        disable_pin: NO_PIN
        ms1_pin: NO_PIN
        ms2_pin: NO_PIN
        ms3_pin: NO_PIN
        reset_pin: NO_PIN

i2so:
  bck_pin: gpio.16
  data_pin: gpio.21
  ws_pin: gpio.17

spi:
  miso_pin: gpio.12
  mosi_pin: gpio.13
  sck_pin: gpio.14

sdcard:
  cs_pin: gpio.15
  card_detect_pin: NO_PIN
# This could be GPIO.39, but Card Detect has no supported functions in FluidNC

control:
  safety_door_pin: gpio.33:low:pu
  cycle_start_pin: NO_PIN
  feed_hold_pin: NO_PIN
  reset_pin: NO_PIN
  macro0_pin: NO_PIN
  macro1_pin: NO_PIN
  macro2_pin: NO_PIN
  macro3_pin: NO_PIN

macros:
  startup_line0:
  startup_line1:
  macro0: $SD/Run=lasertest.gcode
  macro1:
  macro2:
  macro3:

coolant:
  flood_pin: gpio.0
# continuous air
  mist_pin:  gpio.4
# pulsed air 
  delay_ms: 0

probe:
  pin: gpio.22:low
  check_mode_start: true

10V:
# Spindle
  direction_pin: NO_PIN
  forward_pin: gpio.5:low
  reverse_pin: NO_PIN
  output_pin: gpio.32
  enable_pin: NO_PIN
  pwm_hz: 5000
  disable_with_s0: false
  s0_with_disable: true
  spinup_ms: 10000
  spindown_ms: 20000
  tool_num: 0
  speed_map: 0=0% 0=25% 5868=25% 23237=99% 24000=100%
#DLC32 Spindle TTL max output voltage is 4.89v, so it can never reach 24000rpm/400Hz

#pwm:
#  direction_pin: NO_PIN
#  output_pin: gpio.32
#  enable_pin: gpio.5:low
#  pwm_hz: 5000
#  disable_with_s0: false
#  s0_with_disable: true
#  spinup_ms: 10000
#  spindown_ms: 20000
#  tool_num: 0
#  speed_map: 0=0% 0=25% 5868=25% 23237=99% 24000=100%

user_outputs:
  analog0_pin: NO_PIN
  analog1_pin: NO_PIN
  analog2_pin: NO_PIN
  analog3_pin: NO_PIN
  analog0_hz: 5000
  analog1_hz: 5000
  analog2_hz: 5000
  analog3_hz: 5000
  digital0_pin: NO_PIN
  digital1_pin: NO_PIN
  digital2_pin: NO_PIN
  digital3_pin: NO_PIN

start:
  must_home: false
# Pins that could be used: 0(SDA), 4(SCL), 5, 18, 19, 22, 23, 25, 26, 27, 32 (TTL spindle control), 33, 39, I2SO.7
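
For reference, the step pulse rate implied by each axis in the config above is simply steps_per_mm multiplied by the maximum feed in mm per second. The short Python sketch below only does that arithmetic with the values copied from the config; the names `AXES` and `step_rate_hz` are made up for illustration, and the result says nothing by itself about what the stepping engine can sustain.

```python
# Step pulse frequency implied by the axis settings quoted in the config.
# steps/s = steps_per_mm * (max_rate_mm_per_min / 60)

AXES = {
    # axis: (steps_per_mm, max_rate_mm_per_min), copied from the config file
    "x": (802.341, 1500.0),
    "y": (802.322, 1500.0),
    "z": (802.497, 1000.0),
}


def step_rate_hz(steps_per_mm, max_rate_mm_per_min):
    """Return the step frequency needed to move at the maximum rate."""
    return steps_per_mm * max_rate_mm_per_min / 60.0


rates = {axis: step_rate_hz(*values) for axis, values in AXES.items()}

for axis, hz in rates.items():
    print(f"{axis}: {hz:.0f} steps/s")
```

With the current limits this works out to roughly 20 kHz on X and Y and about 13.4 kHz on Z.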

Startup Messages

[MSG:INFO: FluidNC v3.4.6]
[MSG:INFO: Compiled with ESP32 SDK:v4.4.1-1-gb8050b365e]
[MSG:INFO: Local filesystem type is SPIFFS]
[MSG:ERR: Skipping configuration file due to panic]
[MSG:INFO: Using default configuration]
[MSG:INFO: Axes: using defaults]
[MSG:INFO: Machine Default (Test Drive)]
[MSG:INFO: Board None]
[MSG:INFO: SPI not defined]
[MSG:INFO: Stepping:RMT Pulse:4us Dsbl Delay:0us Dir Delay:0us Idle Delay:255ms]
[MSG:INFO: Axis count 3]
[MSG:INFO: Axis X (-1000.000,0.000)]
[MSG:INFO: Axis Y (-1000.000,0.000)]
[MSG:INFO: Axis Z (-1000.000,0.000)]
[MSG:INFO: Kinematic system: Cartesian]
[MSG:INFO: Using spindle NoSpindle]
[MSG:INFO: Connecting to STA SSID:Eggplant]
[MSG:INFO: Connecting.]
[MSG:INFO: Connected - IP is 192.168.2.81]
[MSG:INFO: WiFi on]
[MSG:INFO: Start mDNS with hostname:http://swolecnc.local/]
[MSG:INFO: SSDP Started]
[MSG:INFO: HTTP started on port 80]
[MSG:INFO: Telnet started on port 23]

User Interface Software

UGS 2.0.11

What happened?

Program was running, got a "Guru Meditation Error"

>>> X52.193Y17.983
ok
X52.472Y17.596
ok
X52.833Y17.24
ok
X53.232Y16.975
ok
X53.745Y16.779
[Error] An error was detected while sending 'Z19.25': Guru Meditation Error: Core 1 panic'ed (Cache disabled but cached memory region accessed). . Streaming has been paused.
Core 1 register dump:
PC : 0x401a8010 PS : 0x00060035 A0 : 0x800823fa A1 : 0x3ffbf19c
A2 : 0x00000000 A3 : 0x00000100 A4 : 0x3ffb1fb0 A5 : 0x3ffb1f30
A6 : 0x3ffc0428 A7 : 0x3ffb2d78 A8 : 0x800815b4 A9 : 0x00000080
A10 : 0x3ffc3060 A11 : 0x00000000 A12 : 0x3ffb1f30 A13 : 0x3ffb1f10
A14 : 0x3ffc0448 A15 : 0x3ffb1f4c SAR : 0x00000020 EXCCAUSE: 0x00000007
EXCVADDR: 0x00000000 LBEG : 0x4008aecc LEND : 0x4008aed7 LCOUNT : 0x00000000
Backtrace:0x401a800d:0x3ffbf19c |<-CORRUPTED
ELF file SHA256: 0000000000000000
Rebooting...
ets Jun 8 2016 00:22:57
rst:0x3 (SW_RESET),boot:0x1b (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0030,len:1184
load:0x40078000,len:12812
load:0x40080400,len:3032
entry 0x400805e4
[MSG:INFO: FluidNC v3.4.6]
[MSG:INFO: Compiled with ESP32 SDK:v4.4.1-1-gb8050b365e]
[MSG:INFO: Local filesystem type is SPIFFS]
[MSG:ERR: Skipping configuration file due to panic]
[MSG:INFO: Using default configuration]
[MSG:INFO: Axes: using defaults]
[MSG:INFO: Machine Default (Test Drive)]
[MSG:INFO: Board None]
[MSG:INFO: SPI not defined]
[MSG:INFO: Stepping:RMT Pulse:4us Dsbl Delay:0us Dir Delay:0us Idle Delay:255ms]
[MSG:INFO: Axis count 3]
[MSG:INFO: Axis X (-1000.000,0.000)]
[MSG:INFO: Axis Y (-1000.000,0.000)]
[MSG:INFO: Axis Z (-1000.000,0.000)]
[MSG:INFO: Kinematic system: Cartesian]
[MSG:INFO: Using spindle NoSpindle]
[MSG:INFO: Connecting to STA SSID:Eggplant]
[MSG:INFO: Connecting.]
[MSG:INFO: Connected - IP is 192.168.2.81]
[MSG:INFO: WiFi on]
[MSG:INFO: Start mDNS with hostname:http://swolecnc.local/]
[MSG:INFO: SSDP Started]
[MSG:INFO: HTTP started on port 80]
[MSG:INFO: Telnet started on port 23]

GRBL was reset. Canceling file transfer.

Grbl 3.4 [FluidNC v3.4.6 (wifi) '$' for help]

$$
[MSG:INFO: '$H'|'$X' to unlock]
$10 = 1 (Status report options, mask)
ok
$G
[GC:G0 G54 G17 G21 G90 G94 M5 M9 T0 F0 S0]
ok

Other Information

No response

daxliniere commented 2 years ago

Thanks Mitch. Just got this error again mid-program. Will update to 3.4.7 as soon as it's available.

daxliniere commented 2 years ago

Not sure if it's relevant, but it seems that after this occurs my flood coolant gets stuck in the on state and neither sending M9 nor pressing coolant override buttons helps. The controller needs a full power cycle.

MitchBradley commented 2 years ago

After the crash, it reboots in "safe mode" as evidenced by this message:

[MSG:ERR: Skipping configuration file due to panic]

Since it skips the config file, it has no idea what pins are used for coolant.

Arguably, this is not ideal, but neither is the possibility of a reboot loop that could happen if it read the same bad config file over and over.
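
The behaviour described here can be modelled in a few lines. This is a minimal Python sketch of the idea, not FluidNC's actual code: `ResetReason`, `load_config`, and `flood_pin` are hypothetical names, and the config is reduced to a plain dictionary. It shows why the coolant output cannot be switched off after a panic reboot: with the config file skipped, the firmware no longer knows which pin drives it.

```python
from enum import Enum, auto

NO_PIN = "NO_PIN"


class ResetReason(Enum):
    POWER_ON = auto()
    PANIC = auto()


def load_config(reset_reason, stored_config):
    """Return (active_config, log). After a panic the stored file is skipped."""
    log = []
    if reset_reason is ResetReason.PANIC:
        # Safe mode: avoid a reboot loop caused by re-reading a bad config.
        log.append("[MSG:ERR: Skipping configuration file due to panic]")
        log.append("[MSG:INFO: Using default configuration]")
        return {}, log
    return stored_config, log


def flood_pin(config):
    """Look up the flood coolant pin, falling back to NO_PIN when undefined."""
    return config.get("coolant", {}).get("flood_pin", NO_PIN)


stored = {"coolant": {"flood_pin": "gpio.0"}}  # value taken from the config above

normal_cfg, _ = load_config(ResetReason.POWER_ON, stored)
safe_cfg, safe_log = load_config(ResetReason.PANIC, stored)

print(flood_pin(normal_cfg))  # gpio.0
print(flood_pin(safe_cfg))    # NO_PIN
```

In the safe-mode case an M9 has no pin to act on, so the output stays in whatever state it had when the crash happened until the board is power cycled.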

daxliniere commented 2 years ago

> Arguably, this is not ideal

I wouldn't argue with that, it makes perfect sense to handle it that way. Thanks for the information, Mitch.

By the way, I rolled back to the previous release for now. No gurus on mountain tops so far. ;)

MitchBradley commented 2 years ago

Is this problem resolved with the latest version?

daxliniere commented 2 years ago

I'll have time to update and check that either tomorrow or day after, then I'll report back.

daxliniere commented 2 years ago

> Is this problem resolved with the latest version?

Nope :( Still present in 3.4.9 WiFi.

[Error] An error was detected while sending 'Y0.894': Guru Meditation Error: Core  1 panic'ed (Cache disabled but cached memory region accessed). . Streaming has been paused.
>>> G17G3X40.923Y130.608I0.2J0
Core  1 register dump:
PC      : 0x40081837  PS      : 0x00060035  A0      : 0x800823e2  A1      : 0x3ffbf1fc  
A2      : 0x00000000  A3      : 0x3ffb5084  A4      : 0x3ffc30f4  A5      : 0x00000003  
A6      : 0x00000003  A7      : 0x00000004  A8      : 0xbad00bad  A9      : 0x3ffbf1dc  
A10     : 0x3ffb51a8  A11     : 0x00019200  A12     : 0xfffffff7  A13     : 0x3ffb1e80  
A14     : 0x00000020  A15     : 0x84000244  SAR     : 0x0000001f  EXCCAUSE: 0x00000007  
EXCVADDR: 0x00000000  LBEG    : 0x4008aef0  LEND    : 0x4008aefb  LCOUNT  : 0x00000000  
Backtrace:0x40081834:0x3ffbf1fc |<-CORRUPTED
ELF file SHA256: 0000000000000000
Rebooting...

MitchBradley commented 2 years ago

I think the problem is caused by a gnarly trick I had to do to make C++ virtual methods work from interrupt service routines. The ESP32 architecture does not like to access data from FLASH when running an ISR. C++ virtual methods allocate some hidden data structures in FLASH, which causes sporadic failures. To fix it, I had to modify the linker scripts to relocate that hidden data into DRAM. Unfortunately, the new toolchain uses a different directory structure for the linker scripts, so my cunning workaround is being ignored.
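
One way to read the register dumps in this thread against that explanation is to check which memory region each address falls in. The Python sketch below is only an aid for that: the region boundaries are approximate values assumed from Espressif's ESP32 memory-map documentation, and `REGIONS`, `classify`, and `needs_flash_cache` are illustrative names. Addresses in the flash-mapped regions can only be reached while the flash cache is enabled.

```python
# Approximate ESP32 address ranges (assumption: taken from Espressif's
# memory-map documentation; boundaries are rounded to the commonly quoted ones).
# Each entry: (name, start, end_exclusive, reached_through_flash_cache)
REGIONS = [
    ("DROM (flash data)", 0x3F400000, 0x3F800000, True),
    ("DRAM (internal data RAM)", 0x3FFAE000, 0x40000000, False),
    ("IRAM (internal instruction RAM)", 0x40080000, 0x400A0000, False),
    ("IROM (flash code)", 0x400D0000, 0x40400000, True),
]


def classify(addr):
    """Return the name of the region containing addr, or 'other'."""
    for name, start, end, _cached in REGIONS:
        if start <= addr < end:
            return name
    return "other"


def needs_flash_cache(addr):
    """True if addr lies in a region that is served through the flash cache."""
    return any(start <= addr < end and cached
               for _name, start, end, cached in REGIONS)


# PC values from the two register dumps posted in this thread.
for label, pc in [("v3.4.6 dump", 0x401A8010), ("v3.4.9 dump", 0x40081837)]:
    print(f"{label}: PC=0x{pc:08x} -> {classify(pc)}, "
          f"needs cache: {needs_flash_cache(pc)}")
```

Under these assumed ranges, the PC in the v3.4.6 dump lies in flash-mapped code, while the PC in the v3.4.9 dump lies in IRAM. The second case would be consistent with code that itself runs from RAM but reaches for data stored in flash, although the dumps alone do not prove which access triggered the panic.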

daxliniere commented 2 years ago

Ahh dang. Well, good try, Mitch. Good luck finding a workaround.

daxliniere commented 2 years ago

Hey, is there any chance of getting a new build without the ISR changes that happened in 3.4.5?

MitchBradley commented 2 years ago

Not from me.

MitchBradley commented 2 years ago

I think I figured out how to apply the vtable workaround in the new toolchain environment. See PR #472

daxliniere commented 2 years ago

Unfortunately not solved with 3.5.0 pre1

MitchBradley commented 2 years ago

Try v3.5.0-pre3

MitchBradley commented 2 years ago

Is the crash gone?

daxliniere commented 2 years ago

I think you might have nailed it with 3.5.0pre5, Mitch! I've run 4 or 5 programs (variations on the same program) with lots of complex paths (Fusion's 2D Adaptive Clearing) and not one crash so far. I will keep you posted, of course, but this feels pretty solid to me.

Well done! Seems like it was a really tricky one.