Frix-x / klippain-shaketune

Klipper streamlined input shaper workflow and calibration tools
GNU General Public License v3.0
642 stars 74 forks source link

MCU 'mcu' shutdown: Timer too close still present #124

Open RNGIllSkillz opened 3 months ago

RNGIllSkillz commented 3 months ago

K-Shake&Tune module branch

Version

v4.0.1-0-g69ad2283

Describe the bug and expected behavior

i belive problem appeared after first axis check

`` BatchBulkHelper batch callback error Traceback (most recent call last): File "/home/klipper/klipper/klippy/mcu.py", line 71, in _do_send return xh.get_response(cmds, self._cmd_queue, minclock, reqclock) File "/home/klipper/klipper/klippy/serialhdl.py", line 327, in get_response raise error("Unable to obtain '%s' response" % (self.name,)) serialhdl.error: Unable to obtain 'sensor_bulk_status' response

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/klipper/klipper/klippy/extras/bulk_sensor.py", line 72, in _proc_batch msg = self.batch_cb(eventtime) File "/home/klipper/klipper/klippy/extras/adxl345.py", line 294, in _process_batch samples = self.ffreader.pull_samples() File "/home/klipper/klipper/klippy/extras/bulk_sensor.py", line 270, in pull_samples self._update_clock() File "/home/klipper/klipper/klippy/extras/bulk_sensor.py", line 242, in _update_clock params = self.query_status_cmd.send([self.oid]) File "/home/klipper/klipper/klippy/mcu.py", line 75, in send return self._do_send([self._cmd.encode(data)], minclock, reqclock) File "/home/klipper/klipper/klippy/mcu.py", line 73, in _do_send raise self._error(str(e)) gcode.CommandError: Unable to obtain 'sensor_bulk_status' response BatchBulkHelper stop callback error Traceback (most recent call last): File "/home/klipper/klipper/klippy/mcu.py", line 71, in _do_send return xh.get_response(cmds, self._cmd_queue, minclock, reqclock) File "/home/klipper/klipper/klippy/serialhdl.py", line 327, in get_response raise error("Unable to obtain '%s' response" % (self.name,)) serialhdl.error: Unable to obtain 'sensor_bulk_status' response

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/klipper/klipper/klippy/extras/bulk_sensor.py", line 72, in _proc_batch msg = self.batch_cb(eventtime) File "/home/klipper/klipper/klippy/extras/adxl345.py", line 294, in _process_batch samples = self.ffreader.pull_samples() File "/home/klipper/klipper/klippy/extras/bulk_sensor.py", line 270, in pull_samples self._update_clock() File "/home/klipper/klipper/klippy/extras/bulk_sensor.py", line 242, in _update_clock params = self.query_status_cmd.send([self.oid]) File "/home/klipper/klipper/klippy/mcu.py", line 75, in send return self._do_send([self._cmd.encode(data)], minclock, reqclock) File "/home/klipper/klipper/klippy/mcu.py", line 73, in _do_send raise self._error(str(e)) gcode.CommandError: Unable to obtain 'sensor_bulk_status' response

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/klipper/klipper/klippy/mcu.py", line 71, in _do_send return xh.get_response(cmds, self._cmd_queue, minclock, reqclock) File "/home/klipper/klipper/klippy/serialhdl.py", line 327, in get_response raise error("Unable to obtain '%s' response" % (self.name,)) serialhdl.error: Unable to obtain 'spi_transfer_response' response

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/klipper/klipper/klippy/extras/bulk_sensor.py", line 62, in _stop self.stop_cb() File "/home/klipper/klipper/klippy/extras/adxl345.py", line 289, in _finish_measurements self.set_reg(REG_POWER_CTL, 0x00) File "/home/klipper/klipper/klippy/extras/adxl345.py", line 232, in set_reg stored_val = self.read_reg(reg) File "/home/klipper/klipper/klippy/extras/adxl345.py", line 227, in read_reg params = self.spi.spi_transfer([reg | REG_MOD_READ, 0x00]) File "/home/klipper/klipper/klippy/extras/bus.py", line 98, in spi_transfer return self.spi_transfer_cmd.send([self.oid, data], File "/home/klipper/klipper/klippy/mcu.py", line 75, in send return self._do_send([self._cmd.encode(data)], minclock, reqclock) File "/home/klipper/klipper/klippy/mcu.py", line 73, in _do_send raise self._error(str(e)) ``

after that i decided to run test again, but new error appeared. idk if thats related

`Internal error on command:"_COMPARE_BELTS_RESPONSES" Traceback (most recent call last): File "/home/klipper/klipper/klippy/gcode.py", line 211, in _process_commands handler(gcmd) File "/home/klipper/klipper/klippy/gcode.py", line 137, in func = lambda params: origfunc(self._get_extended_params(params)) File "/home/klipper/klipper/klippy/extras/shaketune/shaketune.py", line 148, in cmd_COMPARE_BELTS_RESPONSES compare_belts_responses(gcmd, self._pconfig, st_process) File "/home/klipper/klipper/klippy/extras/shaketune/commands/compare_belts_responses.py", line 105, in compare_belts_responses vibrate_axis(toolhead, gcode, config['direction'], min_freq, max_freq, hz_per_sec, accel_per_hz) File "/home/klipper/klipper/klippy/extras/shaketune/helpers/resonance_test.py", line 48, in vibrate_axis toolhead.move([X, Y, Z, E], max_v) File "/home/klipper/klipper/klippy/toolhead.py", line 473, in move self.lookahead.add_move(move) File "/home/klipper/klipper/klippy/toolhead.py", line 187, in add_move self.flush(lazy=True) File "/home/klipper/klipper/klippy/toolhead.py", line 176, in flush self.toolhead._process_moves(queue[:flush_count]) File "/home/klipper/klipper/klippy/toolhead.py", line 362, in _process_moves self._advance_move_time(next_move_time) File "/home/klipper/klipper/klippy/toolhead.py", line 320, in _advance_move_time self._advance_flush_time(flush_time) File "/home/klipper/klipper/klippy/toolhead.py", line 300, in _advance_flush_time sg(sg_flush_time) File "/home/klipper/klipper/klippy/stepper.py", line 228, in generate_steps raise error("Internal error in stepcompress") stepper.error: Internal error in stepcompress Internal error on command:"_COMPARE_BELTS_RESPONSES" Internal error on command:"COMPARE_BELTS_RESPONSES" Traceback (most recent call last): File "/home/klipper/klipper/klippy/gcode.py", line 211, in _process_commands handler(gcmd) File "/home/klipper/klipper/klippy/gcode.py", line 137, in func = lambda params: origfunc(self._get_extended_params(params)) File "/home/klipper/klipper/klippy/extras/gcode_macro.py", line 189, in cmd self.template.run_gcode_from_command(kwparams) File "/home/klipper/klipper/klippy/extras/gcode_macro.py", line 68, in run_gcode_from_command self.gcode.run_script_from_command(self.render(context)) File "/home/klipper/klipper/klippy/gcode.py", line 226, in run_script_from_command self._process_commands(script.split('\n'), need_ack=False) File "/home/klipper/klipper/klippy/gcode.py", line 211, in _process_commands handler(gcmd) File "/home/klipper/klipper/klippy/gcode.py", line 137, in func = lambda params: origfunc(self._get_extended_params(params)) File "/home/klipper/klipper/klippy/extras/shaketune/shaketune.py", line 148, in cmd_COMPARE_BELTS_RESPONSES compare_belts_responses(gcmd, self._pconfig, st_process) File "/home/klipper/klipper/klippy/extras/shaketune/commands/compare_belts_responses.py", line 105, in compare_belts_responses vibrate_axis(toolhead, gcode, config['direction'], min_freq, max_freq, hz_per_sec, accel_per_hz) File "/home/klipper/klipper/klippy/extras/shaketune/helpers/resonance_test.py", line 48, in vibrate_axis toolhead.move([X, Y, Z, E], max_v) File "/home/klipper/klipper/klippy/toolhead.py", line 473, in move self.lookahead.add_move(move) File "/home/klipper/klipper/klippy/toolhead.py", line 187, in add_move self.flush(lazy=True) File "/home/klipper/klipper/klippy/toolhead.py", line 176, in flush self.toolhead._process_moves(queue[:flush_count]) File "/home/klipper/klipper/klippy/toolhead.py", line 362, in _process_moves self._advance_move_time(next_move_time) File "/home/klipper/klipper/klippy/toolhead.py", line 320, in _advance_move_time self._advance_flush_time(flush_time) File "/home/klipper/klipper/klippy/toolhead.py", line 300, in _advance_flush_time sg(sg_flush_time) File "/home/klipper/klipper/klippy/stepper.py", line 228, in generate_steps raise error("Internal error in stepcompress") stepper.error: Internal error in stepcompress Internal error on command:"COMPARE_BELTS_RESPONSES" Internal Error on WebRequest: gcode/script Traceback (most recent call last): File "/home/klipper/klipper/klippy/webhooks.py", line 256, in _process_request func(web_request) File "/home/klipper/klipper/klippy/webhooks.py", line 436, in _handle_script self.gcode.run_script(web_request.get_str('script')) File "/home/klipper/klipper/klippy/gcode.py", line 229, in run_script self._process_commands(script.split('\n'), need_ack=False) File "/home/klipper/klipper/klippy/gcode.py", line 211, in _process_commands handler(gcmd) File "/home/klipper/klipper/klippy/gcode.py", line 137, in func = lambda params: origfunc(self._get_extended_params(params)) File "/home/klipper/klipper/klippy/extras/gcode_macro.py", line 189, in cmd self.template.run_gcode_from_command(kwparams) File "/home/klipper/klipper/klippy/extras/gcode_macro.py", line 68, in run_gcode_from_command self.gcode.run_script_from_command(self.render(context)) File "/home/klipper/klipper/klippy/gcode.py", line 226, in run_script_from_command self._process_commands(script.split('\n'), need_ack=False) File "/home/klipper/klipper/klippy/gcode.py", line 211, in _process_commands handler(gcmd) File "/home/klipper/klipper/klippy/gcode.py", line 137, in func = lambda params: origfunc(self._get_extended_params(params)) File "/home/klipper/klipper/klippy/extras/shaketune/shaketune.py", line 148, in cmd_COMPARE_BELTS_RESPONSES compare_belts_responses(gcmd, self._pconfig, st_process) File "/home/klipper/klipper/klippy/extras/shaketune/commands/compare_belts_responses.py", line 105, in compare_belts_responses vibrate_axis(toolhead, gcode, config['direction'], min_freq, max_freq, hz_per_sec, accel_per_hz) File "/home/klipper/klipper/klippy/extras/shaketune/helpers/resonance_test.py", line 48, in vibrate_axis toolhead.move([X, Y, Z, E], max_v) File "/home/klipper/klipper/klippy/toolhead.py", line 473, in move self.lookahead.add_move(move) File "/home/klipper/klipper/klippy/toolhead.py", line 187, in add_move self.flush(lazy=True) File "/home/klipper/klipper/klippy/toolhead.py", line 176, in flush self.toolhead._process_moves(queue[:flush_count]) File "/home/klipper/klipper/klippy/toolhead.py", line 362, in _process_moves self._advance_move_time(next_move_time) File "/home/klipper/klipper/klippy/toolhead.py", line 320, in _advance_move_time self._advance_flush_time(flush_time) File "/home/klipper/klipper/klippy/toolhead.py", line 300, in _advance_flush_time sg(sg_flush_time) File "/home/klipper/klipper/klippy/stepper.py", line 228, in generate_steps raise error("Internal error in stepcompress") stepper.error: Internal error in stepcompress Stats 96145.0: gcodein=0 mcu: mcu_awake=0.117 mcu_task_avg=0.000005 mcu_task_stddev=0.000003 bytes_write=155027 bytes_read=57210 bytes_retransmit=0 bytes_invalid=0 send_seq=4091 receive_seq=4091 retransmit_seq=0 srtt=0.002 rttvar=0.000 rto=0.025 ready_bytes=17 upcoming_bytes=0 freq=168003344 TH0: mcu_awake=0.773 mcu_task_avg=0.000028 mcu_task_stddev=0.000030 bytes_write=10084 bytes_read=549769 bytes_retransmit=0 bytes_invalid=52 send_seq=984 receive_seq=984 retransmit_seq=0 srtt=0.001 rttvar=0.001 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=12000117 adj=11999778 TH1: mcu_awake=0.002 mcu_task_avg=0.000012 mcu_task_stddev=0.000019 bytes_write=1863 bytes_read=12075 bytes_retransmit=0 bytes_invalid=0 send_seq=224 receive_seq=224 retransmit_seq=0 srtt=0.002 rttvar=0.001 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=12000165 adj=12000114 TH2: mcu_awake=0.002 mcu_task_avg=0.000011 mcu_task_stddev=0.000010 bytes_write=1887 bytes_read=12059 bytes_retransmit=0 bytes_invalid=0 send_seq=225 receive_seq=225 retransmit_seq=0 srtt=0.002 rttvar=0.001 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=64000435 adj=63998482 Arduino_MEGA: mcu_awake=0.068 mcu_task_avg=0.000216 mcu_task_stddev=0.000086 bytes_write=1815 bytes_read=50172 bytes_retransmit=9 bytes_invalid=6 send_seq=189 receive_seq=189 retransmit_seq=2 srtt=0.004 rttvar=0.001 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=15882018 adj=15881628 TH0_temp: temp=36.3 TH1_temp: temp=37.1 TH2_temp: temp=37.8 OrangePi_temp: temp=53.5 Driver_Z0: temp=42.4 Driver_Z1: temp=57.8 Driver_Z2: temp=63.4 Driver_Z3: temp=57.4 Driver_A: temp=56.5 Driver_B: temp=54.4 Motor_Z0: temp=42.9 Motor_Z1: temp=49.6 Motor_Z2: temp=38.6 Motor_Z3: temp=46.4 chamber: temp=31.2 heater_bed: target=60 temp=60.1 pwm=0.206 sysload=0.86 cputime=249.610 memavail=382288 print_time=73.119 buffer_time=2.310 print_stall=0 extruder: target=150 temp=149.5 pwm=0.316 extruder1: target=0 temp=30.2 pwm=0.000 extruder2: target=0 temp=28.9 pwm=0.000 BatchBulkHelper batch callback error Traceback (most recent call last): File "/home/klipper/klipper/klippy/mcu.py", line 71, in _do_send return xh.get_response(cmds, self._cmd_queue, minclock, reqclock) File "/home/klipper/klipper/klippy/serialhdl.py", line 327, in get_response raise error("Unable to obtain '%s' response" % (self.name,)) serialhdl.error: Unable to obtain 'sensor_bulk_status' response

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/klipper/klipper/klippy/extras/bulk_sensor.py", line 72, in _proc_batch msg = self.batch_cb(eventtime) File "/home/klipper/klipper/klippy/extras/adxl345.py", line 294, in _process_batch samples = self.ffreader.pull_samples() File "/home/klipper/klipper/klippy/extras/bulk_sensor.py", line 270, in pull_samples self._update_clock() File "/home/klipper/klipper/klippy/extras/bulk_sensor.py", line 242, in _update_clock params = self.query_status_cmd.send([self.oid]) File "/home/klipper/klipper/klippy/mcu.py", line 75, in send return self._do_send([self._cmd.encode(data)], minclock, reqclock) File "/home/klipper/klipper/klippy/mcu.py", line 73, in _do_send raise self._error(str(e)) gcode.CommandError: Unable to obtain 'sensor_bulk_status' response BatchBulkHelper stop callback error Traceback (most recent call last): File "/home/klipper/klipper/klippy/mcu.py", line 71, in _do_send return xh.get_response(cmds, self._cmd_queue, minclock, reqclock) File "/home/klipper/klipper/klippy/serialhdl.py", line 327, in get_response raise error("Unable to obtain '%s' response" % (self.name,)) serialhdl.error: Unable to obtain 'sensor_bulk_status' response

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/klipper/klipper/klippy/extras/bulk_sensor.py", line 72, in _proc_batch msg = self.batch_cb(eventtime) File "/home/klipper/klipper/klippy/extras/adxl345.py", line 294, in _process_batch samples = self.ffreader.pull_samples() File "/home/klipper/klipper/klippy/extras/bulk_sensor.py", line 270, in pull_samples self._update_clock() File "/home/klipper/klipper/klippy/extras/bulk_sensor.py", line 242, in _update_clock params = self.query_status_cmd.send([self.oid]) File "/home/klipper/klipper/klippy/mcu.py", line 75, in send return self._do_send([self._cmd.encode(data)], minclock, reqclock) File "/home/klipper/klipper/klippy/mcu.py", line 73, in _do_send raise self._error(str(e)) gcode.CommandError: Unable to obtain 'sensor_bulk_status' response

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/klipper/klipper/klippy/mcu.py", line 71, in _do_send return xh.get_response(cmds, self._cmd_queue, minclock, reqclock) File "/home/klipper/klipper/klippy/serialhdl.py", line 327, in get_response raise error("Unable to obtain '%s' response" % (self.name,)) serialhdl.error: Unable to obtain 'spi_transfer_response' response

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/klipper/klipper/klippy/extras/bulk_sensor.py", line 62, in _stop self.stop_cb() File "/home/klipper/klipper/klippy/extras/adxl345.py", line 289, in _finish_measurements self.set_reg(REG_POWER_CTL, 0x00) File "/home/klipper/klipper/klippy/extras/adxl345.py", line 232, in set_reg stored_val = self.read_reg(reg) File "/home/klipper/klipper/klippy/extras/adxl345.py", line 227, in read_reg params = self.spi.spi_transfer([reg | REG_MOD_READ, 0x00]) File "/home/klipper/klipper/klippy/extras/bus.py", line 98, in spi_transfer return self.spi_transfer_cmd.send([self.oid, data], File "/home/klipper/klipper/klippy/mcu.py", line 75, in send return self._do_send([self._cmd.encode(data)], minclock, reqclock) File "/home/klipper/klipper/klippy/mcu.py", line 73, in _do_send raise self._error(str(e)) gcode.CommandError: Unable to obtain 'spi_transfer_response' response Stats 96146.0: gcodein=0 mcu: mcu_awake=0.117 mcu_task_avg=0.000005 mcu_task_stddev=0.000003 bytes_write=155192 bytes_read=57477 bytes_retransmit=0 bytes_invalid=0 send_seq=4098 receive_seq=4098 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=168003292 TH0: mcu_awake=0.773 mcu_task_avg=0.000028 mcu_task_stddev=0.000030 bytes_write=10180 bytes_read=549991 bytes_retransmit=0 bytes_invalid=52 send_seq=994 receive_seq=994 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=12000118 adj=11999845 TH1: mcu_awake=0.002 mcu_task_avg=0.000012 mcu_task_stddev=0.000019 bytes_write=1869 bytes_read=12207 bytes_retransmit=0 bytes_invalid=0 send_seq=225 receive_seq=225 retransmit_seq=0 srtt=0.001 rttvar=0.001 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=12000159 adj=12000072 TH2: mcu_awake=0.002 mcu_task_avg=0.000011 mcu_task_stddev=0.000010 bytes_write=1893 bytes_read=12191 bytes_retransmit=0 bytes_invalid=0 send_seq=226 receive_seq=226 retransmit_seq=0 srtt=0.002 rttvar=0.001 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=64000440 adj=63998868 Arduino_MEGA: mcu_awake=0.068 mcu_task_avg=0.000216 mcu_task_stddev=0.000086 bytes_write=1821 bytes_read=50860 bytes_retransmit=9 bytes_invalid=6 send_seq=190 receive_seq=190 retransmit_seq=2 srtt=0.004 rttvar=0.001 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=15882018 adj=15881703 TH0_temp: temp=35.9 TH1_temp: temp=37.3 TH2_temp: temp=37.7 OrangePi_temp: temp=54.5 Driver_Z0: temp=42.1 Driver_Z1: temp=57.4 Driver_Z2: temp=61.4 Driver_Z3: temp=55.9 Driver_A: temp=54.6 Driver_B: temp=53.3 Motor_Z0: temp=43.2 Motor_Z1: temp=49.7 Motor_Z2: temp=38.2 Motor_Z3: temp=46.6 chamber: temp=31.2 heater_bed: target=60 temp=60.0 pwm=0.000 sysload=0.86 cputime=249.774 memavail=375532 print_time=73.119 buffer_time=1.310 print_stall=0 extruder: target=150 temp=148.5 pwm=0.000 extruder1: target=0 temp=30.5 pwm=0.000 extruder2: target=0 temp=28.9 pwm=0.000 b'stepcompress o=13 i=0 c=6 a=0: Invalid sequence' Exception in flush_handler Traceback (most recent call last): File "/home/klipper/klipper/klippy/toolhead.py", line 435, in _flush_handler self._flush_lookahead() File "/home/klipper/klipper/klippy/toolhead.py", line 365, in _flush_lookahead self.lookahead.flush() File "/home/klipper/klipper/klippy/toolhead.py", line 176, in flush self.toolhead._process_moves(queue[:flush_count]) File "/home/klipper/klipper/klippy/toolhead.py", line 362, in _process_moves self._advance_move_time(next_move_time) File "/home/klipper/klipper/klippy/toolhead.py", line 320, in _advance_move_time self._advance_flush_time(flush_time) File "/home/klipper/klipper/klippy/toolhead.py", line 300, in _advance_flush_time sg(sg_flush_time) File "/home/klipper/klipper/klippy/stepper.py", line 228, in generate_steps raise error("Internal error in stepcompress")`

Additional information and klippy.log

No response

godbnc commented 3 months ago

working :)

kot0005 commented 3 months ago

I also have this issue! occurs after running a macro and when its starts to compile the data. Also having issues where it just errors out compiling the csv to png.

11:12:46 PM Error while generating the graphs: index 147 is out of bounds for axis 0 with size 125 None 11:12:45 PM [experimental] Mechanical health: Mechanical issue detected 11:12:45 PM Belts estimated similarity: -94.1% 11:12:43 PM This may take some time (1-3min) klippy (6).log

Laikulo commented 3 months ago

I ran into this during a few different operations. I reniced klippy to -7, and later put the klippy process into a cpuset sheild, and that seems to keep it safe enough

Frix-x commented 3 months ago

I ran into this during a few different operations. I reniced klippy to -7, and later put the klippy process into a cpuset sheild, and that seems to keep it safe enough

Mhmm not sure to understand how you did this? Did you directly modify the Klipper files? But thanks for the hint... I'll try to see how I can improve it further. I've already put a new commit in the development branch regarding this issue, which should mostly solve it (avoid joining the file write process and use the klipper reactor instead -> this is what solved the issue for most people on the shake&tune computation process, but I forgot to do it on the CSV file writes as well).

Laikulo commented 3 months ago

For testing, at runtime:

# Locate` klippy's main pid
% pgrep -fla klippy
1234 /some/path/python klippy.py ...
# Raise priority
% sudo renice -7 -g 1234
# Create cpuset shield, moving everything off of cores 2&3 (leaving cores 0&1 for workloads)
% sudo cset shield -c 2-3
# Move klippy to the 'shielded' CPUs
% sudo cset shield --threads --pid 1234

For a more permanent soloution: Edit klippy systemd service to set nice

# /etc/systemd/klippy.service.d/21-nice.conf
[Service]
Nice=-7

Create/edit systemd dropin to set default cpu affinity Requires a reboot to take effect

# /etc/systemd/system.conf.d/20-affinity.conf
[Manager]
CPUAffinity=0-1

Move klippy service other CPUs Requires restart of klippy to take effect

# /etc/systemd/klippy.service.d/20-affinity.conf
[Service]
CPUAffinity=2-3

Note: My klipper host service is named 'klippy' and not 'klipper' ( 'klipper' is way more common ) Moonraker will also complain about some modifications the unit, so you may need to adjust moonraker.conf to keep it from complaining.

Laikulo commented 3 months ago

Note that a process cannot decrease it's own "niceness", so setting negative values requires some kind of privileged thing. In this case, it is the service manager.

There are a few differences between the two setups, one uses cpusets, the other affinity, but in this configuration, they should have almost identical effects.

Laikulo commented 3 months ago

On a 4c cortex a-53, with top at 1.4G, I compiled klipper during an input shaper run, and saw klippy using 60%+ between the two cores, and the compiler ran the other two cores into the wall. This is with the cfs scheduler in default configs. Normally doing this would busy the entire machine enough to run the timers out.

Frix-x commented 3 months ago

Hello everyone, could you update to v4.0.2 that is now out and let me know if this error is now fixed?

RNGIllSkillz commented 3 months ago

klippy (34).log same error

lord-of-shred commented 2 months ago

Same error since updating to v4. Happened with 4.0.1, 4.0.2 and now v4.1.0-1-g66f5e32e. Host is a Raspberry Pi Zero 2W...didn't have any problems with this before, made a new cable yesterday, since in the past I had the MCU shutdown when the a previous cable had a loose connector.

During the testing, the CPU is around 35%, RAM between 45% and 70%

klippy_fresh.log

RNGIllSkillz commented 2 months ago

I ran into this during a few different operations. I reniced klippy to -7, and later put the klippy process into a cpuset sheild, and that seems to keep it safe enough

this trick didnt worked for me

lord-of-shred commented 2 months ago

Renicing didn't work on my setup either. The shutdown doesn't seem to have a clear pattern. Sometimes it occurs between 30-60 hz, sometimes when 120 is reached. Couldn't get one measurement to finish.

kot0005 commented 2 months ago

I Run my klipper on a Pi4b 8GB. I am also using a Pi cam3. Are you guys using any cameras ?

RNGIllSkillz commented 2 months ago

im running on orange pi zero 3 with logitech webcam

lord-of-shred commented 2 months ago

I have a noname Webcam added, as well as a Fystec Hotkey board, but I removed them both for testing and it didn't change. The only other MCU that is running is the interface for the ADXL345 which is directly connected to the Raspi.

zavax commented 2 months ago

Hello I have the same issue MCU 'mcu' shutdown: Timer too close. I have a Voron 2.4, with Octopus Pro; BTT 2209 (RP2040) CAN Board; BTT PI ; klipperscreen HDMI 5-inch Touchscreen and Beacon. All software and firmware are up to date. Klippain-ShakeTune version v4.1.0-1-g66f5e32e The macro [AXES_MAP_CALIBRATION] and [COMPARE_BELTS_RESPONSES] work well. The macro [AXES_SHAPER_CALIBRATION] works when there are no temperature setpoints. BUT don't work when I setup the bed and hotend to a temperature (exemple 60°) to test when fans are running. I reboot several times and test without the klipperscreen to save power, still same issue. The error appear a the end of the first test.

19:37 MCU 'mcu' shutdown: Timer too close 19:37 MCU 'mcu' shutdown: Timer too close This often indicates the host computer is overloaded. Check for other processes consuming excessive CPU time, high swap usage, disk errors, overheating, unstable voltage, or similar system problems on the host computer. Once the underlying issue is corrected, use the "FIRMWARE_RESTART" command to reset the firmware, reload the config, and restart the host software. Printer is shutdown 19:37 Cleaned up the output folder (only the last 3 results were kept)! 19:37 input shaper graphs created successfully! 19:37 FIRMWARE_RESTART 19:36 Peaks detected on the graph: 1 @ 35.8 Hz (1 above effect threshold) 19:36 -> Recommended shaper is EI @ 33.4 Hz (when using a square corner velocity of 5.0 and a damping ratio of 0.046) 19:36 To avoid too much smoothing with '3hump_ei', suggested max_accel <= 1600 mm/sec^2 19:36 Fitted shaper '3hump_ei' frequency = 49.0 Hz (vibrations = 0.1%, smoothing ~= 0.354) 19:36 To avoid too much smoothing with '2hump_ei', suggested max_accel <= 1600 mm/sec^2 19:36 Fitted shaper '2hump_ei' frequency = 40.0 Hz (vibrations = 0.1%, smoothing ~= 0.345) 19:36 To avoid too much smoothing with 'ei', suggested max_accel <= 2000 mm/sec^2 19:36 Fitted shaper 'ei' frequency = 33.4 Hz (vibrations = 0.1%, smoothing ~= 0.293) 19:36 To avoid too much smoothing with 'mzv', suggested max_accel <= 1700 mm/sec^2 19:36 Fitted shaper 'mzv' frequency = 24.0 Hz (vibrations = 3.6%, smoothing ~= 0.357) 19:36 To avoid too much smoothing with 'zv', suggested max_accel <= 4700 mm/sec^2 19:36 Fitted shaper 'zv' frequency = 34.8 Hz (vibrations = 12.0%, smoothing ~= 0.129) 19:36 This may take some time (1-3min) 19:36 Y axis frequency profile generation... 19:36 Testing frequency: 133 Hz 19:36 Testing frequency: 132 Hz 19:36 Testing frequency: 131 Hz 19:36 Testing frequency: 130 Hz 19:36 Testing frequency: 129 Hz

jonathjon commented 2 months ago

I am getting the same "timer to close". Running Manta M5P with CB1 on a Ender 5 Pro/Merc 1.1 adxl in EBB42 on tool head. It runs fine for belts and inputshaper. But errors out when trying to create vibrations profile klippy (27).log

RNGIllSkillz commented 2 months ago

Small update. Managed to run through COMPARE_BELTS_RESPONSES without mcu shutdown. I lowered my mycrosteps to reduce the load. lowered from 32 to 16.

jonathjon commented 2 months ago

Small update. Managed to run through COMPARE_BELTS_RESPONSES without mcu shutdown. I lowered my mycrosteps to reduce the load. lowered from 32 to 16.

All of mine are already at 16 and I got a MCU shutdown about 30 seconds into trying to run it, 3 times in a row. When this current print finishes I will attempt to run it again, If I get a different result I will post it here.

RNGIllSkillz commented 2 months ago

Small update. Managed to run through COMPARE_BELTS_RESPONSES without mcu shutdown. I lowered my mycrosteps to reduce the load. lowered from 32 to 16.

All of mine are already at 16 and I got a MCU shutdown about 30 seconds into trying to run it, 3 times in a row. When this current print finishes I will attempt to run it again, If I get a different result I will post it here.

try 8

RNGIllSkillz commented 2 months ago

ye even with that its still really unreliable. it works 1 time out of 5

jonathjon commented 2 months ago

Hello everyone, could you update to v4.0.2 that is now out and let me know if this error is now fixed?

I double checked today and I am running 4.1.0 and still having an issue. But only with create_vibrations_profile

fbrzozowski commented 2 months ago

Same here on fly-gemini v3, no camera but I have klipper screen and extender connected. Will try w/o later.

Edit: I've experimented with (in my case) \etc\systemd\system\klipper-mcu.service increasing TimeoutStopSec from 10 to 30 and it seems that everything's working.

BTW. I found this post on official klipper forum

A “Timer too close” error occurs when the MCU attempts to schedule a timer for a target time that is in the past. It almost always occurs as a result of a message from the host requesting an action at a time that is now in the past.

more

ToddLorey commented 2 months ago

I get timer too close sometimes too. FWIW, Manta M8P V2 with CB1. Temperatures stay below ~46C. I have also had issues where the vibrations test (400mm/s, 200mm size, 2500mm/s^2 accel) will usually run fine but I get a timeout error as it's crunching the data after the motion part of the test. I looked for the timeout value in the code but I can't seem to find it. I think it's likely the CB1. I'm thinking it will help... I have a CB2 ordered from BTT but who knows when they'll decide to send it from China. Looks like it's in stock now but they don't seem to be in any hurry to get it boxed up and shipped. :(

fbrzozowski commented 2 months ago

I switch from gemini v3 to skr pico + orangepi zero 2, update the 'niceness' following @Laikulo suggestions and it seems to be working now (done about 8 vibration profiles and no issue)

dragoonmc commented 2 months ago

I was getting these errors as well when trying to run this test hot. but found that turning off the heaters was enough to let it complete

Frix-x commented 1 month ago

Hello all,

I've done a bit of diagnostics on this issue and was able to reproduce it on one of my machines (now equipped with a CB1 that I just bought for the occasion). What I've found is that it happens when the SOC start to get very hot and get above the throttling temperature, which is by default at 60 degrees in the latest Raspbian images I found, but it's also set to 70 degrees sometime on some other images. I think that when the SOC starts to throttle, all cores (and their running process) are slowed down to avoid overheating, this doesn't affect S&T much, but causes problems for Klipper, which expects very precise timing.

So my current solutions that fixed it on my machine are:

On my side I'm still looking for a solid solution to reduce the computing power needed for S&T... Or a solution that would avoid throttling the Klipper core and only S&T or... whatever. If anyone here has an idea, I'm totally open!

Frix-x commented 1 month ago

Also in the meantime, can you monitor closely your Pi temperature when running S&T and when the timer too close errors happens in order to validate this hypothesis?

lord-of-shred commented 1 month ago

@Frix-x Hi and thanks for checking this out.

I currently have 2 machines with a Raspi Pi Zero 2 W running the latest version of Klipper and S&T. One is a Voron 0.2R1 with a CAN board on which the Pi is mounted. This one also has active cooling on the Mainboard and the Pi. Here I only get the shutdown sometimes during the calculation at the end after the measurements where taken. So this seems to be fine. Temperature here is never above 55°C. Second one is a Vyper that has been modded to a Switchwire where the Pi Zero 2 W is mounted externally with a USB/Ethernet HAT. Only small heat sinks are possible with this configuration. The case for the combination has as much air flow as possible. Operation temps after hours of printing are around 50 - 55°C. On this machine I can't get a single run completed (since the upgrade to the 4 version), it always shuts down during the frequency measurement. I replaced the cable for the ADXL345 as well for the tool head board. Even on a fresh start of the Raspi.

I'll check the temps on some S&T runs and will set the threshold in the Config.txt as suggested and will get back to you after I did this.

ToddLorey commented 1 month ago

I have had issues with TTC during an extended Vibrations run. Originally with a CB1 but updated to a CB2 on a Manta M8P V2. It's not temperatures for me. I had a heatsink and fan on the CB1 (never exceeded ~45C) and now I have a heatsink and fan on the CB2 (never exceed like 42C). Yet I still got TTC when trying a 600mm/s max 2mm/s step vibration run. It was almost always an SPI_transfer_response error that showed up in the logs. I consistently got these two errors:

Traceback (most recent call last): File "/home/biqu/klipper/klippy/mcu.py", line 71, in _do_send return xh.get_response(cmds, self._cmd_queue, minclock, reqclock) File "/home/biqu/klipper/klippy/serialhdl.py", line 327, in get_response raise error("Unable to obtain '%s' response" % (self.name,)) serialhdl.error: Unable to obtain 'spi_transfer_response' response

and

Traceback (most recent call last): File "/home/biqu/klipper/klippy/extras/bulk_sensor.py", line 62, in _stop self.stop_cb() File "/home/biqu/klipper/klippy/extras/adxl345.py", line 289, in _finish_measurements self.set_reg(REG_POWER_CTL, 0x00) File "/home/biqu/klipper/klippy/extras/adxl345.py", line 232, in set_reg stored_val = self.read_reg(reg) File "/home/biqu/klipper/klippy/extras/adxl345.py", line 227, in read_reg params = self.spi.spi_transfer([reg | REG_MOD_READ, 0x00]) File "/home/biqu/klipper/klippy/extras/bus.py", line 98, in spi_transfer return self.spi_transfer_cmd.send([self.oid, data], File "/home/biqu/klipper/klippy/mcu.py", line 75, in send return self._do_send([self._cmd.encode(data)], minclock, reqclock) File "/home/biqu/klipper/klippy/mcu.py", line 73, in _do_send raise self._error(str(e)) gcode.CommandError: Unable to obtain 'spi_transfer_response' response

I think I found a fix for it though... I looked at the .py files for S&T and tweaked two values which seems to have worked.

In the accelerometer.py, I modified line number 16 to FILE_WRITE_TIMEOUT = 20 # seconds (was 10).

In the create_vibrations_profile.py, I modified line number 82 (maybe not necessary) and 137 to toolhead.dwell(0.75) # was toolhead.dwell(0.3).

Not sure if it was one or both that did the trick. And the values might be too big, though I did still get a TTC with the timeout at 15 and dwell at 0.5. I'm still testing but it seems to work. I think the system needed a bit more time to finish writing/transferring files. I had considered trying to write my first Pull Request, but since you're discussing it here and now, I figured I'd jump in.

Also, on a somewhat related note, I found that I needed to increase the timeout value in [shaketune] timeout: 600

Otherwise it would error out during the graph generation phase.

In my troubleshooting journey, I also messed with the CPU affinity and priority/nice values. That didn't help much at all and I haven't backed out those changes yet. I kinda doubt it will make much of a difference one way or another.

Edit... I spoke too soon. Even with the above settings I just got an SPI error. I'm trying now with an additional toolhead.dwell(0.5) added after each move section. We'll see...

ToddLorey commented 1 month ago

I should note that the original settings in my "fixes" did work when I had both X and Y motors set at 16 microsteps and Interpolate=False, with no Webcam. Since I'd like to run with 32 microsteps, Interpolate=True and also with my Webcam attached, I tried it and have failed a couple of times. I think those settings just push things over the edge with timings.

Might there be a way to disable webcam(s) during Shake and Tune tests? We obviously don't need to look at it during a test like this.

ToddLorey commented 1 month ago

After a bit more testing, and also trying a U2C V2.1 USB<->CAN board instead of the Manta's CAN bridge... I still get TTC errors. My latest theory is that it has something to do with data writes. I found this error in the klippy.log file that seemed to trigger the string of errors that followed. When the os.write call fails, it throws all of the timing out of whack and induces a TTC shutdown. As to how to fix it... well, I'm not totally sure how to do it. I'm hoping there's a way to add something that checks if the resource is available before writing, or maybe tries again if it fails, rather than carrying on with more commands that inevitably lead to the TTC.

Traceback (most recent call last): File "/home/biqu/klipper/klippy/gcode.py", line 459, in _respond_raw os.write(self.fd, (msg+"\n").encode()) BlockingIOError: [Errno 11] Resource temporarily unavailable

I hope this helps.

RNGIllSkillz commented 1 month ago

Also in the meantime, can you monitor closely your Pi temperature when running S&T and when the timer too close errors happens in order to validate this hypothesis?

tested on orange pi zero 2. The temp didnt get hightr than 45C, still getting timer error

ToddLorey commented 1 month ago

@RNGIllSkillz Assuming you have just had a TTC error, try uploading the klippy.log file to this site: https://doctor-klipper.clmntw.fr/ Then look on the "Errors" tab and scroll to the end. Work backwards through the errors and I'm curious if a batch of errors has a bunch of "spi_transfer_response" errors that are preceded by the os.write error I showed above. Curious if we are having the same or different issues.

RNGIllSkillz commented 1 month ago

Not be able to. This log is long gone and im currently running printer from a different host. p.s. Not sure if there were ever io error in my logs

Frix-x commented 1 month ago

@ToddLorey Thanks for your input. You may have put your finger on something! So temperature throttling of the SOC may not be the only problem.

In fact, when going to v4, the file write is something that I rewrote from scratch in S&T (I was using the one included in Klipper in v3). This is due to the fact that the file writing process of Klipper is not blocking and sometimes the file wasn't fully written when I started the analysis and I had to do some weird stuff to check if the CSV file was free before trying to open it on my end. But since v4 is now well integrated into Klipper, I was able to rewrite this to make it in a blocking way and delete my hacky check. However, something I didn't consider is that writing everything in one go could potentially be a problem on slower SD cards, as it may also block other processes from accessing it during that time, and it could be quite problematic if Klipper is actually waiting for it...!

Ok, let me try two things:

  1. Implement a write process that does it in smaller chunks with a very small delay between each chunk (like 0.01s) to allow other processes to gain access to the filesystem in between.
  2. Move the wait_for_file_write in the vibration recording command between each run to avoid having a large file still being written in the background while measuring the next set of speed at the same time as the accelerometer measurements are also quite demanding in terms of resources.

I'm not home right now but I'll try to do it tonight :)

ToddLorey commented 1 month ago

@Frix-x That makes a lot of sense. From what I can tell, watching htop memory/processor usage and keeping an eye on Mainsail, there really is no good reason for it to crash. Temps are all in very reasonable ranges, processor usage per core is less than ~40% (though spikes probably aren't captured). The crash just happens unexpectedly. Looking at the log, I notice a sequence of "ADXL345 starting 'adxl345' measurements", then a few lines, followed by a "ADXL345 finished 'adxl345' measurements". Then, when the error occurs, we never get the "ADXL345 finished 'adxl345' measurements":

... Stats 1556.0: gcodein=0 mcu: mcu_awake=0.003 mcu_task_avg=0.000006 mcu_task_stddev=0.000006 ... Stats 1557.0: gcodein=0 mcu: mcu_awake=0.004 mcu_task_avg=0.000005 mcu_task_stddev=0.000006 ... ADXL345 finished 'adxl345' measurements Stats 1558.0: gcodein=0 mcu: mcu_awake=0.004 mcu_task_avg=0.000005 mcu_task_stddev=0.000006 ... Current speed: 374.0 mm/s ADXL345 starting 'adxl345' measurements Stats 1559.0: gcodein=0 mcu: mcu_awake=0.004 mcu_task_avg=0.000005 mcu_task_stddev=0.000006 ... Stats 1560.0: gcodein=0 mcu: mcu_awake=0.004 mcu_task_avg=0.000005 mcu_task_stddev=0.000006 ... Stats 1561.0: gcodein=0 mcu: mcu_awake=0.004 mcu_task_avg=0.000005 mcu_task_stddev=0.000006 ... Stats 1562.0: gcodein=0 mcu: mcu_awake=0.004 mcu_task_avg=0.000005 mcu_task_stddev=0.000006 ... Stats 1563.0: gcodein=0 mcu: mcu_awake=0.004 mcu_task_avg=0.000005 mcu_task_stddev=0.000006 ... ADXL345 finished 'adxl345' measurements Stats 1564.0: gcodein=0 mcu: mcu_awake=0.004 mcu_task_avg=0.000005 mcu_task_stddev=0.000006 ... Current speed: 376.0 mm/s ADXL345 starting 'adxl345' measurements Stats 1565.0: gcodein=0 mcu: mcu_awake=0.004 mcu_task_avg=0.000005 mcu_task_stddev=0.000006 ... Stats 1566.0: gcodein=0 mcu: mcu_awake=0.003 mcu_task_avg=0.000005 mcu_task_stddev=0.000006 ... Stats 1567.0: gcodein=0 mcu: mcu_awake=0.003 mcu_task_avg=0.000005 mcu_task_stddev=0.000006 ... Stats 1568.0: gcodein=0 mcu: mcu_awake=0.003 mcu_task_avg=0.000005 mcu_task_stddev=0.000006 ... Stats 1569.0: gcodein=0 mcu: mcu_awake=0.003 mcu_task_avg=0.000005 mcu_task_stddev=0.000006 ... Stats 1570.2: gcodein=0 mcu: mcu_awake=0.003 mcu_task_avg=0.000005 mcu_task_stddev=0.000006 bytes_write=1084360 bytes_read=921140 bytes_retransmit=9 bytes_invalid=0 send_seq=58996 receive_seq=58993 retransmit_seq=15 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=399994563 EBBCan: mcu_awake=0.003 mcu_task_avg=0.000016 mcu_task_stddev=0.000015 bytes_write=14127 bytes_read=196830 bytes_retransmit=0 bytes_invalid=0 send_seq=1983 receive_seq=1983 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=63999523 adj=64000502 scanner: mcu_awake=0.850 mcu_task_avg=0.000045 mcu_task_stddev=0.000036 bytes_write=149004 bytes_read=24355123 bytes_retransmit=0 bytes_invalid=0 send_seq=18966 receive_seq=18966 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=47998274 adj=47998923 scanner: coil_temp=26.2 refs=0 EBB_NTC: temp=30.8 MCU: temp=39.7 SoC: temp=37.8 chamber: temp=26.7 heater_bed: target=0 temp=26.5 pwm=0.000 CB2: temp=37.8 sysload=0.42 cputime=417.820 memavail=1439524 print_time=3085.960 buffer_time=0.000 print_stall=0 extruder: target=0 temp=26.4 pwm=0.000 MCU 'mcu' shutdown: Timer too close clocksync state: mcu_freq=400000000 last_clock=1234103917493 clock_est=(1540.859 1222675609966 399994563.752) min_half_rtt=0.000049 min_rtt_time=640.925 time_avg=1540.859(845.701) clock_avg=1222675609966.718(338275877121.374) pred_variance=78466552.468 Dumping serial stats: bytes_write=1084360 bytes_read=921225 bytes_retransmit=9 bytes_invalid=0 send_seq=58996 receive_seq=58996 retransmit_seq=15 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 Dumping send queue 100 messages ...

Frix-x commented 1 month ago

Yes, this make sense! Ok, I've tried a first step here: https://github.com/Frix-x/klippain-shaketune/tree/ttc-fix

Can you try this branch and let me know if it helps?

This new branch contain some adjustment to always wait for a file write to finish before going into the next accelerometer measurements. This was adjusted for all the type of graphs. Regarding the chunk writing, it's apparently already the default mode of writing for the Python open(filename, 'w') function. So I've not changed this at the moment.

PS: don't bother about the new graph in the IS graphs as it's still a WIP (I based the patch branch from develop) and I need to write some documentation about it

lord-of-shred commented 1 month ago

On the Vyperwire that is the machine that hasn't had a successful run since the 4.0.2 upgrade I have a Samsung PRO Plus 128 GB (180/130MB/s R/W) MircoSD card in the Raspberry Pi Zero 2 W. That should be more than enough from the speeds. If I get the time over the next few days, I'll try another SD card I have lying around but from the specs/quality it should be inferior to the Samsung (SanDisk Extreme microSDXC UHS-I 64 GB). After perusing the Klippy.log if found these two error statements.

Traceback (most recent call last): File "/home/vyper/klipper/klippy/mcu.py", line 71, in _do_send return xh.get_response(cmds, self._cmd_queue, minclock, reqclock) File "/home/vyper/klipper/klippy/serialhdl.py", line 327, in get_response raise error("Unable to obtain '%s' response" % (self.name,)) serialhdl.error: Unable to obtain 'spi_transfer_response' response ``

Traceback (most recent call last):
  File "/home/vyper/klipper/klippy/extras/bulk_sensor.py", line 62, in _stop
    self.stop_cb()
  File "/home/vyper/klipper/klippy/extras/adxl345.py", line 289, in _finish_measurements
    self.set_reg(REG_POWER_CTL, 0x00)
  File "/home/vyper/klipper/klippy/extras/adxl345.py", line 232, in set_reg
    stored_val = self.read_reg(reg)
  File "/home/vyper/klipper/klippy/extras/adxl345.py", line 227, in read_reg
    params = self.spi.spi_transfer([reg | REG_MOD_READ, 0x00])
  File "/home/vyper/klipper/klippy/extras/bus.py", line 98, in spi_transfer
    return self.spi_transfer_cmd.send([self.oid, data],
  File "/home/vyper/klipper/klippy/mcu.py", line 75, in send
    return self._do_send([self._cmd.encode(data)], minclock, reqclock)
  File "/home/vyper/klipper/klippy/mcu.py", line 73, in _do_send
    raise self._error(str(e))
gcode.CommandError: Unable to obtain 'spi_transfer_response' response

Maybe this would help.

ToddLorey commented 1 month ago

Yes, this make sense! Ok, I've tried a first step here: https://github.com/Frix-x/klippain-shaketune/tree/ttc-fix

Can you try this branch and let me know if it helps?

This new branch contain some adjustment to always wait for a file write to finish before going into the next accelerometer measurements. This was adjusted for all the type of graphs. Regarding the chunk writing, it's apparently already the default mode of writing for the Python open(filename, 'w') function. So I've not changed this at the moment.

PS: don't bother about the new graph in the IS graphs as it's still a WIP (I based the patch branch from develop) and I need to write some documentation about it

Giving this a shot now...

ToddLorey commented 1 month ago

Now I'm getting a different error. Still TTC...

Traceback (most recent call last): File "/home/biqu/klipper/klippy/mcu.py", line 71, in _do_send return xh.get_response(cmds, self._cmd_queue, minclock, reqclock) File "/home/biqu/klipper/klippy/serialhdl.py", line 327, in get_response raise error("Unable to obtain '%s' response" % (self.name,)) serialhdl.error: Unable to obtain 'sensor_bulk_status' response

and

Traceback (most recent call last): File "/home/biqu/klipper/klippy/extras/bulk_sensor.py", line 72, in _proc_batch msg = self.batch_cb(eventtime) File "/home/biqu/klipper/klippy/extras/adxl345.py", line 294, in _process_batch samples = self.ffreader.pull_samples() File "/home/biqu/klipper/klippy/extras/bulk_sensor.py", line 270, in pull_samples self._update_clock() File "/home/biqu/klipper/klippy/extras/bulk_sensor.py", line 242, in _update_clock params = self.query_status_cmd.send([self.oid]) File "/home/biqu/klipper/klippy/mcu.py", line 75, in send return self._do_send([self._cmd.encode(data)], minclock, reqclock) File "/home/biqu/klipper/klippy/mcu.py", line 73, in _do_send raise self._error(str(e)) gcode.CommandError: Unable to obtain 'sensor_bulk_status' response

followed by the same spi_transfer_response errors.

lord-of-shred commented 1 month ago

@Frix-x So, now I've had some time to play around with the original and the fixed version. I ran 10 consecutive measurements on the Vyperwire (which gave me the most problems) with the normal and the WIP fix branch. I haven't done any Klipper or other updates since then and couldn't get any more MCU shutdowns.

Things changed:

On the V0.2 I had 2 MCU shutdowns in the 20 runs, both with the original branch. Here the temperature (due to active cooling) in the same room neaver went above 47°C.

I know these are just some datapoints but maybe someone else can try this also and help gather more info to find a possible fix.

Thanks for your great work!

Frix-x commented 1 month ago

Hi, thanks for your feedback! On my end, I've narrowed it down to two things:

Actually, I'm thinking about improving the file handling in Shake&Tune. In the current version, I've replicated the behavior of Klipper, where after a measurement a CSV is saved with the raw accelerometer data inside, then a second part of the script waits for it to be fully written and opens the CSV again for analysis. But since Shake&Tune is a combined system, it doesn't make sense to do this, and I think I can improve it in three ways:

  1. Keep everything in RAM and pass it directly to the analysis without writing a file in between.
  2. And if a file is needed for the archive, write it asynchronously in the background when the printer is idle at the end of the command.
  3. And since everything is in S&T, I can also change the file content by saving binay/compressed data instead of raw CSV files as they can get quite big (today vibration measurement can save about 300-500MB of data in hundreds of CSV files) -> having only one compressed file with everything inside should be better.

But this will take a bit of time as it's a pretty big change to Shake&Tune... So I'll keep you informed when I'm done with it (I'll continue working in the ttc-fix branch for this)

Frix-x commented 1 month ago

Hello all,

I've finished most of the parts in the new ttc-fix branch to store data in RAM and avoid writing files. So now there is a .stdata file that stores all the data in a compressed binary format. This file will not be saved if you set keep_raw_data: False in the [shaketune] config section. Can you try it and let me know how the behavior evolves on your side with this new version?

On my side, I now get timer too close errors only during the vibration measurement (not during the calculation, but randomly during the recording). This seems to be related to peaks in the Klipper load.

In fact, I get these spikes around ~100-110% during measurements:

image

And suddenly I get a spike higher at ~800% and the TTC error at the same time. This graph was taken only a few seconds later, and note that the scale has changed, so the few spikes from the previous graph are much smaller here compared to the last one:

image

Finally, you can see that the RAM usage increases, but this is normal and due to the fact that the measurements are now stored directly in RAM.

lord-of-shred commented 1 month ago

@Frix-x Hi,

Thanks for all your work on this. Currently I need my both printers up and running for some parts that are urgent, but next weekend I should have some free time to spare and check out your fix.

ToddLorey commented 1 month ago

I gave it a try and successfully made it through belts and shaper measurements but the vibration measurement still has TTC around the same time for both times I've tried. It's ~152mm/s on the first axis. Here's the seemingly relevant section of the log:

`Current speed: 152.0 mm/s ADXL345 starting 'adxl345' measurements

Stats 2379.7: gcodein=0 mcu: mcu_awake=0.003 mcu_task_avg=0.000006 mcu_task_stddev=0.000008 bytes_write=296693 bytes_read=280323 bytes_retransmit=9 bytes_invalid=0 send_seq=17567 receive_seq=17567 retransmit_seq=2 srtt=0.001 rttvar=0.001 rto=0.025 ready_bytes=53 upcoming_bytes=0 freq=399994540 HBB: mcu_awake=0.000 mcu_task_avg=0.000004 mcu_task_stddev=0.000003 bytes_write=3242 bytes_read=11370 bytes_retransmit=9 bytes_invalid=0 send_seq=496 receive_seq=496 retransmit_seq=2 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=12000340 adj=12000511 EBBCan: mcu_awake=0.003 mcu_task_avg=0.000017 mcu_task_stddev=0.000015 bytes_write=10379 bytes_read=61428 bytes_retransmit=7 bytes_invalid=0 send_seq=1065 receive_seq=1065 retransmit_seq=138 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=63999474 adj=64000361 EBB_NTC: temp=35.0 chamber: temp=30.8 CB2: temp=40.0 heater_bed: target=0 temp=28.9 pwm=0.000 scanner: mcu_awake=0.899 mcu_task_avg=0.000044 mcu_task_stddev=0.000036 bytes_write=65231 bytes_read=5277574 bytes_retransmit=0 bytes_invalid=0 send_seq=6935 receive_seq=6935 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=47998236 adj=47999115 scanner: coil_temp=29.2 refs=0 sysload=1.15 cputime=353.074 memavail=1441436 print_time=388.944 buffer_time=2.527 print_stall=0 extruder: target=0 temp=29.7 pwm=0.000

Stats 2380.7: gcodein=0 mcu: mcu_awake=0.003 mcu_task_avg=0.000006 mcu_task_stddev=0.000007 bytes_write=297158 bytes_read=280882 bytes_retransmit=9 bytes_invalid=0 send_seq=17598 receive_seq=17598 retransmit_seq=2 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=399994534 HBB: mcu_awake=0.000 mcu_task_avg=0.000004 mcu_task_stddev=0.000003 bytes_write=3248 bytes_read=11386 bytes_retransmit=9 bytes_invalid=0 send_seq=497 receive_seq=497 retransmit_seq=2 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=12000340 adj=12000508 EBBCan: mcu_awake=0.003 mcu_task_avg=0.000017 mcu_task_stddev=0.000015 bytes_write=10385 bytes_read=61537 bytes_retransmit=7 bytes_invalid=0 send_seq=1066 receive_seq=1066 retransmit_seq=138 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=63999480 adj=64000353 EBB_NTC: temp=35.0 chamber: temp=30.8 CB2: temp=40.0 heater_bed: target=0 temp=28.9 pwm=0.000 scanner: mcu_awake=0.899 mcu_task_avg=0.000044 mcu_task_stddev=0.000036 bytes_write=65307 bytes_read=5296696 bytes_retransmit=0 bytes_invalid=0 send_seq=6946 receive_seq=6946 retransmit_seq=0 srtt=0.001 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0 freq=47998243 adj=47998980 scanner: coil_temp=29.2 refs=0 sysload=1.15 cputime=353.358 memavail=1441692 print_time=388.944 buffer_time=1.527 print_stall=0 extruder: target=0 temp=29.8 pwm=0.000

MCU 'mcu' shutdown: Timer too close

clocksync state: mcu_freq=400000000 last_clock=155699578203 clock_est=(2353.089 143917757494 399994524.438) min_half_rtt=0.000074 min_rtt_time=2380.717 time_avg=2353.089(856.574) clock_avg=143917757494.277(342624741657.600) pred_variance=59544343.276

Dumping serial stats: bytes_write=297760 bytes_read=281558 bytes_retransmit=131 bytes_invalid=0 send_seq=17633 receive_seq=17631 retransmit_seq=17633 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 upcoming_bytes=0

Dumping send queue 100 messages `

chackv268 commented 1 month ago

Hi, I am currently running a Manta M8P + CB1. I have been getting these errors and tried the new fix. I got it almost immediately when trying the vibration map. The temps are all ok and I have heatsinks on the CB1. Here is my klippy.log klippy.log

chackv268 commented 1 month ago

To follow up on this. I tried again and it worked. I ran vibration map, axes and the other macros with no issues

chackv268 commented 3 weeks ago

I installed cartographer after posting the previous comment. This made the situation much worse. I posted in their discord and was directed to this. I went through most of the steps, up till the CPU scheduler. This sorted out the issue. Might be worth checking out

pylorak commented 2 weeks ago

On a BTT Pi 1.2 (equal hardware as a CB1), CAN@1Mbit and EBB36, even on the main branch of Shaketune I can run COMPARE_BELTS_RESPONSES and AXES_SHAPER_CALIBRATION reliably with no problems at all.

However, CREATE_VIBRATIONS_PROFILE always fails with a ttc error during data collection (while vibrating). I tried the ttc-fix branch both with keep_raw_data True and also with False, but CREATE_VIBRATIONS_PROFILE still cannot complete no matter what. Most of the time it fails pretty early on (below 20mm/s).

I also tried various common tips, like pinning processes to specific cores or setting the CPU governor to "performance" (for this I had to upgrade the Armbian kernel to edge), but none of these helped. No sign of host overheating, CPU core temp stays around 50°C (has a large heatsink and airflow).

EDIT: If it's any help, I've attached my klippy.log. klippy.log