Klipper3d / klipper

Klipper is a 3d-printer firmware
GNU General Public License v3.0
8.98k stars 5.17k forks source link

New error_mcu module; report sensor on "ADC out of range" #6620

Closed KevinOConnor closed 4 days ago

KevinOConnor commented 1 week ago

This PR adds a new error_mcu host module to facilitate error formatting. It also attempts to report which ADC sensor is out of range on an "ADC out of range" mcu error.

Handling micro-controller errors is challenging because we really want the heaters to be disabled if an unexpected error occurs. That means we don't want a lot of code complexity to analyze the source of the error in the code that needs to immediately handle that error. This can lead to crytpic mcu error reports.

This PR creates a new error_mcu host module that can analyze the low-level error reports after the main shutdown processing has completed. Thus, the error_mcu code is no longer on the "critical shutdown path" and can take more time and resources to format a more user friendly error message.

With this PR, the low-level MCU_adc code can identify if recent adc values were out of range, and then communicate that to the new error_mcu module. This is then used in an "ADC out of range" error report. For example:

MCU 'mcu' shutdown: ADC out of range

Sensor 'extruder' reporting out of range

This generally occurs when a heater temperature exceeds
its configured min_temp or max_temp.
Once the underlying issue is corrected, use the
"FIRMWARE_RESTART" command to reset the firmware, reload the
config, and restart the host software.
Printer is shutdown

This ADC check should work in most cases, but it is possible that an out-of-range error may occur while the host is unable to identify which module was out of range.

-Kevin

EDIT: The new module was renamed to "error_mcu.py"

rogerlz commented 1 week ago

That’s great! It would be helpful to report the most recent reading result so the user doesn’t have to parse the log to see how far the reading was in relation to the min/max values.

meteyou commented 1 week ago

I've been waiting for this function in klipper for a long time! Thank you very much! it will make troubleshooting a lot easier.

I can only agree with @rogerlz . The last read value would also be helpful. this way, you can also determine whether it is a cable break (or not/incorrectly connected cable) or a short circuit (damaged sensor).

meteyou commented 1 week ago

@KevinOConnor is it possible that you are not using multiline strings for these messages?

background: we can only delete/ignore all \n in the GUI, but your formatted paragraphs make sense, but the line breaks within a paragraph sometimes interfere with the output. we cannot explicitly address these so we have to output them 1:1. this can lead to additional annoying line breaks, e.g. a single word is then in the next line.

KevinOConnor commented 1 week ago

Thanks. I reworked this PR a bit and the error will now report the last temperature along with the valid range (if it is able to detect which sensor is having an issue).

MCU 'mcu' shutdown: ADC out of range

Sensor 'extruder' temperature 60.512 not in range 0.000:60.000

This generally occurs when a heater temperature exceeds
its configured min_temp or max_temp.
Once the underlying issue is corrected, use the
"FIRMWARE_RESTART" command to reset the firmware, reload the
config, and restart the host software.
Printer is shutdown

-Kevin

KevinOConnor commented 1 week ago

@KevinOConnor is it possible that you are not using multiline strings for these messages?

I'm not sure what you are reporting or requesting.

The error messages should have embedded newlines in them (\n) and historically we tried to keep the messages to no more than 60 characters per line (as historically this length kinda fit in OctoPrint).

I'm open to a new scheme if you have a proposal.

Cheers, -Kevin

meteyou commented 1 week ago

@KevinOConnor here an example from the current/old error message. The first screenshot is with an optimal screen size: image

And here a screenshot with a "bad" screen size: image

The word "the" is here in the next line and the paragraph looks "strange".

if you don't use 60 characters per line for long sentences, it would be nice if the GUIs could make the line breaks themselves depending on the available space. you can still use line breaks to separate paragraphs to structure the error message, just not in the middle of a sentences.

KevinOConnor commented 1 week ago

Thanks. I've also seen that in Mainsail. But, I'm not sure what you are proposing. The code on this PR will produce:

MCU 'mcu' shutdown: ADC out of range

Sensor 'extruder' temperature 60.512 not in range 0.000:60.000

This generally occurs when a heater temperature exceeds
its configured min_temp or max_temp.
Once the underlying issue is corrected, use the
"FIRMWARE_RESTART" command to reset the firmware, reload the
config, and restart the host software.
Printer is shutdown

in the printer.webhooks.state_message status. (Where each newline in the text above would be sent as a 0x0a (`\n') character.)

Can you provide a snippet of the raw text that you would prefer to see?

-Kevin

meteyou commented 1 week ago

Here is an example with "cleaned" newlines:

MCU 'mcu' shutdown: ADC out of range

Sensor 'extruder' temperature 60.512 not in range 0.000:60.000

This generally occurs when a heater temperature exceeds its configured min_temp or max_temp.
Once the underlying issue is corrected, use the "FIRMWARE_RESTART" command to reset the firmware, reload the config, and restart the host software.
Printer is shutdown

and here is a possible Python code:

message_shutdown = "Once the underlying issue is corrected, use the 'FIRMWARE_RESTART' command " \
                   "to reset the firmware, reload the config, and restart the host software. \n" \
                   "Printer is shutdown"

message_protocol_error1 = "This is frequently caused by running an older version of the firmware " \
                          "on the MCU(s). Fix by recompiling and flashing the firmware."

message_protocol_error2 = "Once the underlying issue is corrected, use the 'RESTART' command to " \
                          "reload the config and restart the host software."

message_mcu_connect_error = "Once the underlying issue is corrected, use the 'FIRMWARE_RESTART' " \
                            "command to reset the firmware, reload the config, and restart the " \
                            "host software. Error configuring printer"

Common_MCU_errors = {
    ("Timer too close",): "This often indicates the host computer is overloaded. Check for other "
                          "processes consuming excessive CPU time, high swap usage, disk errors, "
                          "overheating, unstable voltage, or similar system problems on the host "
                          "computer.",
    ("Missed scheduling of next ",): "This is generally indicative of an intermittent communication "
                                     "failure between micro-controller and host.",
    ("ADC out of range",): "This generally occurs when a heater temperature exceeds its configured "
                           "min_temp or max_temp.",
    ("Rescheduled timer in the past",
     "Stepper too far in past"): "This generally occurs when the micro-controller has been "
                                 "requested to step at a rate higher than it is capable of "
                                 "obtaining.",
    ("Command request",): "This generally occurs in response to an M112 G-Code command or in "
                          "response to an internal error in the host software.",
}

(Python is not my area of expertise)

EDIT: It should also be possible with multiline strings and just escape the linebreaks:

message_shutdown = """
Once the underlying issue is corrected, use the \
"FIRMWARE_RESTART" command to reset the firmware, reload the \
config, and restart the host software.
Printer is shutdown
"""

message_protocol_error1 = """
This is frequently caused by running an older version of the \
firmware on the MCU(s). Fix by recompiling and flashing the \
firmware.
"""

message_protocol_error2 = """
Once the underlying issue is corrected, use the "RESTART" \
command to reload the config and restart the host software.
"""

message_mcu_connect_error = """
Once the underlying issue is corrected, use the \
"FIRMWARE_RESTART" command to reset the firmware, reload the \
config, and restart the host software.
Error configuring printer
"""

Common_MCU_errors = {
    ("Timer too close",): """
This often indicates the host computer is overloaded. Check \
for other processes consuming excessive CPU time, high swap \
usage, disk errors, overheating, unstable voltage, or \
similar system problems on the host computer.""",
    ("Missed scheduling of next ",): """
This is generally indicative of an intermittent \
communication failure between micro-controller and host.""",
    ("ADC out of range",): """
This generally occurs when a heater temperature exceeds \
its configured min_temp or max_temp.""",
    ("Rescheduled timer in the past", "Stepper too far in past"): """
This generally occurs when the micro-controller has been \
requested to step at a rate higher than it is capable of \
obtaining.""",
    ("Command request",): """
This generally occurs in response to an M112 G-Code command \
or in response to an internal error in the host software.""",
}

FYI: Both are not tested! I only edited it in the editor and copy&paste it.

KevinOConnor commented 1 week ago

Thanks. I think I understand. Basically, use newlines as an end of paragraph marker.

I'm fine with making that change. I think a change like that should be separate from this PR though. I am also not sure if the traditional g-code clients (those using the "pseudo-tty" like OctoPrint) can handle really long lines (eg, 300+ characters per line) - it may be necessary for Klipper to reformat the error message content when sending it on the pseudo-tty.

Cheers, -Kevin