LinuxCNC / linuxcnc

LinuxCNC controls CNC machines. It can drive milling machines, lathes, 3d printers, laser cutters, plasma cutters, robot arms, hexapods, and more.
http://linuxcnc.org/
GNU General Public License v2.0

2.9 Gmoccapy - wrong tool disable MDI mode #3129

Open zz912 opened 6 days ago

zz912 commented 6 days ago

If I select a wrong tool (one that is not in the tool table), I can no longer open the MDI window. MDI mode is activated, but the user cannot see it.

https://github.com/user-attachments/assets/f3256603-3337-47f3-93fa-c32a9908f774

I entered G49 to rule out bugs related to AUTOMATIC_G43.

For better bug diagnosis, I made this PR: https://github.com/LinuxCNC/linuxcnc/pull/3123

Sigma1912 commented 6 days ago

Confirmed. Here is the debug from my simulation machine running master.

First changing to a tool nr that is present in the tool table (Note the 'RUN' and 'IDLE' messages that point to GStat correctly sending messages about the interpreter mode changing first to run and then back to idle):

[Gmoccapy][DEBUG]  ntb_button_switch_page (gmoccapy:5256)
3 2
[Gmoccapy][DEBUG]  MDI Mode, tool_change = True (gmoccapy:2745)
[Gmoccapy][DEBUG]  ntb_button_switch_page (gmoccapy:5256)
[Gmoccapy][DEBUG]  RUN (gmoccapy:2619)
[Gmoccapy][DEBUG]  hal status motion mode changed (gmoccapy:2821)
[Gmoccapy][DEBUG]  IDLE (gmoccapy:2567)
[Gmoccapy][DEBUG]  hal signal tool changed (gmoccapy:2638)
[Gmoccapy][DEBUG]  Tool is now 1 (gmoccapy:3497)
[Gmoccapy][DEBUG]  G43 is active (gmoccapy:3499)

Then here when calling a non-existent tool nr (note the absence of 'RUN' and 'IDLE' ):

task: main loop took 0.022615 seconds
emc/task/emctask.cc 68: interp_error: Requested tool 45 not found in the tool table
Requested tool 45 not found in the tool table
task: main loop took 0.019085 seconds
[Gmoccapy][DEBUG]  MDI Mode, tool_change = True (gmoccapy:2745)
[Gmoccapy][DEBUG]  ntb_button_switch_page (gmoccapy:5256)
[Gmoccapy][DEBUG]  hal status motion mode changed (gmoccapy:2821)
[Gmoccapy][DEBUG]  _on_play_sound <__main__.gmoccapy object at 0x7fe3cc270d00> None error (gmoccapy:5501)

This seems to be very similar to https://github.com/LinuxCNC/linuxcnc/issues/3120.

Sigma1912 commented 6 days ago

possible fix:

Change this: M6_T?_is

To this (note that replacing 'self.command.mdi("M66 E0 L0")' with 'self.command.wait_complete()' does not seem to fix it): M6_T?_fix

Sigma1912 commented 6 days ago

The idea is that, with the G4 command and the queue buster that follows it, the interpreter_mode stays at 'run' long enough for the GStat module to sense the change and send a message to Gmoccapy before the interpreter ingests the 'T{0} M6' that causes the abort.
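Since the fix above was posted as a screenshot, here is a rough sketch of the command order described in this comment. It is not the actual Gmoccapy code: the function names and the 0.1 s dwell are hypothetical, and the only call assumed on the command object is `mdi()`, as provided by `linuxcnc.command()`.

```python
def tool_change_mdi_sequence(tool_number, dwell_seconds=0.1):
    """Build the MDI lines in the order described above.

    dwell_seconds is an assumed value; it only needs to keep the
    interpreter in 'run' longer than the GStat polling interval.
    """
    return [
        f"G4 P{dwell_seconds}",   # short dwell so 'run' is observable
        "M66 E0 L0",              # queue buster
        f"T{tool_number} M6",     # tool change; aborts if the tool is unknown
    ]


def send_tool_change(command, tool_number, dwell_seconds=0.1):
    """Send the sequence through any object with an mdi(str) method.

    On a real machine this would be linuxcnc.command(), with the
    machine already switched to MDI mode.
    """
    for line in tool_change_mdi_sequence(tool_number, dwell_seconds):
        command.mdi(line)


class RecordingCommand:
    """Stand-in for linuxcnc.command() that only records MDI lines."""

    def __init__(self):
        self.sent = []

    def mdi(self, line):
        self.sent.append(line)


recorder = RecordingCommand()
send_tool_change(recorder, 45)
print(recorder.sent)
```

The point of the ordering is that the dwell runs before the failing 'T45 M6' is read, so the abort can no longer cut the 'run' phase short.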

zz912 commented 6 days ago

I understand, but I don't know whether G4 is a clean solution. This problem is also present in 2.9. Did you test it in 2.9?

Sigma1912 commented 6 days ago

It certainly seems the easiest solution, but it might need a comment in the code explaining why it is needed. The underlying problem is the reliance on GStat messaging to catch the interpreter mode changing to 'run'. Since GStat is a module that polls states at fixed intervals in user space, it can always miss state changes that last less than the polling interval. Even with a shorter polling interval there is no guarantee that nothing is missed, as there may be even shorter changes. So it seems to me that either the event-driven architecture needs to change, or we need to make sure that interpreter calls coming from the GUI take longer to execute than the GStat polling interval, even if the G-code sent to the interpreter causes an abort.

N.B. I find it quite surprising that 'self.command.wait_complete()' does not fix this (at least on my PC), which may indicate that we rely on it a bit too much.

And yes this also fixes 2.9, tested.
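To illustrate the polling argument above, here is a small simulation in plain Python. It does not use GStat itself; the 100 ms interval and the timings are assumptions chosen for the example, and all names are hypothetical.

```python
def state_at(timeline, t_ms):
    """Return the state active at t_ms.

    timeline is a list of (start_ms, state) pairs sorted by start_ms.
    """
    current = timeline[0][1]
    for start_ms, state in timeline:
        if start_ms <= t_ms:
            current = state
        else:
            break
    return current


def polled_changes(timeline, poll_interval_ms, end_ms):
    """Sample the timeline at fixed intervals and report observed changes."""
    seen = []
    for t_ms in range(0, end_ms + 1, poll_interval_ms):
        state = state_at(timeline, t_ms)
        if not seen or seen[-1] != state:
            seen.append(state)
    return seen


# 'run' lasts 20 ms: the command aborts almost immediately.
short_run = [(0, "idle"), (110, "run"), (130, "idle")]
# 'run' lasts 150 ms: longer than the assumed polling interval.
long_run = [(0, "idle"), (110, "run"), (260, "idle")]

print(polled_changes(short_run, 100, 400))
print(polled_changes(long_run, 100, 400))
```

With these numbers the poller samples at 0, 100, 200, 300 and 400 ms. The short 'run' falls entirely between two samples and is never reported, while the long one is guaranteed to cover at least one sample.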

Sigma1912 commented 6 days ago

Actually, now that I think about it, the problem with 'self.command.wait_complete()' likely is that it blocks python execution and thus also blocks the GStat module. So during 'self.command.wait_complete()' an event driven GUI using GStat messages is basically blind.
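The blocking effect described here can be modeled with a toy single-threaded loop. This is only a sketch with a virtual clock, not GStat or GLib code; the class, the 100 ms interval, and the timings are all hypothetical.

```python
class FakeLoop:
    """Single-threaded event loop with a virtual clock in milliseconds."""

    def __init__(self, poll_interval_ms, state_fn):
        self.now = 0
        self.poll_interval_ms = poll_interval_ms
        self.next_poll = 0
        self.state_fn = state_fn
        self.seen = []

    def _poll(self):
        # Record the state only when it differs from the last one seen.
        state = self.state_fn(self.now)
        if not self.seen or self.seen[-1] != state:
            self.seen.append(state)

    def run_until(self, t_ms):
        """Idle loop: poll callbacks run whenever they come due."""
        while self.next_poll <= t_ms:
            self.now = self.next_poll
            self._poll()
            self.next_poll += self.poll_interval_ms
        self.now = t_ms

    def block_until(self, t_ms):
        """Model a blocking call: time passes, but no poll callback runs."""
        self.now = t_ms
        self.next_poll = max(self.next_poll, t_ms)


def interpreter_state(t_ms):
    # 'run' lasts 400 ms here, far longer than the polling interval.
    return "run" if 50 <= t_ms < 450 else "idle"


free = FakeLoop(100, interpreter_state)
free.run_until(600)

blocked = FakeLoop(100, interpreter_state)
blocked.run_until(0)        # initial poll sees 'idle'
blocked.block_until(500)    # stand-in for a blocking wait in the GUI thread
blocked.run_until(600)

print(free.seen)
print(blocked.seen)
```

In this model the unblocked loop reports 'idle', 'run', 'idle', while the blocked one only ever reports 'idle', even though 'run' lasted four polling intervals.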

zz912 commented 6 days ago

Thanks for researching the bug.

I would like to ask @rmu75 for a comment/opinion.

Sigma1912 commented 6 days ago

Thanks for finding all the bugs :)

zz912 commented 6 days ago

> Actually, now that I think about it, the problem with 'self.command.wait_complete()' likely is that it blocks python execution and thus also blocks the GStat module. So during 'self.command.wait_complete()' an event driven GUI using GStat messages is basically blind.

Yes https://github.com/LinuxCNC/linuxcnc/issues/2586

Sigma1912 commented 6 days ago

I guess even worse is that the GStat module itself is blind.

zz912 commented 6 days ago

To Sigma1912: You might be interested in this: https://forum.linuxcnc.org/38-general-linuxcnc-questions/51179-python-interface-makes-race-conditions-mayby#289513