Open zz912 opened 6 days ago
Confirmed. Here is the debug output from my simulation machine running master.
First, changing to a tool number that is present in the tool table (note the 'RUN' and 'IDLE' messages, which show GStat correctly reporting the interpreter mode changing first to run and then back to idle):
[Gmoccapy][DEBUG] ntb_button_switch_page (gmoccapy:5256)
3 2
[Gmoccapy][DEBUG] MDI Mode, tool_change = True (gmoccapy:2745)
[Gmoccapy][DEBUG] ntb_button_switch_page (gmoccapy:5256)
[Gmoccapy][DEBUG] RUN (gmoccapy:2619)
[Gmoccapy][DEBUG] hal status motion mode changed (gmoccapy:2821)
[Gmoccapy][DEBUG] IDLE (gmoccapy:2567)
[Gmoccapy][DEBUG] hal signal tool changed (gmoccapy:2638)
[Gmoccapy][DEBUG] Tool is now 1 (gmoccapy:3497)
[Gmoccapy][DEBUG] G43 is active (gmoccapy:3499)
Then, calling a non-existent tool number (note the absence of 'RUN' and 'IDLE'):
task: main loop took 0.022615 seconds
emc/task/emctask.cc 68: interp_error: Requested tool 45 not found in the tool table
Requested tool 45 not found in the tool table
task: main loop took 0.019085 seconds
[Gmoccapy][DEBUG] MDI Mode, tool_change = True (gmoccapy:2745)
[Gmoccapy][DEBUG] ntb_button_switch_page (gmoccapy:5256)
[Gmoccapy][DEBUG] hal status motion mode changed (gmoccapy:2821)
[Gmoccapy][DEBUG] _on_play_sound <__main__.gmoccapy object at 0x7fe3cc270d00> None error (gmoccapy:5501)
This seems to be very similar to https://github.com/LinuxCNC/linuxcnc/issues/3120.
Possible fix:
Change this:
To this (note that replacing 'self.command.mdi("M66 E0 L0")' with 'self.command.wait_complete()' does not seem to fix it):
The idea being that with the G4 command and the following queue buster we have the interpreter_mode change to 'run' for long enough for the GStat module to sense the change and send a message to Gmoccapy before the interpreter ingests the 'T{0} M6' which causes the abort.
I understand, but I am not sure that G4 is a clean solution. This problem is also present in 2.9. Did you test it in 2.9?
It certainly seems the easiest solution, but it might need a comment in the code explaining why it is needed. The underlying problem is the reliance on GStat messaging to catch the interpreter mode changing to 'run'. Since GStat is a module that polls states at certain intervals in user space, it can always miss state changes that do not last as long as the polling interval. Even with a shorter polling interval there is no guarantee that nothing is missed, as even shorter changes may occur. So it seems to me that either the event-driven architecture needs to change, or we need to make sure that interpreter calls coming from the GUI take longer to execute than the GStat polling interval, even if the G-code sent to the interpreter causes an abort.
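To illustrate the polling argument above, here is a minimal, self-contained sketch. It does not use the LinuxCNC or GStat API; the function names, the 100 ms polling interval and all timings are hypothetical values chosen only to show how a poller can miss a short-lived 'RUN' state, and how keeping the interpreter in 'RUN' a little longer (which is what the G4 dwell is meant to do) makes the change visible.

```python
from bisect import bisect_right


def state_at(timeline, t_ms):
    """Return the state active at t_ms for a sorted [(start_ms, state), ...] timeline."""
    starts = [start for start, _ in timeline]
    index = bisect_right(starts, t_ms) - 1
    return timeline[index][1]


def polled_messages(timeline, poll_interval_ms, end_ms):
    """Sample the timeline at a fixed interval and report each change the poller sees."""
    messages = []
    last_seen = state_at(timeline, 0)
    for t_ms in range(poll_interval_ms, end_ms + 1, poll_interval_ms):
        current = state_at(timeline, t_ms)
        if current != last_seen:
            messages.append(current)
            last_seen = current
    return messages


# All timings below are assumptions for illustration, not measured values.
valid_change = [(0, "IDLE"), (130, "RUN"), (380, "IDLE")]    # RUN lasts 250 ms
aborted_change = [(0, "IDLE"), (130, "RUN"), (150, "IDLE")]  # RUN lasts 20 ms
padded_abort = [(0, "IDLE"), (130, "RUN"), (280, "IDLE")]    # RUN stretched by a dwell

print(polled_messages(valid_change, 100, 600))    # ['RUN', 'IDLE']
print(polled_messages(aborted_change, 100, 600))  # []
print(polled_messages(padded_abort, 100, 600))    # ['RUN', 'IDLE']
```

In the aborted case the 'RUN' state starts and ends between two samples, so no message is ever generated, which matches the missing 'RUN' and 'IDLE' lines in the second debug log.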
N.B. I find it quite surprising that 'self.command.wait_complete()' does not fix this (at least on my PC) which may be a bit of an indication that we may rely a bit too much on it.
And yes this also fixes 2.9, tested.
Actually, now that I think about it, the problem with 'self.command.wait_complete()' likely is that it blocks python execution and thus also blocks the GStat module. So during 'self.command.wait_complete()' an event driven GUI using GStat messages is basically blind.
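If that suspicion is right, the effect can be sketched as follows. This is a simplified model and not GStat or Gmoccapy code: it assumes a single-threaded GUI loop in which polls that fall inside a blocking call simply do not run, and the function name, timeline and timings are hypothetical.

```python
def observed_states(timeline, poll_interval_ms, end_ms, blocked=(0, 0)):
    """Return the distinct states a poller sees when polls inside `blocked` are skipped."""
    block_start, block_end = blocked
    seen = []
    for t_ms in range(0, end_ms + 1, poll_interval_ms):
        if block_start <= t_ms < block_end:
            continue  # main thread is stuck in the blocking call, so no poll runs
        state = [s for start, s in timeline if start <= t_ms][-1]
        if not seen or seen[-1] != state:
            seen.append(state)
    return seen


# Assumed timings: the interpreter is in RUN from 130 ms to 380 ms.
timeline = [(0, "IDLE"), (130, "RUN"), (380, "IDLE")]

print(observed_states(timeline, 100, 600))                      # ['IDLE', 'RUN', 'IDLE']
print(observed_states(timeline, 100, 600, blocked=(120, 400)))  # ['IDLE']
```

With the blocking window covering the whole 'RUN' phase, the poller only ever sees 'IDLE', even though 'RUN' lasted longer than the polling interval.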
Thanks for researching the bug.
I would like to ask @rmu75 for a comment/opinion.
Thanks for finding all the bugs :)
Actually, now that I think about it, the problem with 'self.command.wait_complete()' likely is that it blocks python execution and thus also blocks the GStat module. So during 'self.command.wait_complete()' an event driven GUI using GStat messages is basically blind.
I guess even worse is that the GStat module itself is blind.
To Sigma1912: You might be interested in this: https://forum.linuxcnc.org/38-general-linuxcnc-questions/51179-python-interface-makes-race-conditions-mayby#289513
If I select a wrong tool (one that is not in the tool table), I can no longer open the MDI window. MDI mode is activated, but the user cannot see it.
https://github.com/user-attachments/assets/f3256603-3337-47f3-93fa-c32a9908f774
I entered G49 to rule out bugs related to AUTOMATIC_G43.
To make the bug easier to diagnose, I made this PR: https://github.com/LinuxCNC/linuxcnc/pull/3123