Closed zz912 closed 1 year ago
As there are three modes (MANUAL, MDI and AUTO), the behavior of gmoccapy is exactly as it should be!
No MDI commands in MANUAL mode. To execute commands from MANUAL mode, you can combine MDI commands: the first command switches to MDI mode, then you execute the MDI command you want, and then you switch back to MANUAL mode. You may use the corresponding halui commands to change modes.
Doing this may result in a short flicker of the screen, as gmoccapy changes the screen design on every mode switch.
Norbert
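The sequence described above (remember the current mode, switch to MDI, run the command, switch back) can be sketched with a toy model. This is only an illustration under stated assumptions: `Mode`, `TaskModel` and `run_mdi_from_any_mode` are hypothetical names and are not part of LinuxCNC, halui or gmoccapy. The only thing taken from this thread is the error text.

```python
from enum import Enum


class Mode(Enum):
    MANUAL = "manual"
    AUTO = "auto"
    MDI = "mdi"


class TaskModel:
    """Toy stand-in for the task controller (hypothetical, not LinuxCNC code)."""

    def __init__(self):
        self.mode = Mode.MANUAL
        self.executed = []  # MDI commands that were accepted
        self.errors = []    # operator messages that were raised

    def set_mode(self, mode):
        self.mode = mode

    def mdi(self, text):
        # MDI commands are only accepted while the model is in MDI mode.
        if self.mode is not Mode.MDI:
            self.errors.append("Must be in MDI mode to issue MDI command")
            return False
        self.executed.append(text)
        return True


def run_mdi_from_any_mode(task, text):
    """Switch to MDI, run one command, then restore the previous mode."""
    old_mode = task.mode
    task.set_mode(Mode.MDI)
    try:
        return task.mdi(text)
    finally:
        task.set_mode(old_mode)


task = TaskModel()
task.mdi("M61 Q5")                     # rejected: still in MANUAL
run_mdi_from_any_mode(task, "M61 Q5")  # accepted: wrapped in a mode switch
print(task.mode, task.executed, task.errors)
```

The model is strictly sequential, so it cannot show the flicker or any timing problem; it only makes the order of the three steps explicit.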
Hello Norbert,
1) Thanks for explaining the problem. Now I know how to solve it.
2) I think Gmoccapy's behavior is not ideal. What bothers me most is that it acts randomly. Paradoxically, it more often lets you execute an MDI command without this message popping up, and every time the MDI commands are executed in MANUAL mode. It is not absolutely necessary, but it would be nice if the message appeared consistently and the execution of MDI commands was really prohibited.
3) Would it be possible to add a HAL pin to Gmoccapy that pauses the redraw of the Gmoccapy screen? That way I could make sure the screen does not flicker. Or would it be possible to add a HAL pin to Gmoccapy that allows MDI execution in every mode? Or is there any other way to prevent the short flicker of the screen?
Zdeněk
I see a fight between Gmoccapy and Halui.
Gmoccapy wants "No MDI commands in MANUAL mode."
But Halui sets MDI mode for an MDI_COMMAND event. Look here: https://github.com/LinuxCNC/linuxcnc/blob/46557dc6245fd685a6d44d37d21f8b1458ff28d0/src/emc/usr_intf/halui.cc#L1079 (I hope I understood the halui code correctly)
First we should define the correct LCNC behavior when the halui.mdi-command-XX pin changes.
[Norbert]
To execute commands from MANUAL mode, you can combine MDI commands: the first command switches to MDI mode, then you execute the MDI command you want, and then you switch back to MANUAL mode. You may use the corresponding halui commands to change modes.
It seems that halui is doing that already.
[Zdeněk] I tested that several times and I don't get that message. When I run the MDI command in manual mode, LinuxCNC switches to MDI mode, runs the command and then switches back to Manual mode. There is indeed some flickering, especially if you have enabled the on-screen keyboard.
[HansU] Can I ask you to modify the test?
[HALUI]
MDI_COMMAND = M61 Q5 G4 P0.5
MDI_COMMAND = M61 Q2 G4 P0.5
Now I tried the test without the G4 on a third PC and, as in your case, the problem did not appear. With the G4 it does show up, but only on about the twentieth attempt.
I tested it with Axis and there were no problems.
I still don't get that message. Does it appear when you set the execute command pin or on startup?
I will make a video of the screen tonight showing how to simulate the error.
Here is video: https://user-images.githubusercontent.com/96618597/235367645-18399092-23a7-4bb3-8516-3df2d3b17171.mp4
The bug behaves very randomly. In this video, the message occurs more often than in other cases. Maybe the screen recorder makes the bug more likely. Even so, you can see the randomness in the video. I usually have to click through Set and Clr far more often to get it to appear at least once. Furthermore, I found that increasing the number of G and M codes in MDI_COMMAND increases the probability of the bug occurring.
The video file seems to be damaged. Please upload again.
The video is OK. It works in some players and not in others. Can I ask you to use VLC player?
I tested it on 2 PCs and it works. On my mobile it does not work.
OK, it works if I download it. It just didn't work in the browser. Furthermore, GitHub is now capable of embedding videos.
Hi Hans,
did you manage to simulate this bug?
No I still couldn't reproduce it. But maybe that's because I am running it on a VM. I'll have another try on a real machine...
That's weird. I am able to simulate it on 2 PCs and 1 VM. Try reducing the CPU on the VM. I have a feeling that the error is more frequent when the CPU is more heavily loaded. Would it help if I gave you access to my PC?
The three modes (Manual, MDI and AUTO) are not a feature of gmoccapy, but of the linuxcnc task. I believe you do not see the behavior in axis, as it does not switch the UI when task changes modes.
Try increasing the cycle time (INI setting). The default is 100; try setting the value to 150 or 200.
I could reproduce the error, but only on one PC (poor CPU power); setting the cycle time to 150 solved the problem on that PC.
Norbert
[rene-dev]
The three modes (Manual, MDI and AUTO) are not a feature of gmoccapy, but of the linuxcnc task. I believe you do not see the behavior in axis, as it does not switch the UI when task changes modes.
You're right. https://github.com/LinuxCNC/linuxcnc/issues/2453#issuecomment-1529031686
[Norbert]
Try increasing the cycle time (INI setting). The default is 100; try setting the value to 150 or 200.
I could reproduce the error, but only on one PC (poor CPU power); setting the cycle time to 150 solved the problem on that PC.
Norbert
I tried 300ms and it did not help. :-(
I would like to show you one more thing about this bug. Sometimes Gmoccapy gets stuck in MDI mode. Watch this video. At time 00:00 Gmoccapy is in MANUAL mode and then after executing MDI_COMMAND it remains in MDI mode.
In this video CYCLE_TIME = 300
Gmoccapy changes the screen design according to the mode selection button or in response to signals from LinuxCNC, and so far that works as it should.
The problem seems to be related to the MDI commands. If you try an MDI command with any movement, e.g. G91 G0 Z0.001, you will not be able to reproduce the error, as LinuxCNC will emit the signal for the current mode; but if you use a command without a G-code break, the signal will not be emitted.
I do not know a solution at this moment. I will finish building my house in about 3 to 6 months, so after that I will have more time to look at this kind of behavior.
Norbert
I tried:
[HALUI]
MDI_COMMAND = M61 Q5 G91 G0 Z-5.001
MDI_COMMAND = M61 Q2 G91 G0 Z-10.001
It did not help. I don't need this bug fixed immediately, but could I ask you to label this bug 2.9-must-fix, or add it to some list of things that must be fixed?
I tried to do the same with TAG 2.8.4
1) edit /home/zdenek/linuxcnc/linuxcnc-2.8.4/configs/sim/gmoccapy/gmoccapy.ini add:
[HALUI]
MDI_COMMAND = M61 Q5 G91 G0 Z-5.001
MDI_COMMAND = M61 Q2 G91 G0 Z-10.001
2) edit /home/zdenek/linuxcnc/linuxcnc-2.8.4/configs/sim/gmoccapy/gmoccapy_postgui.hal add:
net pokus-00 halui.mist.is-on halui.mdi-command-00
net pokus-01 halui.flood.is-on halui.mdi-command-01
In LCNC 2.8, the SET and CLR buttons are not available in halshow, so I used the MIST and FLOOD buttons.
3) Run LCNC
The video shows that the error also appears in version 2.8.4. https://user-images.githubusercontent.com/96618597/236683209-0ad5658a-c7f5-49a8-ae24-304b169c125c.mp4
I did the test from the previous post on other versions:
linuxcnc-2.7.15 - works well, without the message "Must be in MDI mode to issue MDI command"
linuxcnc-2.8.0 - bug
linuxcnc-2.8.2 - bug
linuxcnc-2.8.3 - bug
linuxcnc-2.8.4 - bug
linuxcnc-2.9 - bug
Finding this error is challenging. It appears randomly.
Here is the terminal listing when I press the button and execute MDI_COMMAND.
LCNC 2.7.15:
Emit interp-run
3 2
Emit interp-run
LCNC 2.8.0:
MANUAL Mode
IDLE
hal status motion mode changed
LCNC 2.8.0 bug:
3 2
('MDI Mode', False)
RUN
hal status motion mode changed
Must be in MDI mode to issue MDI command
MANUAL Mode
IDLE
hal status motion mode changed
[HansU] Have you tried lowering the CPU frequency in the VM to simulate this bug? Norbert wrote that he managed to simulate it on one PC (poor CPU Power), maybe it would help you to see this bug. If you can't simulate this bug, I could send you my VM, but I won't have access to it until Tuesday.
I asked Fupe to try to look into this problem of mine.
At first I was disappointed, because his feedback was that he was unable to simulate the problem on either a physical PC or a virtual machine.
Finally we found out that he was not using Oracle VM but a different VM. So, thanks to his help, we know that Oracle VM is needed to simulate this bug.
Next, I verified that it really is necessary to reduce the CPU performance. At 100% CPU the error did not appear; at 30% it appears reliably.
No, I still don't get that message even when limiting the CPU to 30 % :/
Thank you for info.
Hello,
I am very unhappy that you are not able to simulate my bug. I made another attempt today.
I created a new virtual machine in Oracle VM VirtualBox and installed it from the linuxcnc-2.8.2-buster.iso file.
In order to install this iso file, it is necessary to set at least 2 processor cores, otherwise the installation will fail.
After installing I updated:
sudo apt-get update
sudo apt-get upgrade
sudo apt-get dist-upgrade
I started linuxcnc and opened the Gmoccapy sim.
I closed linuxcnc
edit /home/zdenek/linuxcnc/configs/sim.gmoccapy/gmoccapy.ini add:
[HALUI]
MDI_COMMAND = M61 Q5 G91 G0 Z-5.001
MDI_COMMAND = M61 Q2 G91 G0 Z-10.001
edit /home/zdenek/linuxcnc/configs/sim.gmoccapy/gmoccapy_postgui.hal add:
net attempt-00 halui.mist.is-on halui.mdi-command-00
net attempt-01 halui.flood.is-on halui.mdi-command-01
Now when I tried to simulate the bug, the bug did not appear.
Then I reduced the CPU performance and the bug appeared.
[hansu] Can I ask for another test with lower CPU performance?
Here you can download the VM file for Oracle VM VirtualBox: https://ulozto.cz/tamhle/cVRcwtDslND0#!ZGSvLmR2Awx5LmNjMJWwBGWyAQx3LGORGzkHFaqPEUEPEGZkLD== login: zdenek password: dedadeda
I did the installation twice. Once in Czech and once in English. There was no difference.
Yeah, I can try around a bit more. But this might also depend on the power of the host machine. What CPU does your VM host have?
Is there an answer to your question?
If you think of anything else I could try, let me know.
If you still can't simulate it, I plan to try it on an old physical PC with bad latency and mail it to you.
No, nothing :( I went down to 2 cores and a 20% CPU limit, where the OS in the VM is almost unresponsive, but no error message. I have an old PC I can install LinuxCNC on, and I can also try on my laptop, which only has a Core i5. But I'm not sure when I'll have time for this.
Can I ask you if you could try changing the INI like this:
[HALUI]
MDI_COMMAND = M61 Q5 G91 G0 Z-5.001 G4 P5
MDI_COMMAND = M61 Q2 G91 G0 Z-10.001 G4 P5
?
Stay in VM with low CPU.
I found that the more commands there are in MDI_COMMAND, the more often that message occurs.
Interestingly, it sometimes ignores the G4 command and does not give any message.
If it's mainly the message that bothers you, I can write you a fix that suppresses this message.
I would like to ask you if you could look at this bug first: https://github.com/LinuxCNC/linuxcnc/issues/2489 Maybe it's related.
I don't like to hide errors, I will check if it has functional errors as well.
I looked here: https://github.com/LinuxCNC/linuxcnc/blob/e2113f0c5b949f52c4e6ca35caa9b0395fa8a655/src/emc/usr_intf/gmoccapy/gmoccapy.py#L2769 I think that when Gmoccapy was written, the fact that halui changes the mode by itself was not taken into account.
Can you find in Gmoccapy's code where the message originates from? How does it decide that an MDI command was called when in manual mode? It's odd that gmoccapy is detecting an MDI move that originated from something else; I'm not sure it can. I think it is detecting the mode change using some object that is slightly out of sync with the motion controller (race condition).
I don't like to hide errors, I will check if it has functional errors as well.
I wanted to say "I can write you a temporary fix that suppresses this message until this bug gets fixed."
Can you find in Gmoccapy's code where the message originates from?
The message is not originated from gmoccapy - it is coming from here: https://github.com/LinuxCNC/linuxcnc/blob/2.9/src/emc/task/emctaskmain.cc#L2215
I looked here:
linuxcnc/src/emc/usr_intf/gmoccapy/gmoccapy.py
Line 2769 in e2113f0
# if MDI button is not sensitive, we are not ready for MDI commands
I think that when Gmoccapy was written, the fact that halui changes the mode by itself was not taken into account.
on_hal_status_mode_mdi(self, widget)
is called when the mode is changed, e.g. by halui.
Sensitive here means that they are not grayed out which would be the case when the machine is turned off.
When MDI mode is active the button has the state "checked".
Thanks, Chris, for joining.
Can you find in Gmoccapy's code where the message originates from?
I think (90 percent) that Gmoccapy is causing this message, but this message is not in Gmoccapy, but here: https://github.com/LinuxCNC/linuxcnc/blob/494b5316e2f4c5f865985dbeec2d04b8ec4d5afd/src/emc/task/emctaskmain.cc#L2215
How does it decide that an MDI command was called when in manual mode?
This is probably the source of the problem. HALUI can call MDI_COMMAND at any time, in any mode. http://linuxcnc.org/docs/2.9/html/config/ini-config.html#sub:ini:sec:halui
HALUI sets MDI mode here: https://github.com/LinuxCNC/linuxcnc/blob/46557dc6245fd685a6d44d37d21f8b1458ff28d0/src/emc/usr_intf/halui.cc#L1079
After executing the commands, HALUI restores the old mode here: https://github.com/LinuxCNC/linuxcnc/blob/46557dc6245fd685a6d44d37d21f8b1458ff28d0/src/emc/usr_intf/halui.cc#L2009
I think Gmoccapy doesn't assume HALUI functionality and fights it.
[HansU]
I wanted to say "I can write you a temporary fix that suppresses this message until this bug gets fixed."
I am doing a retrofit for a friend who is not pressed for time. You don't have to make a temporary solution. Thank you for the offer.
[HansU]
Sensitive here means that they are not grayed out which would be the case when the machine is turned off.
Not only the case when the machine is turned off. Look: https://github.com/LinuxCNC/linuxcnc/assets/96618597/b0a4c63e-fb1f-4500-a0d6-75ed91832b21
It's sounding like the error is not with gmoccapy but maybe with halui. I can't see where halui confirms the mode switch happened before issuing the MDI command.
I wonder if on most systems the mode switch is fast enough that there is no problem, while on a low-powered machine using a resource-hungry GUI (gmoccapy) the problem shows up.
Could test by adding a delay to halui right after the mode switch and see if that fixes it.
@zz912 Suggestion to record the screen: https://github.com/phw/peek - you can install it simply via the package repositories.
Could test by adding a delay to halui right after the mode switch and see if that fixes it.
I probably don't know how. I tried:
I don't know the esleep function, but esleep is used here: https://github.com/LinuxCNC/linuxcnc/blob/494b5316e2f4c5f865985dbeec2d04b8ec4d5afd/src/emc/usr_intf/halui.cc#L2369
I tried esleep and sleep, but it did not help.
You need to put it two lines lower. As you placed it, the sleep is only called if there was an error. But I see that this is the code halui uses to confirm the mode switch; I just didn't recognize it.
But it would still be interesting to see if the delay helps.
Now esleep really works. The esleep parameter is in seconds, and I can see that I have to wait 1 s + 1 s for everything to happen. Unfortunately, this did not fix the bug.
[c-morley]
It's sounding like the error is not with gmoccapy but maybe with halui.
I don't know whether HALUI or GMOCCAPY is causing this particular error. Anyway, we found out that Gmoccapy doesn't expect halui MDI_COMMANDs to be used.
Therefore, I think we should temporarily warn Gmoccapy users not to use halui MDI_COMMAND. Here: http://linuxcnc.org/docs/stable/html/gui/gmoccapy.html#_known_problems http://linuxcnc.org/docs/2.9/html/gui/gmoccapy.html#_known_problems
Gmoccapy does not support halui MDI_COMMAND, do not use them. The behavior of halui MDI_COMMAND is random.
This Gmoccapy fix won't just be about fixing a line or three. It will be necessary to integrate into Gmoccapy the idea that halui can change modes. Gmoccapy does not expect this at the moment and points it out in the comments.
I need to be in manual mode and use HAL signals to run halui MDI_COMMAND. Ideally, I would like to be able to run my halui MDI_COMMAND only in manual mode.
Gmoccapy tries to disable MDI_COMMANDS if you are not in MDI mode. I suggest rethinking the philosophy and disabling MDI_COMMANDS only when you are in service and AUTO mode. For this to be feasible, it will be necessary to add a "disable_halui_commands" command to the halui module.
The "self.command.abort()" command is used often in Gmoccapy. If we don't expect interventions from the halui module, this command does not matter. Unfortunately, the halui module does intervene, so "self.command.abort()" should be handled with care. It can happen that the halui module runs an MDI_COMMAND and Gmoccapy accidentally aborts it in the middle of the run without any warning message. This is very dangerous. It has happened to me a few times that the M6 or M61 command was executed but the tool corrections were not set correctly. Unfortunately, I have trouble simulating these random problems in general, and I have not managed to simulate this particular dangerous case.
Hello everybody,
I found the bug in the source code !!!!!!!!!!!!! I am so happy!!!!!!!!
Chris was very close to solving the problem. A sleep needs to be added before the mode is restored. https://github.com/LinuxCNC/linuxcnc/blob/beacb3c0572eee72d0faf80cb736610059c37768/src/emc/usr_intf/halui.cc#L2133
When I first tried esleep, I tried it with a value of 0.1s = 100ms. The probability of occurrence of the message has decreased, but the message has not disappeared. By gradually increasing the value, I reached the value of 0.4s = 400ms. At this value, it passed the 10 minute click test without a bug.
I would like to ask you to suggest a better solution than the "sleep process". I would like something like "if MDI_COMMAND == finished { continue }"
This condition may not work correctly:
if (emcStatus->status == 1) { //which seems to have finished
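One common alternative to a fixed sleep is to poll a condition until it becomes true or a timeout expires. The sketch below shows only that general pattern. `wait_until` and `FakeStatus` are hypothetical helpers, not halui or LinuxCNC code. The value 1 mirrors the check quoted above, while 2 is used here merely as an arbitrary "still busy" value. Which condition actually means "the MDI command has finished" in halui is exactly what this thread has not yet established.

```python
import time


def wait_until(predicate, timeout=2.0, interval=0.01):
    """Poll predicate() until it returns True or the timeout (seconds) expires."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeStatus:
    """Hypothetical status source: busy (2) for a few reads, then finished (1)."""

    def __init__(self, busy_polls):
        self._remaining = busy_polls

    def read(self):
        if self._remaining > 0:
            self._remaining -= 1
            return 2
        return 1


status = FakeStatus(busy_polls=3)
finished = wait_until(lambda: status.read() == 1, timeout=1.0, interval=0.001)
timed_out = not wait_until(lambda: False, timeout=0.05, interval=0.01)
print(finished, timed_out)
```

Compared with a fixed 400 ms sleep, this returns as soon as the condition holds and still gives up after a bounded time. It helps only if the polled condition really reflects completion.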
Hello,
Today I wanted to continue looking for the bug. I started by downloading the current 2.9 branch. I found that the bug is gone.
I looked up what fixed the bug.
I found that this commit https://github.com/LinuxCNC/linuxcnc/commit/bdb442a3dfd451ab482b9950fb6b7e507dbdb7a6 solved the problem.
I would like to thank SebKuzminsky for fixing the cause of the bug and others for their help.
Great!
I apologize for the confusion. This bug has not been fixed. The bug's behavior is random and malicious. I made over 20 attempts and it didn't show up, so I documented it as fixed. Then I checked whether this commit affected other bugs as well, and I found out that the bug had simply not appeared during those attempts.
So today I devoted myself to finding this bug again.
I modified this part of the code: https://github.com/LinuxCNC/linuxcnc/blob/beacb3c0572eee72d0faf80cb736610059c37768/src/emc/usr_intf/halui.cc#L2133
if (halui_sent_mdi) { // we have an ongoing MDI command
    int array[10];
    for (int i = 0; i < 10; i++) {
        array[i] = emcStatus->status;
        esleep(0.001); // sleep for a while
    }
    if (emcStatus->status == 1) { // which seems to have finished
        halui_sent_mdi = 0;
        switch (halui_old_mode) {
        case EMC_TASK_MODE_MANUAL:
            fprintf(stderr, "HAF HAF MANUAL MODE will be set\n");
            sendManual();
            fprintf(stderr, "HAF HAF MANUAL MODE was set\n");
            break;
        case EMC_TASK_MODE_MDI: break;
        case EMC_TASK_MODE_AUTO: sendAuto(); break;
        default: sendManual(); break;
        }
    }
    fprintf(stderr, "\nHAF HAF emcStatus->status:\n");
    for (int i = 0; i < 10; i++) {
        fprintf(stderr, "HAF HAF %d\n", array[i]);
    }
    fprintf(stderr, "\n");
}
Result without bug:
3 2
HAF HAF emcStatus->status:
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF emcStatus->status:
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF emcStatus->status:
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF emcStatus->status:
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF MANUAL MODE will be set
HAF HAF MANUAL MODE was set
HAF HAF emcStatus->status:
HAF HAF 1
HAF HAF 1
HAF HAF 1
HAF HAF 1
HAF HAF 1
HAF HAF 1
HAF HAF 1
HAF HAF 1
HAF HAF 1
HAF HAF 1
Result with bug:
3 2
HAF HAF emcStatus->status:
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF emcStatus->status:
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF emcStatus->status:
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF 2
HAF HAF MANUAL MODE will be set
Must be in MDI mode to issue MDI command
HAF HAF MANUAL MODE was set
HAF HAF emcStatus->status:
HAF HAF 1
HAF HAF 1
HAF HAF 1
HAF HAF 1
HAF HAF 1
HAF HAF 1
HAF HAF 1
HAF HAF 1
HAF HAF 1
HAF HAF 1
1) So far it looks like the bug occurs after executing this command: https://github.com/LinuxCNC/linuxcnc/blob/beacb3c0572eee72d0faf80cb736610059c37768/src/emc/usr_intf/halui.cc#L2135
2) The emcStatus->status value is stable.
3) The emcStatus->status value will change to 1 very soon.
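Because the error is rare, it can help to scan captured terminal output automatically instead of reading it by eye. The following is a small sketch that reports which "HAF HAF" text marker was printed last before the first error line. `marker_before_error` is a hypothetical helper written for this thread; the sample lines are copied from the "Result with bug" output above.

```python
ERROR_TEXT = "Must be in MDI mode to issue MDI command"
PREFIX = "HAF HAF "


def marker_before_error(lines):
    """Return the last non-numeric 'HAF HAF' marker before the first error line.

    Returns None if no error line is found, or if no marker precedes it.
    """
    last_marker = None
    for raw in lines:
        line = raw.strip()
        if line == ERROR_TEXT:
            return last_marker
        # Skip the pure status samples such as "HAF HAF 2".
        if line.startswith(PREFIX) and not line[len(PREFIX):].isdigit():
            last_marker = line
    return None


log = [
    "HAF HAF 2",
    "HAF HAF MANUAL MODE will be set",
    "Must be in MDI mode to issue MDI command",
    "HAF HAF MANUAL MODE was set",
]
print(marker_before_error(log))
```

For the sample above the helper returns the "will be set" marker, which matches observation 1): the message shows up between the two prints around sendManual().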
I still can't think of a way to solve it. I don't like solutions with a large esleep value.
[zz912]
I still can't think of a way to solve it. I don't like solutions with a large esleep value.
It sounds like there is a concurrency/race condition where two tasks need to be done in a specific sequence, which the code currently does not guarantee. Adding the sleep to the task that needs to be done last will hide the problem, as it increases the chance of the first task being done before the last task runs.
Do you know which tasks / code paths are involved? Can some locking mechanism (like mutual exclusion / mutex) be added to ensure a predictable ordering?
-- Happy hacking Petter Reinholdtsen
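The ordering problem described above can be illustrated outside LinuxCNC with two threads. This minimal sketch uses `threading.Event` rather than a mutex, because an event directly expresses "the second step must not start until the first has finished", whereas a plain mutex only prevents simultaneous access and does not by itself impose an order. All names (`run_ordered`, `mdi_worker`, `mode_restorer`) are hypothetical and the code is not taken from halui or task.

```python
import threading


def run_ordered():
    """Run two steps in separate threads with a guaranteed order."""
    log = []
    mdi_done = threading.Event()

    def mdi_worker():
        log.append("mdi command finished")
        mdi_done.set()  # signal that the first step is complete

    def mode_restorer():
        mdi_done.wait()  # block until the first step has signalled
        log.append("manual mode restored")

    restorer = threading.Thread(target=mode_restorer)
    worker = threading.Thread(target=mdi_worker)
    restorer.start()  # started first on purpose; it still runs second
    worker.start()
    worker.join()
    restorer.join()
    return log


print(run_ordered())
```

No sleep is involved, so the result does not depend on how fast or how loaded the machine is. Whether the real code paths can be coupled this way depends on which processes are involved, which is still an open question here.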
Hello Petterreinholdtsen,
Thanks for joining.
I'm currently in a situation where I can't figure out what controls "emcStatus->status" and how. I'm just an Arduino guy. I am not a programmer. :-(
I think there is a locking mechanism, but "emcStatus->status" unlocks it very quickly.
I use a RIP build of the LCNC 2.9 branch.
Simulate the problem:
1) edit /home/user/linuxcnc/linuxcnc-2.9/configs/sim/gmoccapy/gmoccapy.ini add:
2) Run LCNC
3) This error is random, so it may not appear the first time. Constantly switch to MDI mode and enable/disable HAL pins halui.mdi-command-00 and halui.mdi-command-01
MDI commands run fine, but sometimes this message pops up.
I have more sophisticated commands on the real machine. I was in MDI-mode the whole time when I tuned them and never had this problem. However, for normal use of my commands, I want to be in JOG mode and that's where the message appears.