Open treib opened 3 years ago
Thanks, can you make a P.R.? it will be much easier to evaluate in that form.
I think it will help to test and review if we break these down in to separate PRs. Let's start with the race condition first, because I have recently done a deep dive into protocol_exec_rt_system, so it is reasonably fresh in my mind.
You may have noticed that the variable cycle_stop is separate from the bitfields in sys_rt_exec_state. That was not always the case. We had a race problem with .bit.CycleStop . We fixed it by making cycle_stop into a separate boolean, which can be set atomically with a single byte store, not requiring a read/or/write sequence. A similar change is likely to work for .bit.CycleStart => cycle_start . The one disadvantage of such a change is that you can no longer test all the bits in one go with rt_exec_state.value != 0
; instead you would have to say if (rt_exec_state.value != 0 || cycle_stop || cycle_stop) {
. That is probably not a problem, considering the advantages you get by being able to set and clear the flag with a single store compared to the explicitly-locked read/modify/write sequences.
Hi Mitch, thanks - I didn't know the history of cycle_stop. That's helpful to know.
I was just just about to submit a PR for the getter method I used, but I'm thinking that breaking out into separate bools might be the better approach. It involves touching a lot more code, but what concerns me is whether there might ever be a need for a separate task (not Serial.cpp or the main program) to monitor or change those flags. It doesn't seem any existing code falls into that scenario (CoreXY.cpp uses the flags in its own loop for homing, but is called on the main program). Still, it seems like it could maybe happen down the road.
I'll put together a separate PR for breaking out into separate bools and we can look at whether that's the best approach.
Ideally the flags are only tested/cleared in one place, namely protocol_exec_rt_system. Some of them are set from several places - 6 places for cycle_start . I don't think that setting a flag that is already set is a problem.
The "only one test/clear place" rule is violated by the homing code in Limits.cpp and the copy of that code in CoreXY.cpp. That is troublesome - and indeed it stalled my attempt to convert protocol_exec_rt_system into a proper event-queue-driven state machine. But it doesn't involve cycle_start.
Thanks! - I have broken these into two pull requests: #716 and #718
My fix was to flush the serial buffer and the line input buffer on receipt of a jog cancel command. Code can be found in this patch.
Hello everyone -
On the latest versions of Grbl_Esp32 (Ver 1.3a Date 20201212) and UGS (2.0.7), I'm seeing a jogging issue where releasing from a continuous jog sometimes results in the jog continuing and hitting the endstop. Anyone else see this behavior? Mainly it happens when the jog feed rate in UGS is set much higher than the axis max velocity.
I’m new to the codebase, but I looked into it and made a couple changes that fixed it for me. I am not comfortable the changes I made are the best solution, though, so rather than doing a pull request I wanted to explain what I found in case it would be helpful to anyone.
You are welcome to use parts or all of it, or I can refine it if you have suggestions on how it might better fit into the project.
My changes are in the fork: https://github.com/treib/Grbl_Esp32
I believe I’ve tracked it down to a couple things:
Race condition setting sys_rt_exec_state
A race condition between the Serial.cpp task changing the
sys_rt_exec_state
global and the main program. I suspect the main hit point is inprotocol_auto_cycle_start()
where it sets thecycleStart
bit, but likely any place where a bit is being set in the main program is risky since I suspect the underlying bitwise OR and store is non-atomic.This is a rare hit, maybe 1% failure rate, but I observed it in the real world. Strategically placed debug strings confirmed that the command was received, but the bit ultimately was lost.
I tested a fix where Serial.cpp stores a task-local version of
sys_rt_exec_state
, and it is fetched at the start ofprotocol_exec_rt_system()
using a getter function protected withvTaskEnterCritical()
. This works, but could require modification of custom modules to call this explicit getter rather than assume these values will change on their own. (I’m thinking of modules like CoreXY.cpp).Cancel/hold misses move that is in-flight in mc_line()
Another unrelated cause of the jogging problem appears to be more of a grbl issue: When a
jogCancel
orfeedHold
arrives, the planner is emptied and gcode parser state resets to current position, which is good. But a next jog command could already be in-flight at this point, waiting inmc_line()
as the machine decelerates, and with target position determined based on gcode that has been dropped due to the hold/cancel.UGS causes this easily by sending a continuing series of incremental jog commands while holding down a jog direction button. These accumulate the target position to a large number, and since a
jogCancel
orfeedHold
duringState::Jog
results in going back toState::Idle
, that stuck in-flight move gets added to the planner with the large target value intact. The machine then moves and hits the endstop. This is much easier to reproduce (happens maybe 30% of the time for me), and is especially obvious and problematic when UGS is set to a jog feedrate much larger than the axis can support (since that in-flight move gets a much larger and easily out-of-bounds target position).My simple fix for this is to create a global variable that tracks an in-flight command in
mc_line()
. Inprotocol_exec_rt_suspend()
I look forsys.suspend.bit.jogCancel
being true, and if the in-flight command is a jog, I set a flag that mc_line() will notice and not add it to the planner. I added anis_jog
flag toplan_line_data_t
to track whether it’s a jog, and a return bool frommc_line()
to signal it dropped the command so thatjog_execute()
can signal back togc_execute_line()
that it was dropped so it can skip the updating ofgc_state.position
.I did not touch the kinematic path to
mc_line()
, so I am not sure if those could still have issues.My main concern is this will only catch these in-flight moves if
protocol_execute_realtime()
is called when the move has enteredmc_line()
. Perhaps some scenarios or custom code could callprotocol_execute_realtime()
after a gcode line was parsed but before getting tomc_line()
. But I guess in that case it is no different than how it behaves without the fix.UGS is a little unreliable too
I should mention that I’ve also seen UGS not send the feedHold when releasing a jog button. My solution to this problem is to stop using UGS. :)
Anyway, I hope some of this will be helpful. Thanks for all your hard work on this great project!
Phil