MaslowCNC / Firmware

Maslow Firmware
GNU General Public License v3.0
261 stars 136 forks source link

Initial version of a stalled/unhooked axis detector #393

Closed thormj closed 6 years ago

thormj commented 6 years ago

For issue #378, but this needs testing (I wrote it on a plane flight and I won't be back to carving until Mar 6).

Algorithm (and if this is not clear from the patch, let me know and I'll throw in some more comments):

  1. When calculating the position PID for each axis: if it moved since last time, increment moved, if not, increment stalled.
  2. If an axis is detached, clear both variables.
  3. When the report runs, examine all axes; if it has stalled for >5 cycles (so 500ms), prepare a message: 3b. If it's moved before, say it's stalled. If it hasn't moved before, say it's UNHK'd
  4. If I have a message from step 3, pause() the machine (which hopefully detaches the axes which will clear the error)

Not sure about step 4 happening fast enough... maybe I need to put a "if not paused" around the block in report.cpp?

davidelang commented 6 years ago

what if an axis should not be moving (based on the speed/angle of the current move)?

thormj commented 6 years ago

I was assuming that was being handled by the if(_pidController.Compute()) statement, but that might doesn't look correct. I put a if(_pidOutput) bit in there (so if 0 velocity is called for, reset things -- I was assuming that if 0 was called for, then Compute would be false, but that doesn't make a whole lotta sense - if that were true how would you stop an axis?)

krkeegan commented 6 years ago

Couldn't this trigger a stall warning if the movements requested were extremely small? When we are station keeping, the pidoutput may fluctuate rapidly positive and negative to try and keep holding the exact position.

thormj commented 6 years ago

I tried to mitigate that with thresholds ("you have to have indicated a stall for the last X counts of the report all"), but that still possible... is there a "distance count" (preferable in encoder counts -- that should be used to indicate "station keeping" vs "move me" -- I'll add it /change it in Axis.cpp:116 and then we'd be sure (or Axis.h -- I chose StallSteps and StallTimeout quite arbitrarily)

davidelang commented 6 years ago

I'm not sure that looking for no motion is the right thing to be doing, the current problem of a hook at the end of a line at the top of the machine is a classic example of the machine not keeping up (what we would consider the equivalent of a stall), but motion was happening, just not enough of it

So I think we need to be looking for too much error, not no motion

davidelang commented 6 years ago

or another thing to look for, maxing out the motor power and the PID loop still trying to increase it.

krkeegan commented 6 years ago

I think we had talked about using error before and that seems a bit more straight forward to me. The only issue I see is that if the max_error is a fixed amount, small movements would not trigger the warning, at least not immediately. The max_error could scale but then it is a question of what does it scale with?

I think this is a case where a simple solution fixes many of our problems. But we can't fix all of the issues without a complex solution.

krkeegan commented 6 years ago

@thormj thank you for making those corrections.

If this works as advertised and produces no side effects I think this is good to commit. I can keep backseat driving about hypothetical alternative ways to handle this, but I lack the time to actually do the coding, so that doesn't seem fair.

thormj commented 6 years ago

I might've been trying to shoot bigger game than I should've been. The thing that I wanted was a way to burp and tell me "hey bozo, plug in your Z axis!", but I figured that that would've been the same algorithm for detecting a stall (I moved before, but for some reason, I cannot move now), so I said "if I've moved before... I'm plugged in; say that I've stalled instead."

It doesn't detect "I'm not keeping up" very well, and I'm not sure if will actually detect a stall or not (if during a stall, the encoder bounces more than 10 clicks per 100 ms, it not-quite stalled by my code's reckoning.

Maybe for "not keeping up" you'd want an error and all stop ("Hey idgit - plug in yo' z axis"), but I think you'd really rather start tweaking the other velocities in a coordinated move to straighten it back out and set a flag that "your Maslow is cutting slow because it's binding" (not good for cutters [but would signalling a full stop without backing off the material be good either?], but at least the cut will be good and straight).

Maybe if vel_error>2000, cut velocities of all axes to 1/2, and if it gets to within 100, go back to full speed? But I wouldn't try to make that kind of change away from maslow as I have no idea what the "usual" and the "OMG-it's sticking" numbers would be.

davidelang commented 6 years ago

On Wed, 21 Feb 2018, Kevin Robert Keegan wrote:

I think we had talked about using error before and that seems a bit more straight forward to me. The only issue I see is that if the max_error is a fixed amount, small movements would not trigger the warning, at least not immediately. The max_error could scale but then it is a question of what does it scale with?

I think this is a case where a simple solution fixes many of our problems. But we can't fix all of the issues without a complex solution.

I don't think most of our problems are with no movement.

That's why I was suggesting looking for exceeding +-255 speed. Even if it's a small movement, if we aren't moving, the PID loop should keep calling for more speed up past the limit, because the error isn't being reduced.

krkeegan commented 6 years ago

i think the speed is currently capped at 255 (the pid loop will never output higher), but we could remove that cap and catch anything that high as an error. It could certainly work, catching an unplugged motor wouldn't be instantaneous, but the amount of error and time would be minimal.

blurfl commented 6 years ago

Wouldn’t overshoot on a poorly tuned PID loop cause this error?

davidelang commented 6 years ago

only if the overshoot is bad enough to cause 100% power in reverse to be applied, and if that does happen, I don't think it's a bad thing to get a notice of it.

The notice should be something like "the machine was unable to move at the desired speed" (or something along those lines), not "motor stalled" or something like that.

David Lang

blurfl commented 6 years ago

How much overshoot is reasonable? For how long?

davidelang commented 6 years ago

ideally we should not overshoot (it means cutting areas we aren't supposed to cut)

currently some overshoot is going to happen because we do not have any motion planning to slow us down. I don't know what this amounts to, but it's not enough to notice in the resulting cuts.

k

krkeegan commented 6 years ago

We have to be careful about what we mean when we say overshoot.

Overshoot in reference to the PID loops means requesting either an RPM or voltage value higher than necessary. The issue is we don't know what is necessary.

@blurfl is correct, a badly tuned PID loop would cause this type of design to fire. But I agree with @davidelang that above some threshold, this is probably what we want anyways.

I think the only way to pick that threshold is by some empirical testing. Of course the problem there is that a different design may result in a different outcome.

This proposal is certainly simpler and cleaner than the one currently designed

davidelang commented 6 years ago

I'm proposing that the first-try threshold be the max power to the motors. If we are running the motors at full power and we think (through the PID loop) that we need to go even faster, that seems like a clear problem

either:

  1. we aren't moving
  2. we aren't keeping up with the desired movement
  3. we have passed (by a significant amount)where we were supposed to stop and are reversing in an attempt to get back where we are supposed to be.

I don't think 3 is common (but real-world experimentation will tell us)

I am hoping that this test will trigger for the heavy-sled-top-of-machine problem that gero ran into.

David Lang

krkeegan commented 6 years ago

It will be interesting to see what happens. My guess is that when starting from a standstill, that we hit your 2nd option (not keeping up) and max out often. Here is a graph I made a while back

https://discourse-cdn-sjc2.com/standard11/uploads/maslowcnc/original/2X/7/7c5e44c3703fa92a1d624342b8fd5b4ee35c8cdc.png

It looks like it may not hit 255, but that could just be because the sample rate of the graph is well below the PID loop frequency.

davidelang commented 6 years ago

yep, let's see what happens.

If this isn't hit under normal operations, this would be a very useful check to have.

if it is, then we need to implement acceleration planning before it is as useful. k

krkeegan commented 6 years ago

So I think @davidelang is right. I made a simple test branch that prints a warning to the serial stream if

255 < speed < -255

and this condition is maintained for 20 milliseconds (meaning 3 consecutive PId cycles).

You can try it here:

https://github.com/krkeegan/Firmware/tree/OverSpeedWarning

It works. It shows up when i unplug a motor. And it also prints on the areas on the work surface where my motors are not able to maintain the 1000mm/min max speed. (top corners mostly, I don't know if this is unique to my machine or if we have our max speed set a little high).

In this basic form it doesn't understand a stall vs unhooked, but I don't know if that is necessary.

Anyways, others should play around with this and see if this works for them. We can implement it as a warning very easily, if we want to actually trip an alarm state that still needs more work to enable alarms to work smoothly.

blurfl commented 6 years ago

@krkeegan, are you considering a PR with the overspeed warning in the simple form you linked? I could see making it switched by verboseDebug, and maybe making the trigger something like (LOOPINTERVAL * 2)/1000 to scale with changes to the PID loop.

davidelang commented 6 years ago

the current check to alert if at max for more than 3 cycles is a good starting point.

I think 'alert if at max and PID loop thinks we need to speed up' will work as well, and doesn't have the complexity of remembering the last few cycles (or worrying about how far apart those cycles are)

but either one is going to give false positives until we get acceleration planning in place.

davidelang commented 6 years ago

also check out this Marlin issue https://forums.maslowcnc.com/t/calibration-moving-one-motor-wrong-direction-in-ground-control-v1-08/2590

BarbourSmith commented 6 years ago

I would like to get this into 1.09. I'm a little hesitant because we accidentally put pretty significant bugs in 1.06 (crashed when velocity couldn't be reported) and 1.07 (crashed when setting the chain lengths) so we haven't had a really stable release in like a month, but it sounds like there is a strong community desire to see this feature added.

How do I test that it is working? When I unplug a motor I don't see any sort of warning.

blurfl commented 6 years ago

I think that the change @krkeegan wrote here is simpler and easier to test. It posts a warning message if the motor becomes disconnected or when the PID loop asks for more than 100% effort.

BarbourSmith commented 6 years ago

Excellent!

@krkeegan how do you feel about making a PR with those changes?

krkeegan commented 6 years ago

I can do that, this is very simple and is probably a good baby step, since it only produces a warning for the user.

Once we get it dialed in, and other improvements, it could actually take action (ie stop) if a fault happened.

BarbourSmith commented 6 years ago

I'm going to close this one as added in #410 👍 👍