gin66 / FastAccelStepper

A high speed stepper library for Atmega 168/328p (nano), Atmega32u4, Atmega 2560, ESP32, ESP32S2, ESP32S3, ESP32C3 and Atmel SAM Due
MIT License
282 stars 67 forks source link

When quicky changing directions on a 2 steppers setup sometimes one (or both) steppers can end reversed #250

Closed SHWotever closed 1 month ago

SHWotever commented 2 months ago

Hi !

I'm trying to drive a two stepper setup based on a "noisy" input quickly changing from CW to CCW sometimes doing large movements, sometimes very short. The direction can be changed up to every 10ms. It could be considered to be close to a random noise.

Currently the settings are the following :

The platform is a good old arduino nano v3, I tested both the official release available here as a zip : https://www.ardu-badge.com/FastAccelStepper And the current from repo. I'm using arduino 1.8.19 if somehow that matters (the latest prior 2.X)

const int stepper1_pulsePin = 9; // Must be either 9 or 10 const int stepper1_directionPin = 3; const int stepper1_enablePin = 2; const int stepper1_reverseDirection = true;

const int stepper2_pulsePin = 10; // Must be either 9 or 10 const int stepper2_directionPin = 5; const int stepper2_enablePin = 4; const int stepper2_reverseDirection = false;

speedHz = 28000hz acceleration = 30000;

I'm only using the moveTo command as the goal is to go as fast as possible to the target position, and if there not enough time simply give up and try to follow as best as possible the target position either in one direction or the other.

When the issue occurs, it ends to keep reversed for a long time (even when switching directions again and again) and sometimes recover. I suspect it to recover when the movement finally settles enough to get a full stop.

I've tried to dissect the logic behind StepperISR, to hopefully give some insights,

Originally I thought about a race condition on the digitalwrite available in StepperISR between both steppers :

My eye got caught by the way the pin direction is transmitted to the interrupt : it's a toggle flag. So if somehow in such race conditions it gets toggled twice, or a toggle is missed, the actual direction pin state could (?) maybe get reversed for a long time and this miss survive many later toggles . I can't get my finger on which could get a missed toggle, but I somehow feel it's the cause.

For Science I also tried to always trigger a stop if the direction is to be changed, and wait for it to be stopped (!isRunning) to initiate the new movement with moveTo, but it ended having the same issue from time to time (which would ruin my theory about a full stop recovering the situation)

Maybe would you have some insights ? The library is super fast and convenient, but this really got me lost :D I guess this "almost random" direction changes is clearly not common, that would explain why an issue could be hidden.

SHWotever commented 2 months ago

Edit : Just some extra infos which could matter :

gin66 commented 2 months ago

There may be a weak spot in the stepper ISR. Could you please check, if in your application, that code branch is executed ? For example, just set there a global variable to a specific value and monitor that variable. Idea is, if that new command contains a direction pin toggle, it may get lost.

For the stopped motor case: If the motor is stopped (!isRunning()), then the direction pin is set instead of using a toggle. So it should work as expected from a stopped motor. Just for the avr implementation, it could be that isRunning() returns false, but there still can happen another interrupt. That‘s the purpose of that branch to recover from this situation. I would have expected, that this specific case is handled well….but without test it is just an unproven hypothesis.

In addition, please try to set the direction change delay parameter to 1 in:

setDirectionPin(uint8_t dirPin, bool dirHighCountsUp = true,
                       uint16_t dir_change_delay_us = 0);

Then please check, if the behavior is gone.

SHWotever commented 2 months ago

Hi ! Thanks a lot ! I will check and report,

About the direction switch delay I did try it with various delays from 25 to 400us, it's a very interesting feature as it helps skipping lot of quick direction changes and makes it quieter in my purpose, but unfortunately it didn't helped.

In the meanwhile I also tried again this morning with slower accelerations, slower max speeds, I also tried a 400ms delay between every calls to moveto (as in my understanding the processing of direction and queue is not called so fast, so I wanted to ensure it was processed after each update.

I have indeed seen those two separate ways to set the direction pin, either the toggle method executed by the interrupt routine, or the direct assignment when not running. I need to get more used to the logic behind to see how it could fail :D

None of those have helped, which would make me think about the issue being somehow between the queue processing and the interrupt.

I can also confirm it can happen on any of the two motors, and sometimes both at the exact same time (at this date they receive the same moveto commands). They don't need to be totally in sync, but it helps to see the issue in a very visible way.

gin66 commented 2 months ago

Thanks for the feedback. The issue is closed by you. Is this by accident or have you identified/solved the root cause/issue ?

SHWotever commented 2 months ago

Hi ! Sorry it was a bad manipulation!

gin66 commented 2 months ago

With a 400ms delay, the issue has still occurred ?

SHWotever commented 2 months ago

Yes with the delay it was indeed still failing randomly. But at the end that will be a parameter I will use, due to the noisy nature of my application it's a good trade for reducing noise on direction changes, I need absolute positioning but not absolutely instant position :D

I will try to monitor the code branch you pointed this morning.

Another test I did last week after trying to understand better the library logic, was to avoid a single step change as I've seen that one step was a very specific case, so I Added a minimum value between two positions (I tried a minimum of 20step from memory) with no luck, but I guess that this code branch is about one step remaining, not overall one step move. But indeed just by reading it could be a good candidate for a direction change miss. I wish I could be of better help :D but I'm fighting to catch the exact events relations between the isr and the queue processing.

From my understanding moveto exclusively manipulates the queue end, and do not enqueue moves (so without queue overflow risks) . So I already excluded this possibility.

Overall I did lot of tests and combinations(always around the moveto command) speed, pauses, accelerations. I was hoping to come with a PR instead of a call for help :D

If I understand the potential breach you are pointing : if a direction change was due on this one step remaining branch, it would be missed, and if somehow the movement is started again at this exact time then it would keep it out of sync until a complete stop.

Since both motors are for now fed with the same instructions, I can see very visually when it occurs, and my conviction is that it's on small moves. And once a large move is triggered in the swapped direction it can get off from 2000 or 3000 steps.

I was using previously a very old version of your library, and I think it was already the case (probably less often as the whole code around was significantly slower). I've always put it on an overtorque issue. Just for illustration instead of keeping the application mysterious (even if I tried to focus on the issue triggers and symptoms) , https://youtu.be/7ao_srZlhNc?si=hs74gvI-pad-QFn5 Overall it's a super simple serial to absolute position sketch

For all my current tests the motors are unloaded, but since I could see it very visually the direction swap, load for this case is not a factor

I will try to log it first and try to patch it if I can. Fingers crossed!

SHWotever commented 2 months ago

Logging result So I did the branch test and I confirm it got triggered : https://github.com/gin66/FastAccelStepper/blob/c8b69399520552cf7cfaaf4600125a69b0a7cd5f/src/StepperISR_avr.cpp#L164

I had to grab it from the queue as I could not easily insert a "gobal bool" that easily.

image

It's going really fast, but based on the timings the first occurence of the issue seems to match the first time it enters this case.

Fix attempt 1, fail

I was expecting the missed direction change to be caused by the early exit, so I tried to apply the direction change in the early exit if, but it's the same (sorry for the poor indentation :D ) : image

Fix attempt 2, success ? Then I tried my chance swapping direction earlier, this seems much better, I could run a few minutes without it occuring, but I feel this could hide other race condition :

My guess is that there is no quick exit in the first _prepareForStop branch e->steps > 0 is false But the second case does not enter neither and won't apply the direction change : rp != fas_queue_##CHANNEL.next_write_idx I'm not sure if it will be always the case or if this dirty fix could have also some weaknesses in the opposite direction, or if i'm not creating a one step move in the wrong direction every time it enters in the e->steps > 0 is true branch.

In all cases it seems to solve the issue, I'm just unsure if i'm introducing a single step in the wrong direction since I'm playing with a 5000 steps range on a 80° output : it's pretty high res, and the gearbox have some backslash which will hide a few steps being off :D

image

Fix attempt 3, is there a step off ? Edit : I tried a secondary version applying the direction change after this last step, Not sure which version is the best, applying it after or before ? image

In all cases you pointed the correct location. I can run a few minutes without visible issues , I will try a long run in the day. I'm just at doubt about one step being in the wrong direction, and potential side effects in case two cases applying the direction changes occurs (if that is possible ?)

gin66 commented 2 months ago

Great analysis and this looks promising. Unfortunately the test suite is very sensitive to code changes, which impacts the timing and as such throws a lot of false positives. So I need to check your patch on my local pc first and fix the false positives. I doubt, this can be done before the weekend.

Sorry for that.

SHWotever commented 2 months ago

Really no problems ;) Unfortunately I won't be available next week if you need some extra tests feedback. I will try to check as far as I can how well it performs now in real conditions ;) (minus this little doubt about one step being lost, no visual observation (on my hardware) can really guarantee it :D

SHWotever commented 2 months ago

Just a follow-up as promised : after 4 days and lot of tests in real conditions. Not a single wrong direction issue anymore. I've not noticed any steps loss neither. I'm using the latest master version with the patch I submitted. I can only say one thing : your library is amazing ;) absolute position, relative, dynamic acceleration and speed is nicely handled and butter smooth !

I've been using your library since 2 years and it have always beaten everything else I could test in terms of API and speed ;)

gin66 commented 2 months ago

The new test for issue 250 is failing with the patch. So I have moved the patch one level up, but still no success. Need to continue to work on it.

gin66 commented 2 months ago

Apparently, FastAccelStepper cannot cope well with interrupt blocks >20us at high speed. So the test failures on my side were not related to the direction pin changes.

Not sure, if this still works well on your hardware. As I expect no issues, I have made another formal release v0.30.12

SHWotever commented 2 months ago

Hi !

Thanks a lot for your efforts, I see indeed the patch was integrated as I submitted it. In my case I've seen no issues since, I could test it a full week, and everytime it felt to have kept perfectly the centered (which is the "idle" position when finished and is a good indicator of a drift).

gin66 commented 2 months ago

Could you please test the latest master branch ? I have reworked the whole ISR to make it more robust against longer blocks of the interrupt. This allows to nearly eliminate the problematic code branch with your patch.

Robust means in this context: no steps lost, but a 4ms pause could occur between two steps.

SHWotever commented 2 months ago

For sure, I will be able to test it next week ;)

SHWotever commented 1 month ago

As promised, a feedback ;) I took the latest master version and let it run for 2 hours in my usage conditions. Everything was still aligned. So after two hours I can conclude with good confidence that there is no missed steps or directions issue. Thanks a lot !