Acceleration > 100 causes sporadic speeds

phord commented 10 years ago

When ACCELERATION is 10, we seem able to accelerate up to our target speed without fail. When ACCELERATION is 100 or more, our maximum speed is much lower than the target speed and is spiky.

Here's a graph of our measured velocity for this script at several different ACCELERATION values:

G1 X30 F300
G1 X60 F300
G1 X90 F300
G1 X120 F300
G1 X150 F300
G1 X180 F300
M2

When lookahead is turned on, it seems to work ok for moves after the 2nd one. But the target speed is still unreached and the moves are still spiky.

phord commented 10 years ago

The problem bisects to 33a936cf0779e5fefad676ada57edcd6dfa47bbb.

In origin/master, the same graph above looks good. The target speed is not always reached, but it is much closer.

This seems related to #67, but I cannot be certain because that issue is not reproducible on its own, yet.

Traumflug commented 10 years ago

The problem bisects to 33a936c.

This link doesn't work. A local commit of yours?

All this simulation stuff opens new horizons on the workings of the firmware. Amazing!

Traumflug commented 10 years ago

I see you've put the issue-69 branch on top of issue-68, so you have "precision" rampup_steps calculation already. For all manual verifications, I found rampup_steps to be on the spot. So far I tried only 50 mm/s2.

phord commented 10 years ago

Oh yeah, duh. That's the commit I cherry-picked onto the simulator code for testing.

Try this one. 93b58d816115ece5d8c4cc2784a3de521c4b25ed

Traumflug commented 10 years ago

Hmm. @zuzuf talked about inaccuracies with this simplification (dda.c, about line 860) for low n values in issue #43:

    // Explicit formula: c0 * (sqrt(n + 1) - sqrt(n)),
    // approximation here: c0 * (1 / (2 * sqrt(n))).
    move_c = ((C0 >> 8) * int_inv_sqrt(dda->n)) >> 5;

Higher accelerations have a lower number of acceleration steps for the same speed, so low n values are present. Doesn't explain why it gets better with lookahead-joined moves, though.

phord commented 10 years ago

I wouldn't say it gets better.

Traumflug commented 10 years ago

Uhm, I just pasted some of the existing code to give you a pointer. No surprise it works the same way :-)

I see you run at 50 steps/mm. Then accelerating at 410 mm/s2 to 300 mm/min = 5 mm/s is

s = v^2 / a = 5 * 5 / 410 = 0.061 mm = 3 steps           <==>
int_inv_sqrt(dda->n) = 0x1000 / sqrt(3) = 2364           <==>
C0 * int_inv_sqrt(dda->n) = 20'000'000 / sqrt(50 * 410 / 2000) * 2364 =
= 20'000'000 / 3.2016 * 2364 = 14'767'616'191 = overflow (34 bits).

Accelerating at 10 mm/s2:

s = v^2 / a = 5 * 5 / 10 = 2.5 mm = 125 steps            <==>
int_inv_sqrt(dda->n) = 0x1000 / sqrt(125) = 366          <==>
C0 * int_inv_sqrt(dda->n) = 20'000'000 / sqrt(50 * 10 / 2000) * 366 =
= 20'000'000 / 0.5 * 366 = 14'640'000'000 = again, overflow (34 bits).

Compare this to what I run (1280 steps/mm, 50 mm/s^2):

s = v^2 / a = 5 * 5 / 50 = 0.5 mm = 640 steps            <==>
int_inv_sqrt(dda->n) = 0x1000 / sqrt(640) = 161          <==>
C0 * int_inv_sqrt(dda->n) = 20'000'000 / sqrt(1280 * 50 / 2000) * 161 =
= 20'000'000 / 5.657 * 161 = 569'206'293 = no overflow (30 bits).

I hope I didn't mess up with the pocket calculator.

If the above observation is right, replacing this:

move_c = ((C0 >> 8) * int_inv_sqrt(dda->n)) >> 5;

by this code:

move_c = ((C0 >> 13) * int_inv_sqrt(dda->n));

should give better results. Not yet a fix, because other than the above three examples I didn't investigate precision / overflow, yet.

Traumflug commented 10 years ago

What worries me with the above is, c_move = 500'000'000 >> 5 = 15'625'000 = 0.7 steps/second. Can't be right.

phord commented 10 years ago

For what it's worth, I chose 50steps/mm because the default config for Gen7 v1.3 has 40, but I wanted to simplify the mental math. My real value is 12450 steps/m using half-steps, but I can use 1/32nd-steps to get up to 199200 steps/m.

Your 1280 is 100x the resolution of my printer. That seems insane to me. I imagine it must severely limit your max feedrate. It takes significantly longer to simulate and plot, too. :-)

It does appear to produce more correct values, too; they're not exactly right but they do not exceed the target speed and they are above 90% of the target speed.

Also, the proposed change to move_c calculation didn't help at all. Thanks for trying, though.

phord commented 10 years ago

When I said "I wouldn't say it gets better", I meant it does not appear that lookahead produces improved results, just different ones. I wasn't commenting on the pasted code.

Traumflug commented 10 years ago

Your 1280 is 100x the resolution of my printer. That seems insane to me.

It's a WolfStrap, M8 rods on all axes.

Traumflug commented 10 years ago

One thing coming to mind here is, acceleration is recalculated every 2 milliseconds only, not on every step. If you have acceleration ramps with just 3 or 5 steps, missing one or two of them matters, of course.

drf5n commented 10 years ago

I re-powered my printer and upgraded my printed MXL gears to some nice machined 20 tooth GT2 gears, (with 1/4 step microstepping, they give 20000 steps/m) and I think I might be running into this problem. With ACCELERATION_RAMPING, no lookahead, ACCELERATION 1000., I tried a slow 10mm 600 mm/min (G01 X10 F600 ; 16 seconds) 10mm at 60mm/min (G01 X1 F60; 16 seconds). I also tried a couple 0.1mm moves (2-step) and they took the same amount of time, (to my ear) at a wide range of feedrates.

With lookahead on, the two steps moves do seem sensitive to feedrate, but a 10mm F600 moves took 16s, and a 1mm move at F60 took 8s.

So this potential overflow is at https://github.com/Traumflug/Teacup_Firmware/blob/master/dda.c#L851 with C0 defined at https://github.com/Traumflug/Teacup_Firmware/blob/master/dda_maths.h#L76

What are the limits on feedrates and accelerations for ~20000 steps/m systems?

drf5n commented 10 years ago

Shouldn't the ACCELERATION_RAMPING recalc the speed in the next dda_clock() after the last ramp-up step? As it is, the cruising speed appears to remain the speed of the last speed before the end of the ramp-up per https://github.com/Traumflug/Teacup_Firmware/blob/master/dda.c#L829

If you have a slow acceleration, then there's a good chance that the last speed calculated within 2ms of the top of the ramp is close to the desired speed, but if the acceleration is fast, or the speed is low, the ramp won't have many different speeds.

If I understand this correctly, the recommended default (https://github.com/Traumflug/Teacup_Firmware/blob/master/config.default.h#L155) ACCELERATION 1000 used with the ramp for a F300 mm/min == 5mm/s move could have its speed recalculated maybe floor(5mm/s/(1000mm/s^2)/0.002s) = 2 times up the 5ms ramp, while the 10mm/s^2 ACCELERATION's 500ms ramp would have 250 recalculations. Couldn't the cruising speed in the first case be up to 20% slow?

Traumflug commented 10 years ago

My unscientific measurements give 48'000 steps/second on a 20 MHz Gen7, which means about 38'400 steps/second on a 16 MHz electronics. At 20 steps/mm, this means 2400 mm/s or 1920 mm/s travel speed. That's likely beyond your steppers' max. RPM and also beyond what you can accelerate on a 200x200 mm print bed.

Regarding the problem, I'm pretty sure I found the cause. Acceleration happens only while stepping the acceleration steps. Once these are done, speeds are no longer re-calculated. Then, however, speed will stick to the last calculation during acceleration steps. In dda_clock() we need to detect the "acceleration just done" situation and recalculate speed once more. Or to just set dda->c to dda->c_min during constant speed.

The latter strategy faces yet another problem. The number of acceleration steps is alsways calculated with the STEPS_PER_M of the X axis, so this number is wrong if e.g. the Z axis has a vastly higher number. You'd get a speed bump and likely a stalling stepper.

Oh dear, there are still quite a number of details to solve. And at Linuxwochen in Vienna last week I promised to implement true Bezier curves :-}

Traumflug commented 10 years ago

If I understand this correctly, the recommended default (https://github.com/Traumflug/Teacup_Firmware/blob/master/config.default.h#L155) ACCELERATION 1000 used with the ramp for a F300 mm/min == 5mm/s move could have its speed recalculated maybe floor(5mm/s/(1000mm/s^2)/0.002s) = 2 times up the 5ms ramp, while the 10mm/s^2 ACCELERATION's 500ms ramp would have 250 recalculations. Couldn't the cruising speed in the first case be up to 20% slow?

BTW., yes, this conclusion looks correct.

Traumflug commented 10 years ago

Shouldn't the ACCELERATION_RAMPING recalc the speed in the next dda_clock() after the last ramp-up step?

Yes. And the lack of this is exactly the problem.

I just didn't continue to work in this area, partly because I have very high steps/mm and slow accelerating axes where the effect is barely noticeable. Maybe it's simple to solve, maybe it opens a can of worms elsewhere (e.g. when homing) If you have an idea, give it a try! :-)

drf5n commented 10 years ago

My unscientific measurements give 48'000 steps/second on a 20 MHz Gen7, which means about 38'400 steps/second on a 16 MHz electronics. At 20 steps/mm, this means 2400 mm/s or 1920 mm/s travel speed. That's likely beyond your steppers' max. RPM and also beyond what you can accelerate on a 200x200 mm print bed.

... but there are also potential overflow issues like what you calculated above. I think if you try 20 steps/mm, 16MHZ, 1000mm/s^2, then feeds lower than 2100 mm/min overflow at the end of the ramp: (aren't they even worse at the beginning of the ramp?)

v=35mm/s; s = v^2 / a = 35 * 35 / 1000 = 1.225 mm = 24 steps           <==>
int_inv_sqrt(dda->n) = 0x1000 / sqrt(24) = 836           <==>
C0 * int_inv_sqrt(dda->n) = 16'000'000 / sqrt(20 * 1000 / 2000) * 836 =
= 16'000'000 / 3.1622 * 836 = 4229862598 = no overflow (31.9 bits).

F=2000mm/min:

v=33.33mm/s; s = v^2 / a =  1.11 mm = 22 steps           <==>
int_inv_sqrt(dda->n) = 0x1000 / sqrt(22) = 873          <==>
C0 * int_inv_sqrt(dda->n) = 16'000'000 / sqrt(20 * 1000 / 2000) * 873 =
= 16'000'000 / 3.1622 * 873 = 4417069436 = overflow (32.04 bits).

 # ugly R function for posterity:
 C0bits<-function(spmm,acc,f,fcpu){v=f/60;s=v^2/acc;ns=floor(s/(1/spmm));iis=floor(0x1000/sqrt(ns));counts=fcpu/sqrt(spmm*acc/2000)*iis;list(counts=counts,bits=log(counts,2),s=s,v=v,dist=s,ns=floor(ns),iis=iis,c0=fcpu/sqrt(spmm*10/2000))}

Traumflug commented 10 years ago

Admittedly, there are limits. However, dda->n gives the distance from the movement endpoint, so it's the same during acceleration and deceleration, just mirrored. And steps/mm are given in steps/m, so:

C0 = 16'000'000 / sqrt(20'000 * 1000 / 2000)
   = 16'000'000 / 100
   = 160'000

and

C0 * int_inv_sqrt(dda->n) = 160'000 * 836 = 133760000 (27 bits)

That said, up to today I couldn't find a reason why c/move_c/c_min calculations are always shifted by 8 bits and reverse shifted when calling setTimer(). As far as I can see, there's no use for this, not even higher accuracy. I guess it exists for historical reasons only and can be removed.

drf5n commented 10 years ago

Ah, I had it wrong from your Dec 13 note then. Doesn't it still have problems with the earlier steps up the ramp? The largest int_inv_sqrt would be at step n=1 with 0x1000/(sqrt(1))=4096, so the maximum non-overflowing C0 would be 2^32/4096 ? or maybe 2^24/(2*sqrt(1)) with the >>8 shifting?

Or on the low side, the 24 usable bits of C0 sets a lower bound on the product of steps-per-m and accel, e.g: 280 steps/m @ 10mm/s^2 on a 20MHz machine.

Maybe since the cruising speed is the last calculated step speed up the ramp, then you should set your maximum acceleration to produce ramps of at least 5 _2ms for the minimum expected feedrate to get within 20% of the desired feedrate per v=a_t, reordered to v/t=a. So, for a minimum feedrate of 60mm/min, or 1mm/sec, in 0.010s, the maximum acceleration should be 1/0.010= 100mm/s^2.

Hmph. Still a G01 X10 F60 takes an unexpected 51 seconds on my at90usb1286 with ACCEL=100. STEPS_PERM* = 20000. Would this happen on Gen7?

m111 s6 SENDING:M111 S6 Pos: 0.000,0.000,0.000,0.000,6000 Dst: 0.000,0.000,0.000,0.000,6000 Q6/6E Pos: 0.000,0.000,0.000,0.000,6000 Dst: 0.000,0.000,0.000,0.000,6000 Q6/6E

g01 x10 f60 SENDING:G01 X10 F60 Create: X 0.000 Y 0.000 Z 0.000 F 0 [10000,0,0,0] [ts:200,ds:10000] } Start: X 10.000 Y 0.000 Z 0.000 F 60 Pos: 1.000,0.000,0.000,0.000,60 Dst: 10.000,0.000,0.000,0.000,60 Q7/7 Pos: 2.550,0.000,0.000,0.000,60 Dst: 10.000,0.000,0.000,0.000,60 Q7/7 Pos: 4.150,0.000,0.000,0.000,60 Dst: 10.000,0.000,0.000,0.000,60 Q7/7 Pos: 5.700,0.000,0.000,0.000,60 Dst: 10.000,0.000,0.000,0.000,60 Q7/7 Pos: 7.300,0.000,0.000,0.000,60 Dst: 10.000,0.000,0.000,0.000,60 Q7/7 Pos: 8.900,0.000,0.000,0.000,60 Dst: 10.000,0.000,0.000,0.000,60 Q7/7 Pos: 10.000,0.000,0.000,0.000,60 Dst: 10.000,0.000,0.000,0.000,60 Q7/7E Pos: 10.000,0.000,0.000,0.000,60 Dst: 10.000,0.000,0.000,0.000,60

Traumflug commented 10 years ago

The largest int_inv_sqrt would be at step n=1 with 0x1000/(sqrt(1))=4096, so the maximum non-overflowing C0 would be 2^32/4096 ? or maybe 2^24/(2*sqrt(1)) with the >>8 shifting?

Sounds like a reasonable conclusion. Sounds also like there's a reason to get rid of this bit shifting. The callenge is to find all the places where it's used :-)

Maybe since the cruising speed is the last calculated step speed up the ramp, then you should set your maximum acceleration to produce ramps of at least 5 _2ms for the minimum expected feedrate to get within 20% of the desired feedrate per v=a_t, reordered to v/t=a. So, for a minimum feedrate of 60mm/min, or 1mm/sec, in 0.010s, the maximum acceleration should be 1/0.010= 100mm/s^2.

There should be a better solution. Ramps can be as short as 1 step, not matter which acceleration.

Hmph. Still a G01 X10 F60 takes an unexpected 51 seconds on my at90usb1286 with ACCEL=100. STEPS_PERM* = 20000. Would this happen on Gen7?

It shouldn't matter which CPU or electronics the firmware is running on. In your case, I'd print out all the numbers you assume above to make sure what's happening. Much better than doing conclusions in theory, because in theory you often miss a few bits here and there.

drf5n commented 10 years ago

I modified the config file at https://github.com/Traumflug/Teacup_Firmware/blob/experimental/config.teensypp.h to more closely match my testbed hardware.

With the ACCEL 1000. per line in https://github.com/Traumflug/Teacup_Firmware/blob/experimental/config.teensypp.h#L171 the G01 X10 F60 command took 16 seconds, while with the ACCEL 100. it takes 50 seconds.

I was hoping that you could confirm it whether or not it is something I've screwed up in my relatively unexplored at90usb1286 contributions, or if slow speeds is common to all the CPUs and electronics with smaller steps/m, low feed rates, and low accelerations.

drf5n commented 10 years ago

There should be a better solution. Ramps can be as short as 1 step, not matter which acceleration.

If ramps are as short as one step, then wouldn't, with the current code, the cruising speed be limited to the speed of the first step? C0/F_CPU ?

Traumflug commented 10 years ago

I was hoping that you could confirm it whether or not it is something I've screwed up in my relatively unexplored at90usb1286 contributions, or if slow speeds is common to all the CPUs and electronics with smaller steps/m, low feed rates, and low accelerations.

As these changes touch serial communications only I see no reason why your electronics should behave different from mine. But I hope I can find some time to check here, too.

If ramps are as short as one step, then wouldn't, with the current code, the cruising speed be limited to the speed of the first step? C0/F_CPU ?

Yes. That's not a speed limitation, though, just an acceleration limitation. Faster speeds would use more than one acceleration step.

drf5n commented 10 years ago

Yes. That's not a speed limitation, though, just an acceleration limitation. Faster speeds would use more than one acceleration step.

I used the wrong word: s/be limited to/remain/.

I would have thought the ACCEL 100 would have had time to calculate 5 acceleration steps and be at full speed in 1/10 of a second, with the full G1 x10 F60 move taking ~10.2s rather than the delivered G1 X10 F12, or the ACCEL 1000 result of G1 X10 F37.

Traumflug commented 10 years ago

In this case you have more acceleration steps. Having just one acceleration step is an edge case which should nevertheless work.

Traumflug commented 10 years ago

Another branch opened for tackling this issue: https://github.com/Traumflug/Teacup_Firmware/commits/issue69-2

My pretty firm guess is, we need a different C0 for each axis (array), then the calculated number of acceleration steps should match sufficiently in all cases to allow switching to maximum speed after doing this number of acceleration steps. This should also solve these accelerations with only a few steps: the last acceleration step finally gets applied.

For now we can't, because on axes with a higher step/mm than the X axis (e.g. Z axis), we do way to few acceleration steps, so we had a substantial speed jump when switching to maximum speed. One symptom of this shortcoming is, the Z axis is generally too slow.

Traumflug commented 10 years ago

9 new commits on the issue69-2 branch. C0 replaced with an array, (hopefully) all assumptions regarding STEPS_PER_M_X removed. This should bring back expected speed on axes with more steps/mm than on the X axis.

Traumflug commented 10 years ago

Part of this tested and moved to the experimental branch. Also moved a number of commits from experimental to master, no conflicts on rebasing expected. The new, much higher (now correct) speed of the Z axis when running experimental can be confusing, so check your Z feedrates.

Traumflug commented 10 years ago

Ah, this bitshifting of dda->c poked into my eye again. This time I couldn't resist to get rid of it. Result: 1.2% better performance, 52 bytes smaller code. On the experimental branch, of course.

I hope I can return to this acceleration flaw without becoming distracted again. :-)

Wurstnase commented 8 years ago

When dda->n is > 65535, int_inv_sqrt() will overflow. This will only occurs at very high step rates with relative slow accelerations. AVRs are too slow for this.

Maybe the LPC can reach this? The STM32 with TMC2130 at 1/256 µ-steps will reach this easily.

a = v^2 / (2*s)

with s = _S_ * sm where _S_ = steps and sm = steps/mm

sm = a * 2 * _S_ / v^2

v_max = max_frequency / sm

sm = max_frequency^2 / (a * 2 * _S_)

overflow when _S_ > 65535

=> sm * a = max_frequency^2 / (2 * 65535)

Traumflug commented 8 years ago

Another reason to switch to calculating speed first, then the required step delay from that, isn't it? This way there is no square and no root involved, overflows can't happen.

That said, the original issue here is an entirely different one. Teacup misses one step delay calculation after finishing the acceleration ramp, so actual top speed can be up to 33% smaller than the wanted target speed if there are only 2 acceleration steps.

Wurstnase commented 8 years ago

Teacup misses one step delay calculation after finishing the acceleration ramp, so actual top speed can be up to 33% smaller than the wanted target speed if there are only 2 acceleration steps.

Ok. So this issue is finally solved?!?

Another reason to switch to calculating speed first

Yes. I have currently too much projects. New printer, lots of new code... More soon(tm)

Traumflug commented 8 years ago

So this issue is finally solved?!?

Not to my knowledge.

I have currently too much projects.

Me, too. :-)

Wurstnase commented 8 years ago

So this issue is finally solved?!?

Not to my knowledge.

I not really sure about this issue. The speed will vary in the end and doesn't reach the expected speed?

On my tests the speed also vary. There it comes from the dda_clock. It accelerate for a specific distance. rampup_steps. But dda_clock changes only every 2ms. So when you accelerate and you are beyond rampup_steps the current code won't change the speed anymore.

Traumflug commented 8 years ago

It's here: https://github.com/Traumflug/Teacup_Firmware/blob/master/dda.c#L871

Speed (recalc_speed) is (re-)calculated and applied only during acceleration and deceleration ramps. During the constant speed part it isn't, so speed sticks at the last acceleration step. Barely noticeable if there are 1000 or more such acceleration steps; deviation is less than 0.1%, then. Very noticeable if there are only 3 acceleration steps; deviation up to 33%.

Wurstnase commented 8 years ago

Yes, that's exactly what I've mean. In Marlin they calculate in advance the 'travelspeed' and apply it.

So we have 3 states in this case.

recalc when accelerating
recalc when decelerating
apply top speed

Wurstnase commented 8 years ago

This should fix it: https://github.com/Traumflug/Teacup_Firmware/tree/issue-69a

phord commented 8 years ago

Nice! This fix at issue-69a does appear to fix this issue in the simulator. I haven't tried it on my printer, though. Notice the top of all the speeds in this graph is the same.

screenshot from 2016-11-14 14-01-41

Wurstnase commented 8 years ago

Yes, I run my last dry print also with this code. Also you can hear it. Just test it with running the motor at the same range with the same speed. The sound at top speed is not different anymore.

Wurstnase commented 8 years ago

Maybe the comparison of dda->c > dda->c_min needs to be atomic. I'm not sure.

Traumflug commented 8 years ago

Code needs to be atomic if the variables used can be changed by an interrupt. Related interrupt code is in dda_step() and dda_start().

Wurstnase commented 8 years ago

Ok, when a new dda comes alive, the dda->c_min could change and we will get a jump in speed. So we should also check the ID like the lines before.

Wurstnase commented 8 years ago

Last two commits of https://github.com/Traumflug/Teacup_Firmware/commits/issue-69a could be squashed.

phord commented 7 years ago

Just a note for future-me:

I read this code several times without understanding why it's not backwards.

   if (dda->c > dda->c_min) {
     dda->c = dda->c_min;
   }

This happens only when !recalc_speed, meaning we are cruising, not accelerating or decelerating. So it pegs our dda->c at c_min if it never made it as far as c_min. Maybe the comments can be clearer or the conditions made more obvious.

Wurstnase commented 7 years ago

I made an update of that old issue-69 branch. So we could use this also later for testing. https://github.com/Traumflug/Teacup_Firmware/tree/issue-69c

Traumflug / Teacup_Firmware

Acceleration > 100 causes sporadic speeds #69