Closed phord closed 7 years ago
The problem bisects to 33a936cf0779e5fefad676ada57edcd6dfa47bbb.
In origin/master, the same graph above looks good. The target speed is not always reached, but it is much closer.
This seems related to #67, but I cannot be certain because that issue is not reproducible on its own, yet.
The problem bisects to 33a936c.
This link doesn't work. A local commit of yours?
All this simulation stuff opens new horizons on the workings of the firmware. Amazing!
I see you've put the issue-69 branch on top of issue-68, so you have "precision" rampup_steps calculation already. For all manual verifications, I found rampup_steps to be on the spot. So far I tried only 50 mm/s2.
Oh yeah, duh. That's the commit I cherry-picked onto the simulator code for testing.
Try this one. 93b58d816115ece5d8c4cc2784a3de521c4b25ed
Hmm. @zuzuf talked about inaccuracies with this simplification (dda.c, about line 860) for low n values in issue #43:
// Explicit formula: c0 * (sqrt(n + 1) - sqrt(n)),
// approximation here: c0 * (1 / (2 * sqrt(n))).
move_c = ((C0 >> 8) * int_inv_sqrt(dda->n)) >> 5;
Higher accelerations have a lower number of acceleration steps for the same speed, so low n values are present. Doesn't explain why it gets better with lookahead-joined moves, though.
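To get a feel for how far the simplification is off at low n, here is a minimal sketch (not Teacup code) that compares the two formulas from the comment above in floating point. `sqrt_newton`, `approx_ratio` and `print_approx_table` are made-up names for illustration; the ratio is approximation divided by explicit formula, so 1.0 means no error, and n must be at least 1.

```c
#include <stdio.h>

/* Hypothetical helper: square root by Newton iteration, so no libm is needed. */
double sqrt_newton(double x)
{
    if (x <= 0.0)
        return 0.0;
    double r = (x > 1.0) ? x : 1.0;   /* start at or above the root */
    for (int i = 0; i < 60; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/* Ratio of c0 / (2 * sqrt(n)) to c0 * (sqrt(n + 1) - sqrt(n)); c0 cancels out. */
double approx_ratio(unsigned n)
{
    double explicit_f = sqrt_newton((double)n + 1.0) - sqrt_newton((double)n);
    double approx_f = 1.0 / (2.0 * sqrt_newton((double)n));
    return approx_f / explicit_f;
}

/* Print the ratio for a few positions on the ramp. */
void print_approx_table(void)
{
    const unsigned ns[] = { 1, 2, 3, 10, 100, 1000 };
    for (unsigned i = 0; i < sizeof ns / sizeof ns[0]; i++)
        printf("n = %4u  approx / explicit = %.4f\n", ns[i], approx_ratio(ns[i]));
}
```

Under these assumptions the approximated step delay comes out roughly 21% too long at n = 1 and roughly 8% too long at n = 3, and is within 0.1% by n = 1000. This only looks at the formula itself, not at the integer rounding in int_inv_sqrt().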
I wouldn't say it gets better.
Uhm, I just pasted some of the existing code to give you a pointer. No surprise it works the same way :-)
I see you run at 50 steps/mm. Then accelerating at 410 mm/s2 to 300 mm/min = 5 mm/s is
s = v^2 / a = 5 * 5 / 410 = 0.061 mm = 3 steps <==>
int_inv_sqrt(dda->n) = 0x1000 / sqrt(3) = 2364 <==>
C0 * int_inv_sqrt(dda->n) = 20'000'000 / sqrt(50 * 410 / 2000) * 2364 =
= 20'000'000 / 3.2016 * 2364 = 14'767'616'191 = overflow (34 bits).
Accelerating at 10 mm/s2:
s = v^2 / a = 5 * 5 / 10 = 2.5 mm = 125 steps <==>
int_inv_sqrt(dda->n) = 0x1000 / sqrt(125) = 366 <==>
C0 * int_inv_sqrt(dda->n) = 20'000'000 / sqrt(50 * 10 / 2000) * 366 =
= 20'000'000 / 0.5 * 366 = 14'640'000'000 = again, overflow (34 bits).
Compare this to what I run (1280 steps/mm, 50 mm/s^2):
s = v^2 / a = 5 * 5 / 50 = 0.5 mm = 640 steps <==>
int_inv_sqrt(dda->n) = 0x1000 / sqrt(640) = 161 <==>
C0 * int_inv_sqrt(dda->n) = 20'000'000 / sqrt(1280 * 50 / 2000) * 161 =
= 20'000'000 / 5.657 * 161 = 569'206'293 = no overflow (30 bits).
I hope I didn't mess up with the pocket calculator.
If the above observation is right, replacing this:
move_c = ((C0 >> 8) * int_inv_sqrt(dda->n)) >> 5;
by this code:
move_c = ((C0 >> 13) * int_inv_sqrt(dda->n));
should give better results. Not yet a fix, because other than the above three examples I didn't investigate precision / overflow, yet.
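A quick way to check both variants is to evaluate them with a 64-bit intermediate and see whether that intermediate would still fit into 32 bits. This is only a sketch assuming unsigned 32-bit arithmetic for C0 and the int_inv_sqrt() result; the function names are made up and the C0 value is the rounded figure from the first example above.

```c
#include <stdint.h>

/* Current form: pre-shift C0 by 8, multiply, then shift right by 5.
 * The 64-bit intermediate is handed back through *product. */
uint32_t move_c_shift8(uint32_t c0, uint32_t inv_sqrt, uint64_t *product)
{
    *product = (uint64_t)(c0 >> 8) * inv_sqrt;
    return (uint32_t)(*product >> 5);
}

/* Proposed form: shift C0 by 13 first, then multiply. */
uint32_t move_c_shift13(uint32_t c0, uint32_t inv_sqrt, uint64_t *product)
{
    *product = (uint64_t)(c0 >> 13) * inv_sqrt;
    return (uint32_t)*product;
}

/* Non-zero if the intermediate would also fit an unsigned 32-bit register. */
int fits_u32(uint64_t product)
{
    return product <= UINT32_MAX;
}
```

With C0 = 6'246'877 (about 20'000'000 / 3.2016) and int_inv_sqrt = 2364, the intermediates are 57'683'964 and 1'801'368, so both fit into 32 bits, and the results are 1'802'623 versus 1'801'368. The 34-bit figure above is for the unshifted product C0 * int_inv_sqrt(); with the >> 8 applied first, the intermediate is 8 bits narrower.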
What worries me with the above is, move_c = 500'000'000 >> 5 = 15'625'000 = 0.7 steps/second. Can't be right.
For what it's worth, I chose 50steps/mm because the default config for Gen7 v1.3 has 40, but I wanted to simplify the mental math. My real value is 12450 steps/m using half-steps, but I can use 1/32nd-steps to get up to 199200 steps/m.
Your 1280 is 100x the resolution of my printer. That seems insane to me. I imagine it must severely limit your max feedrate. It takes significantly longer to simulate and plot, too. :-)
It does appear to produce more correct values, too; they're not exactly right but they do not exceed the target speed and they are above 90% of the target speed.
Also, the proposed change to move_c calculation didn't help at all. Thanks for trying, though.
When I said "I wouldn't say it gets better", I meant it does not appear that lookahead produces improved results, just different ones. I wasn't commenting on the pasted code.
Your 1280 is 100x the resolution of my printer. That seems insane to me.
It's a WolfStrap, M8 rods on all axes.
One thing coming to mind here is, acceleration is recalculated every 2 milliseconds only, not on every step. If you have acceleration ramps with just 3 or 5 steps, missing one or two of them matters, of course.
I re-powered my printer and upgraded my printed MXL gears to some nice machined 20-tooth GT2 gears (with 1/4-step microstepping, they give 20000 steps/m), and I think I might be running into this problem. With ACCELERATION_RAMPING, no lookahead, ACCELERATION 1000., I tried a slow 10 mm move at 600 mm/min (G01 X10 F600; 16 seconds) and a 1 mm move at 60 mm/min (G01 X1 F60; 16 seconds). I also tried a couple of 0.1 mm moves (2 steps) and, to my ear, they took the same amount of time at a wide range of feedrates.
With lookahead on, the two steps moves do seem sensitive to feedrate, but a 10mm F600 moves took 16s, and a 1mm move at F60 took 8s.
So this potential overflow is at https://github.com/Traumflug/Teacup_Firmware/blob/master/dda.c#L851 with C0 defined at https://github.com/Traumflug/Teacup_Firmware/blob/master/dda_maths.h#L76
What are the limits on feedrates and accelerations for ~20000 steps/m systems?
Shouldn't the ACCELERATION_RAMPING recalc the speed in the next dda_clock() after the last ramp-up step? As it is, the cruising speed appears to remain the speed of the last speed before the end of the ramp-up per https://github.com/Traumflug/Teacup_Firmware/blob/master/dda.c#L829
If you have a slow acceleration, then there's a good chance that the last speed calculated within 2ms of the top of the ramp is close to the desired speed, but if the acceleration is fast, or the speed is low, the ramp won't have many different speeds.
If I understand this correctly, the recommended default (https://github.com/Traumflug/Teacup_Firmware/blob/master/config.default.h#L155) ACCELERATION 1000 used with the ramp for a F300 mm/min == 5mm/s move could have its speed recalculated maybe floor(5mm/s/(1000mm/s^2)/0.002s) = 2 times up the 5ms ramp, while the 10mm/s^2 ACCELERATION's 500ms ramp would have 250 recalculations. Couldn't the cruising speed in the first case be up to 20% slow?
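The arithmetic in this estimate can be written down as a small integer-only sketch. The function names are invented for illustration, the 2000 µs tick is the 2 ms recalculation interval mentioned earlier in the thread, and the ramp is assumed to be a plain linear one from standstill (t = v / a). The acceleration must be non-zero.

```c
#include <stdint.h>

/* Duration of a linear ramp from standstill to the feedrate, in microseconds.
 * feed_mm_min is the G-code F value, accel_mm_s2 the ACCELERATION setting. */
uint64_t ramp_time_us(uint32_t feed_mm_min, uint32_t accel_mm_s2)
{
    /* t = v / a with v = F / 60, scaled to microseconds. */
    return (uint64_t)feed_mm_min * 1000000u / (60u * (uint64_t)accel_mm_s2);
}

/* Number of whole clock ticks that fit into the ramp. */
uint32_t ramp_recalcs(uint32_t feed_mm_min, uint32_t accel_mm_s2, uint32_t tick_us)
{
    return (uint32_t)(ramp_time_us(feed_mm_min, accel_mm_s2) / tick_us);
}
```

For F300, `ramp_recalcs(300, 1000, 2000)` gives 2 ticks on a 5000 µs ramp, while `ramp_recalcs(300, 10, 2000)` gives 250, matching the numbers above.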
My unscientific measurements give 48'000 steps/second on a 20 MHz Gen7, which means about 38'400 steps/second on a 16 MHz electronics. At 20 steps/mm, this means 2400 mm/s or 1920 mm/s travel speed. That's likely beyond your steppers' max. RPM and also beyond what you can accelerate on a 200x200 mm print bed.
Regarding the problem, I'm pretty sure I found the cause. Acceleration happens only while stepping the acceleration steps. Once these are done, speeds are no longer re-calculated. Then, however, speed will stick to the last calculation during acceleration steps. In dda_clock() we need to detect the "acceleration just done" situation and recalculate speed once more. Or to just set dda->c to dda->c_min during constant speed.
The latter strategy faces yet another problem. The number of acceleration steps is always calculated with the STEPS_PER_M of the X axis, so this number is wrong if e.g. the Z axis has a vastly higher number. You'd get a speed bump and likely a stalling stepper.
Oh dear, there are still quite a number of details to solve. And at Linuxwochen in Vienna last week I promised to implement true Bezier curves :-}
If I understand this correctly, the recommended default (https://github.com/Traumflug/Teacup_Firmware/blob/master/config.default.h#L155) ACCELERATION 1000 used with the ramp for a F300 mm/min == 5mm/s move could have its speed recalculated maybe floor(5mm/s/(1000mm/s^2)/0.002s) = 2 times up the 5ms ramp, while the 10mm/s^2 ACCELERATION's 500ms ramp would have 250 recalculations. Couldn't the cruising speed in the first case be up to 20% slow?
BTW., yes, this conclusion looks correct.
Shouldn't the ACCELERATION_RAMPING recalc the speed in the next dda_clock() after the last ramp-up step?
Yes. And the lack of this is exactly the problem.
I just didn't continue to work in this area, partly because I have very high steps/mm and slow accelerating axes where the effect is barely noticeable. Maybe it's simple to solve, maybe it opens a can of worms elsewhere (e.g. when homing). If you have an idea, give it a try! :-)
My unscientific measurements give 48'000 steps/second on a 20 MHz Gen7, which means about 38'400 steps/second on a 16 MHz electronics. At 20 steps/mm, this means 2400 mm/s or 1920 mm/s travel speed. That's likely beyond your steppers' max. RPM and also beyond what you can accelerate on a 200x200 mm print bed.
... but there are also potential overflow issues like what you calculated above. I think if you try 20 steps/mm, 16MHZ, 1000mm/s^2, then feeds lower than 2100 mm/min overflow at the end of the ramp: (aren't they even worse at the beginning of the ramp?)
v=35mm/s; s = v^2 / a = 35 * 35 / 1000 = 1.225 mm = 24 steps <==>
int_inv_sqrt(dda->n) = 0x1000 / sqrt(24) = 836 <==>
C0 * int_inv_sqrt(dda->n) = 16'000'000 / sqrt(20 * 1000 / 2000) * 836 =
= 16'000'000 / 3.1622 * 836 = 4229862598 = no overflow (31.9 bits).
F=2000mm/min:
v=33.33mm/s; s = v^2 / a = 1.11 mm = 22 steps <==>
int_inv_sqrt(dda->n) = 0x1000 / sqrt(22) = 873 <==>
C0 * int_inv_sqrt(dda->n) = 16'000'000 / sqrt(20 * 1000 / 2000) * 873 =
= 16'000'000 / 3.1622 * 873 = 4417069436 = overflow (32.04 bits).
# ugly R function for posterity:
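C0bits <- function(spmm, acc, f, fcpu) {
  v <- f / 60                          # feedrate, mm/min -> mm/s
  s <- v^2 / acc                       # ramp length in mm, as computed above
  ns <- floor(s / (1 / spmm))          # ramp length in steps
  iis <- floor(0x1000 / sqrt(ns))      # int_inv_sqrt(ns)
  counts <- fcpu / sqrt(spmm * acc / 2000) * iis
  list(counts = counts, bits = log(counts, 2), s = s, v = v, dist = s,
       ns = ns, iis = iis,
       c0 = fcpu / sqrt(spmm * acc / 2000))  # was hard-coded to acc = 10
}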
Admittedly, there are limits. However, dda->n gives the distance from the movement endpoint, so it's the same during acceleration and deceleration, just mirrored. And steps/mm are given in steps/m, so:
C0 = 16'000'000 / sqrt(20'000 * 1000 / 2000)
= 16'000'000 / 100
= 160'000
and
C0 * int_inv_sqrt(dda->n) = 160'000 * 836 = 133760000 (27 bits)
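The same calculation as a small integer-only sketch, for plugging in other configurations. The function names are made up; the formula is the one used in this thread, C0 = F_CPU / sqrt(steps_per_m * accel / 2000), rewritten as F_CPU * 1000 / sqrt(steps_per_m * accel * 500) so the integer square root loses less precision. It is not a copy of the firmware's own C0 macro.

```c
#include <stdint.h>

/* Integer square root (floor), classic bit-by-bit method. */
uint64_t isqrt_u64(uint64_t x)
{
    uint64_t r = 0;
    uint64_t bit = (uint64_t)1 << 62;
    while (bit > x)
        bit >>= 2;
    while (bit != 0) {
        if (x >= r + bit) {
            x -= r + bit;
            r = (r >> 1) + bit;
        } else {
            r >>= 1;
        }
        bit >>= 2;
    }
    return r;
}

/* C0 in timer ticks for the given clock, steps/m and acceleration (mm/s^2).
 * Returns 0 if the value under the root is 0. */
uint64_t c0_ticks(uint64_t f_cpu, uint64_t steps_per_m, uint64_t accel_mm_s2)
{
    uint64_t root = isqrt_u64(steps_per_m * accel_mm_s2 * 500u);
    return root ? f_cpu * 1000u / root : 0;
}

/* Number of bits needed to hold v (0 for v == 0). */
unsigned bit_length(uint64_t v)
{
    unsigned bits = 0;
    while (v) {
        bits++;
        v >>= 1;
    }
    return bits;
}
```

`c0_ticks(16000000, 20000, 1000)` gives 160'000, and `bit_length(160000 * 836)` gives 27, in line with the figures above.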
That said, up to today I couldn't find a reason why c/move_c/c_min calculations are always shifted by 8 bits and reverse shifted when calling setTimer(). As far as I can see, there's no use for this, not even higher accuracy. I guess it exists for historical reasons only and can be removed.
Ah, I had it wrong from your Dec 13 note then. Doesn't it still have problems with the earlier steps up the ramp? The largest int_inv_sqrt would be at step n=1 with 0x1000/(sqrt(1))=4096, so the maximum non-overflowing C0 would be 2^32/4096 ? or maybe 2^24/(2*sqrt(1)) with the >>8 shifting?
Or on the low side, the 24 usable bits of C0 sets a lower bound on the product of steps-per-m and accel, e.g: 280 steps/m @ 10mm/s^2 on a 20MHz machine.
Maybe since the cruising speed is the last calculated step speed up the ramp, then you should set your maximum acceleration to produce ramps of at least 5 * 2 ms for the minimum expected feedrate to get within 20% of the desired feedrate, per v = a * t, reordered to a = v / t. So, for a minimum feedrate of 60 mm/min, or 1 mm/s, reached in 0.010 s, the maximum acceleration should be 1 / 0.010 = 100 mm/s^2.
Hmph. Still, a G01 X10 F60 takes an unexpected 51 seconds on my at90usb1286 with ACCEL=100 and STEPS_PER_M_* = 20000. Would this happen on Gen7?
m111 s6
SENDING:M111 S6
Pos: 0.000,0.000,0.000,0.000,6000 Dst: 0.000,0.000,0.000,0.000,6000 Q6/6E
Pos: 0.000,0.000,0.000,0.000,6000 Dst: 0.000,0.000,0.000,0.000,6000 Q6/6E
g01 x10 f60
SENDING:G01 X10 F60
Create: X 0.000 Y 0.000 Z 0.000 F 0 [10000,0,0,0] [ts:200,ds:10000] }
Start: X 10.000 Y 0.000 Z 0.000 F 60
Pos: 1.000,0.000,0.000,0.000,60 Dst: 10.000,0.000,0.000,0.000,60 Q7/7
Pos: 2.550,0.000,0.000,0.000,60 Dst: 10.000,0.000,0.000,0.000,60 Q7/7
Pos: 4.150,0.000,0.000,0.000,60 Dst: 10.000,0.000,0.000,0.000,60 Q7/7
Pos: 5.700,0.000,0.000,0.000,60 Dst: 10.000,0.000,0.000,0.000,60 Q7/7
Pos: 7.300,0.000,0.000,0.000,60 Dst: 10.000,0.000,0.000,0.000,60 Q7/7
Pos: 8.900,0.000,0.000,0.000,60 Dst: 10.000,0.000,0.000,0.000,60 Q7/7
Pos: 10.000,0.000,0.000,0.000,60 Dst: 10.000,0.000,0.000,0.000,60 Q7/7E
Pos: 10.000,0.000,0.000,0.000,60 Dst: 10.000,0.000,0.000,0.000,60
The largest int_inv_sqrt would be at step n=1 with 0x1000/(sqrt(1))=4096, so the maximum non-overflowing C0 would be 2^32/4096 ? or maybe 2^24/(2*sqrt(1)) with the >>8 shifting?
Sounds like a reasonable conclusion. Sounds also like there's a reason to get rid of this bit shifting. The challenge is to find all the places where it's used :-)
Maybe since the cruising speed is the last calculated step speed up the ramp, then you should set your maximum acceleration to produce ramps of at least 5 * 2 ms for the minimum expected feedrate to get within 20% of the desired feedrate, per v = a * t, reordered to a = v / t. So, for a minimum feedrate of 60 mm/min, or 1 mm/s, reached in 0.010 s, the maximum acceleration should be 1 / 0.010 = 100 mm/s^2.
There should be a better solution. Ramps can be as short as 1 step, no matter which acceleration.
Hmph. Still, a G01 X10 F60 takes an unexpected 51 seconds on my at90usb1286 with ACCEL=100 and STEPS_PER_M_* = 20000. Would this happen on Gen7?
It shouldn't matter which CPU or electronics the firmware is running on. In your case, I'd print out all the numbers you assume above to make sure what's happening. Much better than drawing conclusions in theory, because in theory you often miss a few bits here and there.
I modified the config file at https://github.com/Traumflug/Teacup_Firmware/blob/experimental/config.teensypp.h to more closely match my testbed hardware.
With the ACCEL 1000. per line in https://github.com/Traumflug/Teacup_Firmware/blob/experimental/config.teensypp.h#L171 the G01 X10 F60 command took 16 seconds, while with the ACCEL 100. it takes 50 seconds.
I was hoping that you could confirm whether it is something I've screwed up in my relatively unexplored at90usb1286 contributions, or whether slow speeds are common to all the CPUs and electronics with smaller steps/m, low feed rates, and low accelerations.
There should be a better solution. Ramps can be as short as 1 step, no matter which acceleration.
If ramps are as short as one step, then wouldn't, with the current code, the cruising speed be limited to the speed of the first step? C0/F_CPU ?
I was hoping that you could confirm whether it is something I've screwed up in my relatively unexplored at90usb1286 contributions, or whether slow speeds are common to all the CPUs and electronics with smaller steps/m, low feed rates, and low accelerations.
As these changes touch serial communications only I see no reason why your electronics should behave different from mine. But I hope I can find some time to check here, too.
If ramps are as short as one step, then wouldn't, with the current code, the cruising speed be limited to the speed of the first step? C0/F_CPU ?
Yes. That's not a speed limitation, though, just an acceleration limitation. Faster speeds would use more than one acceleration step.
Yes. That's not a speed limitation, though, just an acceleration limitation. Faster speeds would use more than one acceleration step.
I used the wrong word: s/be limited to/remain/.
I would have thought the ACCEL 100 would have had time to calculate 5 acceleration steps and be at full speed in 1/10 of a second, with the full G1 x10 F60 move taking ~10.2s rather than the delivered G1 X10 F12, or the ACCEL 1000 result of G1 X10 F37.
In this case you have more acceleration steps. Having just one acceleration step is an edge case which should nevertheless work.
Another branch opened for tackling this issue: https://github.com/Traumflug/Teacup_Firmware/commits/issue69-2
My pretty firm guess is, we need a different C0 for each axis (array), then the calculated number of acceleration steps should match sufficiently in all cases to allow switching to maximum speed after doing this number of acceleration steps. This should also solve these accelerations with only a few steps: the last acceleration step finally gets applied.
For now we can't, because on axes with a higher steps/mm than the X axis (e.g. Z axis), we do way too few acceleration steps, so we'd have a substantial speed jump when switching to maximum speed. One symptom of this shortcoming is that the Z axis is generally too slow.
9 new commits on the issue69-2 branch. C0 replaced with an array, (hopefully) all assumptions regarding STEPS_PER_M_X removed. This should bring back expected speed on axes with more steps/mm than on the X axis.
Part of this tested and moved to the experimental branch. Also moved a number of commits from experimental to master, no conflicts on rebasing expected. The new, much higher (now correct) speed of the Z axis when running experimental can be confusing, so check your Z feedrates.
Ah, this bitshifting of dda->c poked into my eye again. This time I couldn't resist getting rid of it. Result: 1.2% better performance, 52 bytes smaller code. On the experimental branch, of course.
I hope I can return to this acceleration flaw without becoming distracted again. :-)
When dda->n is > 65535, int_inv_sqrt() will overflow. This will only occur at very high step rates with relatively slow accelerations. AVRs are too slow for this.
Maybe the LPC can reach this? The STM32 with TMC2130 at 1/256 µ-steps will reach this easily.
a = v^2 / (2*s)
with s = S / sm, where S = number of steps and sm = steps/mm
sm = a * 2 * S / v^2
v_max = max_frequency / sm
sm = max_frequency^2 / (a * 2 * S)
overflow when S > 65535
=> sm * a = max_frequency^2 / (2 * 65535)
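To put a number on that limit, a small sketch of the last two lines. The function names are hypothetical; units are steps/s for the step rate and steps/s^2 for the product of steps/mm and acceleration in mm/s^2, which must be non-zero.

```c
#include <stdint.h>

/* Ramp length in steps needed to reach step rate f_steps_s when the
 * acceleration expressed in steps is a_steps_s2 = (steps/mm) * (mm/s^2):
 * steps = f^2 / (2 * a_steps). */
uint64_t ramp_steps(uint64_t f_steps_s, uint64_t a_steps_s2)
{
    return f_steps_s * f_steps_s / (2u * a_steps_s2);
}

/* Value of (steps/mm) * (mm/s^2), rounded down, at which the ramp is
 * 65535 steps long; anything at or below it needs a longer ramp. */
uint64_t sm_times_a_limit(uint64_t f_steps_s)
{
    return f_steps_s * f_steps_s / (2u * 65535u);
}
```

Taking the 48'000 steps/s measured earlier in this thread purely as an illustration, `sm_times_a_limit(48000)` is 17578: at that product the ramp is 65536 steps long, one unit higher (17579) brings it down to 65532.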
Another reason to switch to calculating speed first, then the required step delay from that, isn't it? This way there is no square and no root involved, overflows can't happen.
That said, the original issue here is an entirely different one. Teacup misses one step delay calculation after finishing the acceleration ramp, so actual top speed can be up to 33% smaller than the wanted target speed if there are only 2 acceleration steps.
Teacup misses one step delay calculation after finishing the acceleration ramp, so actual top speed can be up to 33% smaller than the wanted target speed if there are only 2 acceleration steps.
Ok. So this issue is finally solved?!?
Another reason to switch to calculating speed first
Yes. I currently have too many projects. New printer, lots of new code... More soon(tm)
So this issue is finally solved?!?
Not to my knowledge.
I currently have too many projects.
Me, too. :-)
So this issue is finally solved?!?
Not to my knowledge.
I'm not really sure about this issue. The speed varies in the end and doesn't reach the expected speed?
In my tests the speed also varies. There it comes from dda_clock(). The move accelerates for a specific distance, rampup_steps, but dda_clock() runs only every 2 ms. So when you accelerate and are beyond rampup_steps, the current code won't change the speed anymore.
It's here: https://github.com/Traumflug/Teacup_Firmware/blob/master/dda.c#L871
Speed (recalc_speed) is (re-)calculated and applied only during acceleration and deceleration ramps. During the constant speed part it isn't, so speed sticks at the last acceleration step. Barely noticeable if there are 1000 or more such acceleration steps; deviation is less than 0.1%, then. Very noticeable if there are only 3 acceleration steps; deviation up to 33%.
Yes, that's exactly what I meant. In Marlin they calculate the 'travelspeed' in advance and apply it.
So we have 3 states in this case.
This should fix it: https://github.com/Traumflug/Teacup_Firmware/tree/issue-69a
Nice! This fix at issue-69a does appear to fix this issue in the simulator. I haven't tried it on my printer, though. Notice the top of all the speeds in this graph is the same.
Yes, I also ran my last dry print with this code. You can hear it, too. Just test it by running the motor over the same range at the same speed. The sound at top speed no longer differs.
Maybe the comparison of dda->c > dda->c_min needs to be atomic. I'm not sure.
Code needs to be atomic if the variables used can be changed by an interrupt. Related interrupt code is in dda_step() and dda_start().
Ok, when a new dda comes alive, the dda->c_min could change and we will get a jump in speed. So we should also check the ID like the lines before.
Last two commits of https://github.com/Traumflug/Teacup_Firmware/commits/issue-69a could be squashed.
Just a note for future-me:
I read this code several times without understanding why it's not backwards.
if (dda->c > dda->c_min) {
dda->c = dda->c_min;
}
This happens only when !recalc_speed, meaning we are cruising, not accelerating or decelerating. So it pegs our dda->c at c_min if it never made it as far as c_min. Maybe the comments can be clearer or the conditions made more obvious.
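One way the condition could be made more obvious, as a standalone sketch rather than a patch: `move_sketch_t` and `cruise_clamp` are invented names, and only the two fields discussed here are modelled. Remember that c is a step delay, so a larger value means a slower move.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for the two DDA fields discussed here. */
typedef struct {
    uint32_t c;      /* current step delay; larger means slower */
    uint32_t c_min;  /* step delay at the target speed */
} move_sketch_t;

/* Call once per clock tick. recalc_speed is true while on a ramp. */
void cruise_clamp(move_sketch_t *m, bool recalc_speed)
{
    if (recalc_speed)
        return;              /* ramp code owns m->c while (de)accelerating */

    /* Cruising: if the ramp ended before the delay got down to c_min,
     * the move is still too slow, so snap to the target delay. */
    if (m->c > m->c_min)
        m->c = m->c_min;
}
```

The sketch leaves out the atomicity and DDA-ID checks raised above; it only spells out why the comparison points the way it does.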
I made an update of that old issue-69 branch. So we could use this also later for testing. https://github.com/Traumflug/Teacup_Firmware/tree/issue-69c
When ACCELERATION is 10, we seem able to accelerate up to our target speed without fail. When ACCELERATION is 100 or more, our maximum speed is much lower than the target speed and is spiky.
Here's a graph of our measured velocity for this script at several different ACCELERATION values:
When lookahead is turned on, it seems to work ok for moves after the 2nd one. But the target speed is still unreached and the moves are still spiky.