Closed zuzuf closed 10 years ago
Thanks for reporting this.
Not recalculating move_state.c/.n before each move depends on each movement ending at zero speed. It's obvious, this is no longer true when doing look-ahead. I thought this was already solved earlier.
Can you make a patch, please? Doesn't have to be tidy, it would help me to figure the fix more quickly. You can also send it to mah@jump-ing.de .
That said, the new ACCELERATION_CLOCK makes acceleration calculation easier (closer to standard physics formulas) and also allows almost three times higher step rates. Adapting this for look-ahead shouldn't be difficult. I meant to do this adoption quite a while already, but currently the RepRap wiki has preference.
Please find the patch attached.
It contains a bit more than just the move_state.c/n fix, I also changed the int_sqrt implementation (my binary search implementation is ~8x faster on my Atmega644) and modified the LOOKAHEAD code to compute ramps using only dda steps instead of speeds (it's more precise and requires computing less int_sqrt).
2013/9/8 Traumflug notifications@github.com
Thanks for reporting this.
Not recalculating move_state.c/.n before each move depends on each movement ending at zero speed. It's obvious, this is no longer true when doing look-ahead. I thought this was already solved earlier.
Can you make a patch, please? Doesn't have to be tidy, it would help me to figure the fix more quickly. You can also send it to mah@jump-ing.de .
That said, the new ACCELERATION_CLOCK makes acceleration calculation easier (closer to standard physics formulas) and also allows almost three times higher step rates. Adapting this for look-ahead shouldn't be difficult. I meant to do this adoption quite a while already, but currently the RepRap wiki has preference.
— Reply to this email directly or view it on GitHubhttps://github.com/Traumflug/Teacup_Firmware/issues/43#issuecomment-24021481 .
Roland Brochard (alias Zuzuf)
Am 08.09.2013 16:20, schrieb zuzuf:
Please find the patch attached.
Unfortunately, Github doesn't allow to attach files. Please email me directly: mah@jump-ing.de
It contains a bit more than just the move_state.c/n fix, I also changed the int_sqrt implementation (my binary search implementation is ~8x faster on my Atmega644) and modified the LOOKAHEAD code to compute ramps using only dda steps instead of speeds (it's more precise and requires computing less int_sqrt).
This sound like WOW! :-)
On 09/08/2013 04:39 PM, Traumflug wrote:
Am 08.09.2013 16:20, schrieb zuzuf:
Please find the patch attached.
Unfortunately, Github doesn't allow to attach files. Please email me directly: mah@jump-ing.de
It contains a bit more than just the move_state.c/n fix, I also changed the int_sqrt implementation (my binary search implementation is ~8x faster on my Atmega644) and modified the LOOKAHEAD code to compute ramps using only dda steps instead of speeds (it's more precise and requires computing less int_sqrt).
This sound like WOW! :-)
— Reply to this email directly or view it on GitHub https://github.com/Traumflug/Teacup_Firmware/issues/43#issuecomment-24021906.
The glitch you found seems to happen when a number of rapid moves are computed after eachother. I was still going to look into this but of course when I set out to replicate it, all my prints came out fine...
Anyhow, I have attempted to prevent accumulating errors in the past to prevent having to recalculate that value again but perhaps it can not be avoided.
Thanks for submitting a fix!
I think you are right about rapid moves triggering this bug. I tried to print an elevation map of mars (so lots of rapid small moves) without the fix with the regular ACCELEARION_RAMPING code and got lots of position losses but with the fix it came out fine (and faster :D). So this also happen without LOOKAHEAD.
I have been looking for a way of not recalculating everything for each move (this 32bits division is quite costly so doing it rampups times is likely to timeout often) but since information is lost at each step there is no way to achieve this with an iterative scheme. However I noticed the full equation (with the 2 square roots) evaluates in only 70% more time than the 32bits division (with my int_sqrt code) on both my Atmega644 and an Arduino Leonardo. It may be affordable and it's also more precise since it never accumulates errors. Unfortunately it doesn't produce exactly the same values, there is something like an offset (but I remember there is something about it in the article which describe the method) so it cannot be used to initialize move_state.c with current stepping code.
Maybe there is a way to compute it faster, that would eliminate the need of precomputing move_state.c .
2013/9/8 Cyberwizzard notifications@github.com
On 09/08/2013 04:39 PM, Traumflug wrote:
Am 08.09.2013 16:20, schrieb zuzuf:
Please find the patch attached.
Unfortunately, Github doesn't allow to attach files. Please email me directly: mah@jump-ing.de
It contains a bit more than just the move_state.c/n fix, I also changed the int_sqrt implementation (my binary search implementation is ~8x faster on my Atmega644) and modified the LOOKAHEAD code to compute ramps using only dda steps instead of speeds (it's more precise and requires computing less int_sqrt).
This sound like WOW! :-)
— Reply to this email directly or view it on GitHub < https://github.com/Traumflug/Teacup_Firmware/issues/43#issuecomment-24021906 .
The glitch you found seems to happen when a number of rapid moves are computed after eachother. I was still going to look into this but of course when I set out to replicate it, all my prints came out fine...
Anyhow, I have attempted to prevent accumulating errors in the past to prevent having to recalculate that value again but perhaps it can not be avoided.
Thanks for submitting a fix!
— Reply to this email directly or view it on GitHubhttps://github.com/Traumflug/Teacup_Firmware/issues/43#issuecomment-24023627 .
Roland Brochard (alias Zuzuf)
Unfortunately it doesn't produce exactly the same values, there is something like an offset (but I remember there is something about it in the article which describe the method) so it cannot be used to initialize move_state.c with current stepping code.
Speed calculations don't have to be accurate to the last digit. If you miss the truth by 10% it's unlikely somebody notices it. Actually, there is some error already due to the not so accurate distance calculation in approx_distance().
Indeed missing the truth by a small margin is not a problem but cumulating errors is and I think I found an approximation that can replace the iterative scheme.
What we want here is: Cn = C0 * (sqrt(n+1) - sqrt(n)) but sqrt(n+1)-sqrt(n) behaves like the derivative of sqrt when n grows toward infinity. It may not be accurate enough for small n (1 or 2 that can be dealt with separately) and for n > 2 we only need to compute 1/sqrt(n) which can be done with the same kind of algorithm I used for sqrt(n) and since we don't as much precision it is likely to perform better that the 32bits division. I am going to give it a try.
Is the fix mentioned above committed to any branch yet? I would love to try it out, since I am experiencing the same problem, although to me it seems it occurs when there is a "sharp turn" in the movement, which happens to me many times, as I use the firmware on my CNC machine. If it's any help, I can share the gcode that I experience this issue with.
Yes this fix has been commited to the "roland" branch. I have been testing it, so far so good :).
NB: the fix implements the equation 8 described in http://www.embedded.com/design/mcus-processors-and-socs/4006438/Generate-stepper-motor-speed-profiles-in-real-time whereas initial implementation uses the incremental algorithm (described in the same article) but without the compensation term for the first steps. So you may have to adjust a bit acceleration.
Thank you, I just downloaded the branch, compiled it, and programmed my controller.
However, it doesn't seem to do any acceleration at all. It seems to just jerk right to the feed rate, when trying G1 commands (with F, of course). Apart from the obvious pin configuration (in config.h), these are the only things I configured:
Am I doing something wrong?
These two #defines are correct. Does the same work better with the master branch?
Yes, the master branch works.
I can reproduce it. I think it occurs with the acceleration and speeds I use but it wasn't obvious whereas with acceleration = 10 there is clearly something wrong.
I found what was wrong: an overflow in int_inv_sqrt which wasn't happening on x86 (because of an implicit cast) when I tested it.
I commited the fix.
Thanks, @zuzuf.
Does this sound like you're running Teacup on your PC? Code to do this would be welcome, we already had a "sim" Makefile target a year or two back (but it got forgotten).
I still have the problem look-ahead doesn't work in my configuration, but this was the same with earlier versions. When moving around X-Y corners there is no slowdown at all, instead it tries to move around at full speed. This results in step losses (no surprise).
I think these are the relevant numbers in config.h, it's a printer with threaded rods on all three axes:
#define STEPS_PER_M_X 1280000
#define STEPS_PER_M_Y 1280000
#define LOOKAHEAD_MAX_JERK_XY 1
I also tried with higher jerks, no change.
I have only tested the int_inv_sqrt function on PC not the whole code.
Regarding look-ahead what acceleration and speed values do you use ?
#define ACCELERATION 50.
#define MAXIMUM_FEEDRATE_X 600
#define MAXIMUM_FEEDRATE_Y 400
P.S.: the E axis is defined, but never used.
I tried the latest roland branch loaded to my board, with ACCLERATION 50, and LOOKAHEAD. It slows down significantly at curves (many short straits), and I can see ramping up and down repeatedly in those curves, as if LOOKAHEAD did not have any effect. Needless to say, I am not changing the speed in my gcode, just set it at the beginning (F400).
Is this different from the current Gen7 or master branch?
It's been a while I downloaded the Gen7 branch. Anyway, I had that slowing down extremely problem with Gen7.
Then, I tried the master branch, and so far it worked the best, although I can see some slow down in turns.
I also tried the roland branch, and it didn't seem to do any acceleration at first, but then after zuzuf fixed it, it works better, but with the issue I mentioned in my previous post.
Looking at the code it looks like the maximum speed of a move is only what you can reach as top speed within a single move. If your move is shorter than its acceleration and deceleration ramp, it can't reach target speed, even if not actually accelerating due to look-ahead.
Getting this working is a bit tricky, as you'd have to accelerate or decelerate through more than one move. This means adjusting corner speed more than once, which isn't implemented, yet. Having longer curve segments should help.
I just moved the first of the commits from the roland branch to the Gen7 branch after adjusting whitespace, adjusting for C89 and carefully checking it builds with Arduino. Thanks for the implementation, Roland.
Just pushed a new roland branch, the previously three commits made into one and reviewed for C89.
Performance tests are not enthusiastic, though. Without look-ahead I can reach 710 mm/min on the Gen7 branch, 560 mm/min on the roland branch. Both times limited by computing performance before steps become jaggy at the start of deceleration (that's the point where the most computing power is required). That's a 25% performance loss.
If you want to try yourself, I use these in config.h:
#define STEPS_PER_M_X 1280000
#define ACCELERATION 50.
and then G1 X50 F... and G1 X0 F... with raising F's to test speeds. Exhaustion of the CPU is audible, starting within +- 10 mm/min.
My guess: the reason is using int_inv_sqrt() instead of a division.
That's interesting. The int_inv_sqrt function uses 12bit fixed point precision, reducing precision to 8bit would greatly improve performance but it may not be accurate enough. Also there is a 16bit division in this function but the whole thing was faster than a 32bit division on the platforms I tested, I am wondering if different compiler versions could lead to such differences in performance. I am using avr-gcc 4.7.2.
Regarding look-ahead I didn't change the behavior, it still makes sure it can stop within the next move if things go wrong (not enough time to preprocess the moves or communication error) and also limits the crossing speed in order to obey the jerk limit. The later can result in non zero crossing speed being lower than target speeds, in this case peak speeds are higher than crossing speeds.
I am using avr-gcc 4.7.2.
Same here. How did you measure performance? Do you also use the flags given in Makefile-AVR?
One solution might be to use the old calculation in dda_step() but the new one in dda_start() (which should be in dda_create()). the old move_state.n is always the new (move_state.n * 4) - 3. But before doing this we first have to find out why we measure different performances.
The new int_sqrt() is 8 times faster than the old one. I added test results to the commit comment. Exciting!
Last weekend I picked up the new inv_int_sqrt() based acceleration maths (thanks, @zuzuf) and joined it with the clock based acceleration calculations in the accel_clock branch. The result is beautifully working acceleration which hits decleration and movement end point always on the spot. Code in yet another branch, accel_clock_step. The essentials are in commit 157f870855ac3dfe32a3ad42f095294a10450b66.
Seeing this code working so well I think it should become the default one. Just a few minor things missing:
Neither of these two are a roadblock. Enjoy!
I just milled the first PCB with @zuzufs' acceleration math applied to a clock based calculation. Stuff in the accel_clock_step branch is fine now, including endstop handling. Even ramp length predictions are now reasonable accurate, even matching physics formulas! It works so well I expect this to become the standard strategy before too long. Thanks, @zuzuf, your maths is a great help.
The only drawback is, it doesn't fit look-ahead code yet.
Next steps will be to prepare for non-zero movement start speeds and having this, it can't be far to join look-ahead again.
I just tried to get @Cyberwizzard s version of look-ahead working for me and was assuming some integer overflows due to my unusual high step/mm settings, but a few hours into debugging I have to assume this code doesn't work at all. Jerk calculation can only ever return double the full speed or zero, so it's effectively either a removal of all acceleration ramps or a slowdown to a crawl.
Can never ever work as expected. Also, @zuzuf s code apparently doesn't change this. I guess a few people are kidding on me here.
You are reading the code wrong.
The 1D jerk is based on speed alone: the actual delta (for example: +5 mm or -5mm) for both moves is used to orientate the speed vector: a positive delta will have a positive speed, a negative one will have a negative speed.
After subtracting one from the other we obtain the difference in speed between two moves.
If both are in the same direction (regardless of the actual delta x1 and x2), the jerk will be zero. This means both moves can be done at full speed. In any other scenario, the difference between both speed vectors will depict the amount of force on the mechanics if the two would be joined.
Note though that one of the velocities will be the projected maximum speed of the previous move (which may or may not be the F specified in the G-code) while the other is the desired speed of the next move. If they are the same they get joined, if not, the exit speed of the previous move and entry speed of the next move will be scaled down until the jerk is acceptable.
Thanks for chiming in, @Cyberwizzard.
After subtracting one from the other we obtain the difference in speed between two moves.
This would make sense when looking at the speed of each individual axis, but instead the speed of all axes combined is taken into account. The latter is usually constant, so all jerk is ignored.
If both are in the same direction (regardless of the actual delta x1 and x2), the jerk will be zero.
I can't follow this either. You can move a 89 deg angle without changing axis directions and doing this, jerk is certainly not zero.
There is dda_jerk_size_2d_real() in dda_lookahead.c, which obviously tries to take individual axis speeds into account, but this isn't called anywhere.
Anyways, I've made G-code from very rounded shapes and using this with @zuzuf s rework does now something usable. This gives me the equipment to check wether re-enabling look-ahead in the accel_clock_step branch breaks something.
P.S.: I'm glad the code is there, stepping through something existing is much easier than creating the same from scratch.
This would make sense when looking at the speed of each individual axis, but instead the speed of all axes combined is taken into account. The latter is usually constant, so all jerk is ignored.
If that is indeed the case, it is unintended. However, I think you are still assuming here that both moves are at full speed: something which is not a given as it depends on the length of the move and most moves while extruding will be very short: with most models I tried to air-print I found that a lot of moves are done at half the desired speed. This has to do with the fact that complex arcs will be split up in tons of small segments. As we do a single pass over the queue, we make it better but without doing multiple passes like Marlin does, we will not reach the desired speed. As such, the jerk calculations will not be zero all the time.
Also, they can not be 2*F all the time either, as you first stated because that would mean the current axis reverted direction: a situation which quite correctly predicted a large jerk on the axis (resulting in a full stop between moves).
I can't follow this either. You can move a 89 deg angle without changing axis directions and doing this, jerk is certainly not zero.
This is 1 dimensional: the X axis moves either forward or in reverse, it can't take a 89 degree angle unless something is very wrong :-)
If the XY vector describes a 89 degree turn, one of the axis will have a very small jerk and the other a large jerk: this is described in: MAX(dda_jerk_size_1d(x1,F1,x2,F2),dda_jerk_size_1d(y1,F1,y2,F2));
The real 2D jerk code you are referring to is part of my original code where I tried to combine both X and Y in one go. This requires a square root (which is expensive) and is incorrect; even for machines that have a XY table (I think) as each axis is powered by a different motor.
As each axis can be moved individually, there is no relation between them when determining the jerk for each axis. The only thing that needs to be done is that the jerk is restricted for both X and Y simultaneously, which is done in "dda_jerk_size_2d()" by taking the maximum of both axis.
The only thing I can think of is that the velocity vector is currently not scaled down in proportion for each axis. I can think of a corner case where a 90 degree turn is done from X into Y at a constant high speed where the jerk for X would be calculated as 0; which is obviously not true as the X axis has to do a full stop...
From the top of my head (without much caffeine at this time) I can come up with a floating point solution as I'm not sure how large these numbers will be; which needs to be converted to fixed point:
int dda_jerk_size_2d(int32_t x1, int32_t y1, uint32_t F1, int32_t x2, int32_t y2, uint32_t F2) { float dx = (float)(x1 - x2); float dy = (float)(y1 - y2); float dsum = absf(dx) + absf(dy); // Scale both to the point where dx + dy would be 1.0 dx = dx / dsum; dy = dy / dsum; // Break down F1 and F2 in a x and y component float Fx1 = absf((float)F1 * dx), Fx2 = absf((float)F2 * dx); float Fy1 = absf((float)F1 * dy), Fy2 = absf((float)F2 * dy); // Scale to fixed point range here return MAX(dda_jerk_size_1d(x1,Fx1,x2,Fx2),dda_jerk_size_1d(y1,Fy1,y2,Fy2)); }
Does this make sense?
It sort of makes sense how this is an issue for your new and fast approach: as most machines, including mine, never reach the desired speed for a move, doing 90 degree turns results in a violation of the jerk setting (which explains why it is set quite low; any higher results in violent moves). However, any 2D move up to something like 60-ish degrees is probably joined incorrectly, but still acceptable.
Edit: dsum should use the absolute value of dx and dy in case one of them is negative
This would make sense when looking at the speed of each individual axis, but instead the speed of all axes combined is taken into account. The latter is usually constant, so all jerk is ignored.
If that is indeed the case, it is unintended.
I think this is the biggest part of the problem. Independently I outlined a strategy here: https://github.com/Traumflug/Teacup_Firmware/issues/45 which, at the bottom line, happens to almost match the intention of your / the current code. If we look at each axis individually, we can replace JERK_XY and JERK_E with JERK_X, JERK_Y, JERK_Z and JERK_E. Nice how the pieces of the puzzle start to align!
The reason to call it JERK_XY is because I simplified my code assuming you only want to specify one value for both axis (additionally as I managed both axis at the same time at first, which is wrong). Technically this might not be the case as a heavy table might have more trouble in the speed discrepancies than a print head (or vice versa depending on your machine).
Aside from the glitch in the fact that I did not generate an Fx and Fy (which is actually wrong, the more I think about it, see also the corner-case - phun intended), one severe limitation is that the number of steps for the ramp depends on the number of steps per mm. This means that X and Y need the same number of steps per mm (or m) or lookahead will not work. This also prevents me from joining Z moves.
However, as for most people the Z axis is geared down significantly, you most likely do not want lookahead on Z moves at all. But the scenario of having different steps per mm for X and Y is a lot more likely.
Does your new acceleration algorithm use distances instead of steps? In that case this should no longer be an issue.
Does your new acceleration algorithm use distances instead of steps?
It works with distances (for calculating Fx, Fy, ...), so steps/mm don't matter. It's the same as your algorithm if you reduce it to just two axes and replace the loop with a MAX().
After re-gaining look-ahead compatibility on the accel_clock_step branch I was brave enough to put all the new acceleration strategy right onto the experimental branch. This almost doesn't touch look-ahead code, steppers can run now simply double or triple as fast.
Sorry for the delay.
Same here. How did you measure performance? Do you also use the flags given in Makefile-AVR?
I wrote and built a small test program with the Arduino IDE so I don't really know what flags were used. I will try with the same flags as TeaCup.
Performance tests are not enthusiastic, though. Without look-ahead I can reach 710 mm/min on the Gen7 branch, 560 mm/min on the roland branch.
I noticed that if I compensate the ratio 710/560 (~the ratio of time between steps) with the 0.676 magic factor (cf http://www.embedded.com/design/mcus-processors-and-socs/4006438/Generate-stepper-motor-speed-profiles-in-real-time), the result is surprisingly close to what I get. How did you measure these speeds ?
Hi @zuzuf,
I measure performance simply by running a stepper motor. Exhaustion of processing power usually results in hearable dropouts, often even in motor stops. The line is pretty sharp, as exhaustions usually result in interrupts piling up, which in turn lead to a total breakdown of processing power.
Anyways, this int_inv_sqrt() thing led to an entire different strategy: acceleration maths has moved away from the step interrupt into an interrupt executed every 2 milliseconds. This move was only possible with your code and the result is an about 3 times higher performance due to the much shorter step interrupt. Forget 700 mm/min or 560 mm/min, think 2000 mm/min in the same setup (axis with 1280 steps/mm, a typical Z axis). That's something like 42.000 steps per second.
Code is already on the experimental branch and works beautifully, with or without look-ahead. Your stuff on the roland branch is updated, too (there are more nuggets to lift). The main commit introducing this new strategy is this one: f95e8237491238e602c39845317740b3a4c678ab (without look-ahead at that time).
Roland branch now fully on the experimental branch.
Appears to be solved. If not, please re-open or open a new issue.
I have been experimenting with the LOOKAHEAD feature and found a stability issue in the RAMPING algorithm. It goes quickly wrong with LOOKAHEAD enabled in my tests : after a few non trivial crossing speeds have been computed it produces some extremely slow moves.
I traced back the problem to the move_state.c variable which seems to accumulate errors. Reinitializing the move_state.c and move_state.n variables (or recomputing them from scratch when LOOKAHEAD produces non 0 crossing speeds) before each move solves the problem.
I have been using TeaCup for a while and I never noticed similar issues with regular RAMPING code (maybe the symmetry of the ramps makes errors null in average).