MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0

Slow acceleration since #4997 from time to time? #5018

Closed Sebastianv650 closed 7 years ago

Sebastianv650 commented 7 years ago

Somebody else has the feeling that since #4997 some moves are executed with a very low acceleration rate?

I did my first small real print with this version today, and I had 3-4 moves that sounded like they had a very low acceleration. One was the home-X movement after the print. I did some tests with very low accelerations some months ago (values of 100 and so on) to see how it affects print quality, and the sounds today were like during those tests..

Edit: The home move is reproducible. Moving it over Pronterface sounds OK, clicking home X sounds different. The Y axis is not affected - strange!!

thinkyhead commented 7 years ago

@Sebastianv650 Yes, I have noticed that. 😞 For example, on short, slower-speed Z movements, the acceleration was very slow, but on normal-speed Z movements the acceleration was normal.

Looking at the changes in that PR I can't see any that would lead to an alteration in acceleration. If you revert parts of that PR and test maybe you can isolate what the cause is. I don't have a machine to play with here unfortunately. Maybe I can just test with a board, though.

Sebastianv650 commented 7 years ago

The slow homing X is the result of an overflow:

```cpp
const uint32_t comp = max_acceleration_steps_per_s2[AXIS] * block->step_event_count;
```

The max. acceleration of my X-axis is 3000 mm/s² * 100.5 steps/mm = 301500 steps/s².
The step event count is set to 44924 steps by the home function, which equals 100.5 steps/mm * 298mm (max. pos of X axis) * 1.5.
To not overflow comp, my X axis is not allowed to be longer than 4,294,967,295 (max uint32_t) / 301500 / 100.5 / 1.5 = 94mm...
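
For illustration, here is a standalone sketch (not Marlin code; the constants just mirror the numbers above) showing the same multiplication on a 32-bit and a 64-bit type:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // Numbers quoted above: 3000 mm/s^2 * 100.5 steps/mm = 301500 steps/s^2,
  // and 298 mm * 100.5 steps/mm * 1.5 = ~44924 step events for the home move.
  const uint32_t max_accel_steps_per_s2 = 301500UL;
  const uint32_t step_event_count       = 44924UL;

  // The true product needs ~34 bits, so the 32-bit multiplication wraps.
  const uint32_t comp32 = max_accel_steps_per_s2 * step_event_count;
  const uint64_t comp64 = (uint64_t)max_accel_steps_per_s2 * step_event_count;

  printf("wrapped 32-bit comp: %lu\n", (unsigned long)comp32);
  printf("true    64-bit comp: %llu\n", (unsigned long long)comp64); // 13544586000

  // Dividing the wrapped comp by step_event_count gives ~14684 steps/s^2,
  // i.e. roughly 146 mm/s^2 at 100.5 steps/mm - matching the symptom described below.
  printf("resulting accel limit: %lu steps/s^2\n", (unsigned long)(comp32 / step_event_count));
  return 0;
}
```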

Due to

```cpp
if (accel * block->steps[AXIS] > comp) accel = comp / block->steps[AXIS];
```

the low, overflowed comp value is now the basis for accel, which is 146 instead of 1200 mm/s² in my case.

If I move Z for a small amount (1mm), Z axis acceleration is used. If I move it 10mm, travel acceleration is used. This is also due to an overflow of comp during LIMIT_ACCEL(X_AXIS) and LIMIT_ACCEL(Y_AXIS).

I would revert this PR, as it leads to unpredictable acceleration rates that can be very small or even much higher than allowed, until a solution is found.

Edit: the initialization in `uint32_t accel; //= block->acceleration_steps_per_s2;` is unnecessary, as we will assign a new value later anyway.

Sebastianv650 commented 7 years ago

@thinkyhead feel free to call it a stupid idea, but why do we try to reinvent the wheel with the planner? As far as I know, Marlin started from grbl, and since those days most of the energy has been spent adding and improving the 3D-printer-related code like multiple extruders, temperature control and much more. grbl doesn't need all those things, so they had a lot of time to improve their core functions like the planner. Wouldn't it be much easier to take their functions again and port them to Marlin?

Maybe not everything is compatible in an easy way, but things like acceleration and jerk should be.

If there is no important drawback which I don't see at the moment, I could do some tests..

Roxy-3D commented 7 years ago

> `const uint32_t comp = max_acceleration_steps_per_s2[AXIS] * block->step_event_count;`
>
> The max. acceleration of my X-axis is 3000 mm/s² * 100.5 steps/mm = 301500 steps/s². The step event count is set to 44924 steps by the home function, which equals 100.5 steps/mm * 298mm (max. pos of X axis) * 1.5. To not overflow comp, my X axis is not allowed to be longer than 4,294,967,295 (max uint32_t) / 301500 / 100.5 / 1.5 = 94mm...

@Sebastianv650 Can we re-order the computations so we avoid the overflow? I took a quick look, and an easy way to do that without a rewrite isn't obvious to me. But here is a place where re-ordering the calculations might help solve the problem:

```cpp
unsigned long acc_st = block->acceleration_steps_per_s2,
x_acc_st = max_acceleration_steps_per_s2[X_AXIS],
y_acc_st = max_acceleration_steps_per_s2[Y_AXIS],
z_acc_st = max_acceleration_steps_per_s2[Z_AXIS],
e_acc_st = max_acceleration_steps_per_s2[E_AXIS],
allsteps = block->step_event_count;
if (x_acc_st < (acc_st * bsx) / allsteps) acc_st = (x_acc_st * allsteps) / bsx;
if (y_acc_st < (acc_st * bsy) / allsteps) acc_st = (y_acc_st * allsteps) / bsy;
if (z_acc_st < (acc_st * bsz) / allsteps) acc_st = (z_acc_st * allsteps) / bsz;
```

It might be possible to turn it into this construct: (We would have to carefully consider the loss of precision because of the integer math!!!)

```cpp
if (x_acc_st < (acc_st / allsteps) * bsx)  acc_st = (x_acc_st / bsx) * allsteps;
if (y_acc_st < (acc_st / allsteps) * bsy)  acc_st = (y_acc_st / bsy) * allsteps;
if (z_acc_st < (acc_st / allsteps) * bsz)  acc_st = (z_acc_st / bsz) * allsteps;
```

However, there are relatively few places where max_acceleration_steps_per_s2 is set. Can we cap it such that any calculation using it doesn't exceed 4,294,967,295?

> Wouldn't it be much easier to take their functions again and port it to Marlin? Maybe not everything is compatible in an easy way, but things like acceleration and jerk should be. If there is no important drawback which I don't see at the moment, I could do some tests..

Please do.

Sebastianv650 commented 7 years ago

`acc_st / allsteps` may easily end up as zero on some printers. In the original PR @thinkyhead linked to (https://github.com/Ultimaker/Marlin/pull/7/files) they use float for the complete calculation, I guess to avoid exactly what we have now.
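
To make the truncation concrete, a standalone sketch with assumed example values (not Marlin code; the names mirror the excerpt above):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // Assumed example values (a single-axis move at 200 steps/mm):
  const uint32_t acc_st   = 50000UL;  // block->acceleration_steps_per_s2 (250 mm/s^2)
  const uint32_t x_acc_st = 10000UL;  // per-axis limit (50 mm/s^2)
  const uint32_t bsx      = 60000UL;  // block->steps[X_AXIS] (300 mm)
  const uint32_t allsteps = 60000UL;  // block->step_event_count

  // Dividing first truncates to zero whenever allsteps > acc_st:
  printf("acc_st / allsteps = %lu\n", (unsigned long)(acc_st / allsteps));   // 0

  // Original order: 10000 < (50000 * 60000) / 60000 -> the limit correctly fires.
  printf("original test:  %d\n", x_acc_st < (acc_st * bsx) / allsteps);
  // Reordered:      10000 < (50000 / 60000) * 60000 = 0 -> the limit is silently skipped.
  printf("reordered test: %d\n", x_acc_st < (acc_st / allsteps) * bsx);
  return 0;
}
```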

I think the fastest version that is safe would be the one now in RCBugFix, but with (float) casts in all the lines and comp made a float:

```cpp
#define LIMIT_ACCEL(AXIS) do{ \
  const float comp = (float)max_acceleration_steps_per_s2[AXIS] * (float)block->step_event_count; \
  if ((float)accel * (float)block->steps[AXIS] > comp) accel = comp / block->steps[AXIS]; \
}while(0)
```

Blue-Marlin commented 7 years ago

@Sebastianv650 We learned a lot from grbl, but a new reimport of the grbl code into Marlin is difficult because grbl is too good at the things it does. One of grbl's targets is to make use of every trick available on an Arduino UNO, while we have to support a much wider variety of hardware.

We need at least 4 independent stepper channels; grbl supports 3. grbl needs all step and direction pins to be on the same port, and there is no 'printer' board I know of that could be configured like that. For the endstops, grbl uses pins capable of requesting interrupts on state change; RAMPS is the only board I know of where the designer made this possible. For ending the step pulses grbl makes use of another timer. No problem for grbl, but we additionally have to support servos, stepper current, heaters, all needing timers. On the 'small' processors we try to support, no timer is left.

Porting grbl again would cause so many changes that the introduction of lots of errors is likely. Looking into grbl's code and picking applicable improvements into Marlin is probably the better way, and that's what we try to do. We also occasionally have a look into other open printer firmwares and try to pick what we like. Sadly, many good concepts exclude each other.

Sebastianv650 commented 7 years ago

@Blue-Marlin thanks for the explanation, so I can save myself a lot of time.

thinkyhead commented 7 years ago

> The slow homing X is the result of an overflow.

@Sebastianv650 Of course I wouldn't be surprised if there's an overflow, but I'm also not sure, because that portion of the code hasn't changed a huge amount from what was previously there. Here's what was there before:

```cpp
if (max_acceleration_steps_per_s2[AXIS] < (block->acceleration_steps_per_s2 * block->steps[AXIS]) / block->step_event_count)
  block->acceleration_steps_per_s2 = (max_acceleration_steps_per_s2[AXIS] * block->step_event_count) / block->steps[AXIS];
```

This amounts to the following, which uses accel in place of block->acceleration_steps_per_s2:

```cpp
const uint32_t comp = max_acceleration_steps_per_s2[AXIS] * block->step_event_count;
if (max_acceleration_steps_per_s2[AXIS] < (accel * block->steps[AXIS]) / block->step_event_count)
  accel = comp / block->steps[AXIS];
```

The next idea I had was to move the division to the other side as a multiplication:

```cpp
const uint32_t comp = max_acceleration_steps_per_s2[AXIS] * block->step_event_count;
if (max_acceleration_steps_per_s2[AXIS] * block->step_event_count < (accel * block->steps[AXIS]))
  accel = comp / block->steps[AXIS];
```

…but see, it comes out to the same as comp! So it can simply be reduced to:

```cpp
const uint32_t comp = max_acceleration_steps_per_s2[AXIS] * block->step_event_count;
if (comp < accel * block->steps[AXIS]) accel = comp / block->steps[AXIS];
```

thinkyhead commented 7 years ago

So, checking your numbers, overflow is definitely possible, but it shouldn't be too common. With a max_acceleration_steps_per_s2[X_AXIS] of 301500, the 32-bit integer overflows after going over 14245 step events. An axis with 100.5 steps per mm only needs to make a single 141mm move to cause an overflow, and an axis with more steps per unit can throw it off even more. Of course most E axes have high values for acceleration.
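
Written out as a standalone sketch (illustration only, with the numbers from this thread; `overflow_threshold` is just a hypothetical helper):

```cpp
#include <cstdint>
#include <cstdio>

// Largest step_event_count for which
// comp = max_accel_steps_per_s2 * step_event_count still fits in 32 bits.
static uint32_t overflow_threshold(uint32_t max_accel_steps_per_s2) {
  return UINT32_MAX / max_accel_steps_per_s2;
}

int main() {
  const float steps_per_mm = 100.5f;
  const uint32_t max_accel = 301500UL;   // 3000 mm/s^2 * 100.5 steps/mm
  const uint32_t max_steps = overflow_threshold(max_accel);
  printf("comp overflows beyond %lu step events (~%.1f mm on this axis)\n",
         (unsigned long)max_steps, max_steps / steps_per_mm);   // 14245 events, ~141.7 mm
  return 0;
}
```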

Try reverting the code so it doesn't use the updated technique and see if it fixes the slow acceleration issue:

```cpp
#define LIMIT_ACCEL(AXIS) do{ \
  if (max_acceleration_steps_per_s2[AXIS] < (accel * block->steps[AXIS]) / block->step_event_count) \
    accel = (max_acceleration_steps_per_s2[AXIS] * block->step_event_count) / block->steps[AXIS]; \
}while(0)
```

Sebastianv650 commented 7 years ago

Of course I already did the revert to the old code, which removed the slow acceleration problem. But you are also right with your sentence:

"Of course I wouldn't be surprised if there's an overflow, but I'm also not sure, because that portion of the code hasn't changed a huge amount from what was previously there."

Yes, also this old code suffers from overflows - it seems nobody ever did a detailed check with some real values :-D

But in the old style, in the if-clause we are calculating `(block->acceleration_steps_per_s2 * block->steps[X_AXIS])` instead of `max_acceleration_steps_per_s2[AXIS] * block->step_event_count`. Due to the fact that acceleration_steps_per_s2 is usually much lower than max_acceleration_steps_per_s2, I have no problem with this line of the calculation. But in the next one, where we handle the case that the "if" is true, the problem occurs again: `block->acceleration_steps_per_s2 = (max_acceleration_steps_per_s2[AXIS] * block->step_event_count) / block->steps[AXIS]` has a high chance that the multiplication overflows due to unsigned long.

So the difference between the new and the old writing style is: the new one calculates the comp value, which will "very" likely overflow, and because we use this to decide whether we have to limit accel, the chance is high that the error shows up in the print. In the old one, the error only becomes visible if we have to limit accel AND the calculation overflows, which has a lower chance of occurring.

So in my opinion, if we want to stick to the actual way of limiting the accel. (and I don't know a better way at the moment), the only safe way is to use float.

thinkyhead commented 7 years ago

> the only safe way is to use float.

@Sebastianv650 The problem with float is that once the value goes over what the 24-bit mantissa can hold exactly (about 16.7 million), there's some loss of precision. So if we can do most of the maths in integer form, it will also be more accurate. That said, some loss of precision, as long as it's not too much, might not be terrible. Q: Is it worth the extra overhead of float to obtain the extra value range at the cost of some precision?
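
For reference, the precision limit being discussed, as a tiny standalone example (single-precision float carries a 24-bit mantissa, and on 8-bit AVR `double` is the same 32-bit type):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // Exact integers are only guaranteed in a float up to 2^24 = 16777216.
  const uint32_t exact = 16777217UL;     // 2^24 + 1
  const float    f     = (float)exact;   // rounds to 16777216.0f
  printf("%lu -> %.1f\n", (unsigned long)exact, (double)f);
  return 0;
}
```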

There is one other way, which is to add a custom 40-bit integer data type plus some assembly functions to do maths with it. (Rather than jumping up to a full 64 bits.) Binary multiplication in assembly is pretty easy, just shifting and adding. Bit-wise comparison for a 5-byte integer is also very easy.
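
A rough sketch of what such a 5-byte type could look like, in plain C++ rather than the proposed assembly (illustration only; the 64-bit intermediate below just stands in for the shift-and-add multiply):

```cpp
#include <cstdint>

// 40-bit unsigned integer: a high byte plus a low 32-bit word.
struct uint40 {
  uint8_t  hi;   // bits 39..32
  uint32_t lo;   // bits 31..0
};

// Comparison is just "compare high byte, then low word".
static inline bool less_than(const uint40 &a, const uint40 &b) {
  return (a.hi != b.hi) ? (a.hi < b.hi) : (a.lo < b.lo);
}

// 32x32 multiply, keeping the low 40 bits of the product.
static inline uint40 mul40(uint32_t a, uint32_t b) {
  const uint64_t p = (uint64_t)a * b;          // the assembly version would
  return { (uint8_t)(p >> 32), (uint32_t)p };  // build this by shift-and-add
}
```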

But that's not for today. We'll cross that bridge as we move towards more integer and/or fixed-point maths.

I've already patched and reverted the code to fix the overflow issue. So we now have the purportedly better jerk and speed junction code without the acceleration bug. So far in my testing it works very well. I see no more acceleration issues, and the print speed, acceleration, jerk, and junction (on a SCARA) all "feel right" to me so far.

Perhaps this also fixes the issue where using 32x micro-stepping on a Z axis at typical homing speeds was causing a watchdog reset. It's a stretch, but maybe the speed numbers were so off that they caused unusually high ISR rates to be produced. Or perhaps we're just hitting the edge of the envelope.

There are more pieces left to integrate from the Prusa i3 MK2 branch. For example, they have an alternative fixed-point version of the trapezoid generator designed to be usable within smaller bed sizes (such as 300x300).

Even if we sanity-check for the smaller coordinate system, it's still easy to cause overflow issues. Just use G92 to set the current position to something exorbitant, and the planner/stepper variables will overflow. (This issue can be fixed by always removing the coordinate space offsets when storing coordinates in the planner - and of course always adding them back in when converting the other way.)
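
A minimal sketch of that offset idea (hypothetical names, not Marlin's actual API): the planner and stepper only ever see offset-free positions, and the G92 shift is applied at the boundary.

```cpp
// Hypothetical illustration of the offset idea, not Marlin's real interface.
static float g92_offset[3] = { 0.0f, 0.0f, 0.0f };  // set by G92, in mm

// What G-code sees ("logical") vs. what the planner/stepper stores ("native").
static inline float to_native(const float logical, const int axis)  { return logical - g92_offset[axis]; }
static inline float to_logical(const float native, const int axis)  { return native  + g92_offset[axis]; }

// G92 then only updates the offset; the planner never sees the exorbitant value,
// so its step counts stay within the machine's physical range.
static inline void g92_set(const int axis, const float new_logical, const float current_native) {
  g92_offset[axis] = new_logical - current_native;
}
```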

thinkyhead commented 7 years ago

> the only safe way is to use float.

It also occurs to me that we could use a simple heuristic to decide ahead of time whether to use float maths or unsigned long maths. For lower values, corresponding to lower speeds and shorter moves, we can check whether the intermediates will fit into 32 bits, and use the quicker int maths if it's safe to do so.
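
A sketch of that heuristic (illustration only, not a proposed patch; the names loosely follow the LIMIT_ACCEL macro discussed above):

```cpp
#include <cstdint>

// Use the cheap integer path only when neither 32-bit product can overflow,
// otherwise fall back to float for the extra range.
static inline uint32_t limit_accel(uint32_t accel,
                                   uint32_t max_accel_steps,     // max_acceleration_steps_per_s2[AXIS]
                                   uint32_t axis_steps,          // block->steps[AXIS]
                                   uint32_t step_event_count) {  // block->step_event_count
  if (!axis_steps || !step_event_count) return accel;

  if (max_accel_steps <= UINT32_MAX / step_event_count &&
      accel           <= UINT32_MAX / axis_steps) {
    // Both intermediates fit in 32 bits: quick integer maths.
    const uint32_t comp = max_accel_steps * step_event_count;
    if (accel * axis_steps > comp) accel = comp / axis_steps;
  }
  else {
    // Otherwise pay the float overhead for the extra range.
    const float comp = (float)max_accel_steps * (float)step_event_count;
    if ((float)accel * (float)axis_steps > comp)
      accel = (uint32_t)(comp / (float)axis_steps);
  }
  return accel;
}
```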

Roxy-3D commented 7 years ago

The problem with using float is that it will result in a whole bunch of floating-point operations.

Would it really be that hard to put a 40-bit (or BCD fixed-point) math package in place? It seems we could add the library and just use it in a few places like this one without too much of a problem. I think that would be cleaner than trying to detect whether we are going to have a problem and then choosing which calculation path to go down.

This one won't work for us, but something along this line might be interesting: http://www.codeproject.com/Articles/37636/Fixed-Point-Class
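
For a sense of scale, a bare-bones fixed-point multiply looks roughly like this (a Q24.8 sketch, illustration only; the 64-bit widening is the part a dedicated library or assembly routine would do more cheaply on an AVR):

```cpp
#include <cstdint>

// Bare-bones Q24.8 fixed point: 24 integer bits, 8 fraction bits.
typedef uint32_t q24_8;
static const uint8_t Q = 8;

static inline q24_8    to_fix(uint32_t v)  { return v << Q; }
static inline uint32_t to_int(q24_8 v)     { return v >> Q; }

static inline q24_8 fix_mul(q24_8 a, q24_8 b) {
  // Widen, multiply, shift back. A dedicated AVR routine would replace the
  // 64-bit intermediate with shift-and-add on just the bytes it needs.
  return (q24_8)(((uint64_t)a * b) >> Q);
}
```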

But that issue aside, I have a related question: we are only talking about using float for the calculation of acceleration, right? The actual position and step count would still be done as ints, right? I'm worried that the imprecise nature of float might cause us to lose accuracy and have the nozzle's position start to shift over the duration of the print.

Roxy-3D commented 7 years ago

I was thinking: do we really need all those bits of precision? What if we just sacrifice the lowest 3 or 4 bits of max_acceleration_steps_per_s2[AXIS]?

We don't really need the lower bits of precision. Can we just right-shift this 4 bits and make sure we compensate for that at a few strategic places as the numbers get crunched? If so, we won't overflow, and a few bits of precision on a 32-bit number isn't even going to be noticed.
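
As a rough illustration with the numbers from the homing example earlier in the thread (a sketch, not a proposed patch):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // Homing example from above: 301500 steps/s^2 max acceleration, 44924 step events.
  const uint32_t max_accel = 301500UL, steps = 44924UL;

  // The full product needs ~34 bits, but after dropping 4 low bits it fits in 32:
  const uint32_t comp  = (max_accel >> 4) * steps;      // 18843 * 44924, no overflow
  const uint32_t limit = (comp / steps) << 4;           // shift the result back up

  printf("limited accel ~ %lu steps/s^2 (exact value: 301500)\n", (unsigned long)limit); // 301488
  return 0;
}
```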

Sebastianv650 commented 7 years ago

> But that issue aside, I have a related question: we are only talking about using float for the calculation of acceleration, right? The actual position and step count would still be done as ints, right?

True, at least this is my approach as long as the code is as it is today. Only the calculation in the specific line is done in float; the result is cast to (unsigned) long again. As for bit-shifting to stay inside a 32-bit integer, that's position 1 on my list for the next tries - it could/should be faster than using float. In my opinion, we are calculating some things much more precisely than necessary (meaning the end result is the same). But that's hard to prove for every printer out there with very different steps/mm and print area settings.

I did some tests with the fixed-point math Sailfish is using (AVRfix) months ago, but as far as I remember the limits are easily broken by average printer dimensions. I also remember a post by someone implementing it in Sailfish, where they explained that the main reason for the simulation environment they built was to be sure the upper limits of the fixed-point math are never "reached". A lot of work, not sure if the speedup is worth it.

Roxy-3D commented 7 years ago

> As for bit-shifting to stay inside a 32-bit integer, that's position 1 on my list for the next tries

> In my opinion, we are calculating some things much more precisely than necessary (meaning the end result is the same). But that's hard to prove for every printer out there with very different steps/mm and print area settings.

It looks to me like we are using more bits of precision than necessary in this set of calculations. However, there are other places (like Auto Bed Leveling) where 8-byte floats really would be helpful. We need more precision there!

But to your point, in these acceleration calculations, the extra bits of precision are just being used to keep the magnitude of the number intact. I don't think the lower bits are actually providing any precision that we need. It will be very interesting to see the results of your experiment when you get a chance to do it!

> A lot of work, not sure if the speedup is worth it.

Yes. The 8-bit AVRs are running out of headroom. All of these problems go away when the 32-bit platforms start getting used. We will get much more CPU power, and I believe we get 8-byte doubles.

Sebastianv650 commented 7 years ago

A quick calculation led me to the following possible calculation, based on @thinkyhead's MK2 PR:

```cpp
#define LIMIT_ACCEL(AXIS) do{ \
  const uint32_t comp = (max_acceleration_steps_per_s2[AXIS] >> 10) * block->step_event_count; \
  if ((accel * block->steps[AXIS]) >> 10 > comp) accel = comp / block->steps[AXIS] << 10; \
}while(0)
```

I used a max. acceleration of 10.000mm/s² for my calculation, together with 1600 steps/mm (my Z axis) and 300mm travel. It still fits inside uint32, and the error is on the right side of the decimal point. The real test will be tomorrow, it's too late now.

thinkyhead commented 7 years ago

I like the bit-shifting idea where we don't need the precision. In the long run, we should use floats wherever performance doesn't matter (e.g., G29), because float gives extra headroom for large values. Then we should explore fixed-point for the Planner and Stepper code.

I wish we had a nice simulation environment, because flashing and testing without a real printer is otherwise pretty tedious, and we can't do things like slow down and observe or go one step at a time. (I know, there is one, but I haven't gotten it working yet because I'm not fully set up to compile SDL-based projects.)

Anyway, even with just a bare board (my only current option), I can still test planner moves within a variety of configurations and see where we start hitting limits. A scientific approach will be essential here.

Sebastianv650 commented 7 years ago

I had a look at the Sailfish code because I often find answers there thanks to the nice amount of comments. They found the same overflow problem that we have now, and I like their solution. I will finish the code today and create a PR (a clean one this time); let's see if you like it. It's based on your initial PR #4997 and nearly as fast..

boelle commented 7 years ago

@thinkyhead how did you like his PR?

github-actions[bot] commented 2 years ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.