Pausing at bilinear grid boundaries

thinkyhead commented 6 years ago

Continuing from #8573 —

@tcm0116 : My issue is definitely unrelated to this one. Mine is occurring at the bilinear grid boundaries as the printer is coming to a complete stop as it crosses the boundary. It seems to be a function of the amount of change in Z over the grid and the speed of the movement.

coolio986 commented 6 years ago

I have a 600mm x 600mm printer on which I just recently turned off UBL (bilinear) because of this issue. While mine didn't come to a complete stop, it did slow down for a brief period, then speed back up, when crossing boundaries.

During rapid traverse across boundaries the printer would slow, then speed back up. (Rapid traverse is 300mm/s.)

It also sometimes did this when printing at 100mm/s.

I believe this slow down is a function of z jerk with z acc/dec. I'm using FW version 1.1.3.

In previous versions of the firmware, when this boundary crossing happened, the watchdog would time out and force the processor to reset.

My 2c.

Roxy-3D commented 6 years ago

I have a 600mm x 600mm printer on which I just recently turned off UBL (bilinear) because of this issue.

UBL and (bilinear) are two different bed leveling schemes. Both are mesh based. Did you turn off UBL because you had this problem? Both UBL and Bi-Linear usually do a bi-axis interpolation. But they are not the same animal.

tcm0116 commented 6 years ago

My travel speed is more like 120-160. The issue seems to be more pronounced at higher speeds and at larger changes in z at the grid boundary.

UBL and (bilinear) are two different bed leveling schemes. Both are mesh based.

I think the issue is with the Planner, and is being exposed by any mesh-based leveling system.

Roxy-3D commented 6 years ago

I think the issue is with the Planner, and is being exposed by any mesh-based leveling system.

I'm printing above those speeds with UBL. I don't see it. But the machine I have doing that has a 20x4 LCD Display and not a graphical display.

tcm0116 commented 6 years ago

@Roxy-3D can you try building a mesh that has a somewhat excessive amount of deviation? For example, put a 1-2mm thick plate on opposite corners of your bed such that the mesh ends up like a hyperbolic paraboloid? Then, command high-speed travel moves across the bed.

thinkyhead commented 6 years ago

I've posted a new feature that might shed more light on the behavior. The PR linked above adds a SEGMENT_LEVELED_MOVES option for all three forms of mesh leveling. Set this option to use delta-style segmentation instead of breaking up moves on mesh boundaries. If it also exhibits interesting pauses then it may point towards a common cause.

AnHardt commented 6 years ago

From my perspective a 'slow down' at the mesh lines is mandatory. Z is changing direction - so the junction has to be stepped at junction speed. Junction speed should be a function of the angle between the connected lines (sharp angles require a low speed, down to 0 - flat angles can be done at higher speeds). The problem in this issue is that we get much slower junction speeds than we expect. The junction angle is very flat, so we expect a near-zero change of speed at the junction. If the change of Z speed is smaller than Z-jerk we do not expect any slowdown at the junction. So what can go wrong?

Too low Z-jerk?
Empty planner buffer - no buffer line to connect with?
Buffer full, but segments in the planner buffer are too short. If jerk and acceleration are low, several buffer lines have to be changed to reach junction speed. Calculating this can take a while. Possibly longer than it takes to step the lines in the buffer?
Wrong calculation in the planner?
...

Tests: Alter Z-jerk. Do you see a difference? Alter acceleration. On deltas, play with DELTA_SEGMENTS_PER_SECOND and/or MIN_STEPS_PER_SEGMENT. Test different feedrates.
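
To put rough numbers on the junction argument, here is a minimal sketch (not Marlin code). It assumes a move at constant speed along the path, with the bed slope (Z change per mm of XY travel) switching from one value to another at a mesh line. `z_speed`, `z_speed_change` and `junction_factor` are hypothetical helper names, and the jerk value in the example is just an illustration.

```cpp
#include <cmath>

// Z component of velocity for a move at `feed` mm/s along a path that
// rises `slope` mm in Z per mm of XY travel.
double z_speed(double feed, double slope) {
  return feed * slope / std::sqrt(1.0 + slope * slope);
}

// Instantaneous change in Z speed when the slope changes at a mesh line.
double z_speed_change(double feed, double slope_before, double slope_after) {
  return std::fabs(z_speed(feed, slope_after) - z_speed(feed, slope_before));
}

// Factor (0..1] by which the junction speed would have to be scaled so that
// the Z speed change stays within `max_z_jerk` (mm/s).
double junction_factor(double feed, double slope_before, double slope_after,
                       double max_z_jerk) {
  const double dvz = z_speed_change(feed, slope_before, slope_after);
  return dvz > max_z_jerk ? max_z_jerk / dvz : 1.0;
}
```

For example, at 160 mm/s a slope change from +0.001 to -0.001 gives a Z speed change of about 0.32 mm/s, so a Z jerk of 0.4 mm/s would not force any slowdown by this reasoning. A 1 mm rise over 10 mm of travel (slope 0.1) starting from a flat section would.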

Roxy-3D commented 6 years ago

can you try building a mesh that has a somewhat excessive amount of deviation? For example, put a 1-2mm thick plate on opposite corners of your bed such that the mesh ends up like a hyperbolic paraboloid? Then, command high-speed travel moves across the bed.

Well... Here is an easier way to test that... Assuming you have UBL running, do

G29 P0 (zero the mesh)
G29 Q1 T (create a diagonal step across the mesh)
G1 X0 Y100
G1 X200 Y100 (Do you see the pause or slow down at the step?)

For reasonable changes in mesh point values, I don't see any slow down. But in the above example my printer's Z-Axis is not fast enough to allow the X movement to continue at the same speed. I don't see it stop. But I see it slow dramatically for the step.

I suspect @AnHardt is correct:

From my perspective a 'slow down' at the mesh lines is mandatory. Z is changing direction - so the junction has to be stepped at junction speed. Junction speed should be a function of the angle between the connected lines (sharp angles require a low speed, down to 0 - flat angles can be done at higher speeds).

And in the above example... You do see that behavior.

thinkyhead commented 6 years ago

We still have that commented-out junction code sitting in the planner that actually does compare the angles. Since (I believe) it includes a couple of sqrt calculations, maybe it should be implemented only for Z, and only on direction changes. Or… maybe all we need is to make the existing logic smarter.

Where's the point in the code where the behavior manifests? And what are examples of numbers fed to that part of the code that cause this?

thinkyhead commented 6 years ago

Somewhere in this loop…

// Now limit the jerk in all axes.
LOOP_XYZE(axis) {
  // Limit an axis. We have to differentiate: coasting, reversal of an axis, full stop.
  float v_exit = previous_speed[axis], v_entry = current_speed[axis];
  if (prev_speed_larger) v_exit *= smaller_speed_factor;
  if (limited) {
    v_exit *= v_factor;
    v_entry *= v_factor;
  }

  // Calculate jerk depending on whether the axis is coasting in the same direction or reversing.
  const float jerk = (v_exit > v_entry)
      ? //                                  coasting             axis reversal
        ( (v_entry > 0.f || v_exit < 0.f) ? (v_exit - v_entry) : max(v_exit, -v_entry) )
      : // v_exit <= v_entry                coasting             axis reversal
        ( (v_entry < 0.f || v_exit > 0.f) ? (v_entry - v_exit) : max(-v_exit, v_entry) );

  if (jerk > max_jerk[axis]) {
    v_factor *= max_jerk[axis] / jerk;
    ++limited;
  }
}

tcm0116 commented 6 years ago

We still have that commented-out junction code sitting in the planner that actually does compare the angles

I wonder if the CORDIC method could be used to optimize the junction code.

thinkyhead commented 6 years ago

@tcm0116 AFAIK, the AVR math and trig functions are pretty optimal IEEE implementations. At least, in tests around delta kinematics we couldn't optimize sqrt better than built-in. And

Here's what GRBL is now doing, with just 1 sqrt and no trig: https://github.com/grbl/grbl/blob/master/grbl/planner.c#L326-L405

thinkyhead commented 6 years ago

What's the result if we naively ignore direction change and only consider the plain difference in speed? Presumably if a direction change is occurring the junction speed should already be low, but not necessarily coming to a full stop.

- // Calculate jerk depending on whether the axis is coasting in the same direction or reversing.
- const float jerk = (v_exit > v_entry)
-     ? //                                  coasting             axis reversal
-       ( (v_entry > 0 || v_exit < 0) ? (v_exit - v_entry) : max(v_exit, -v_entry) )
-     : // v_exit <= v_entry                coasting             axis reversal
-       ( (v_entry < 0 || v_exit > 0) ? (v_entry - v_exit) : max(-v_exit, v_entry) );
+ // Calculate jerk as a simple change in speed, regardless of direction change
+ const float jerk = v_exit > v_entry ? v_exit - v_entry : v_entry - v_exit; // i.e., FABS(v_exit - v_entry)

Will this have undesired consequences? It might lead to fewer instances of BLOCK_BIT_START_FROM_FULL_HALT being set.
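
One way to explore that question is to run both formulas side by side. The sketch below lifts the two expressions into standalone functions. `jerk_original` and `jerk_simple` are hypothetical names, and `std::max` stands in for the firmware's `max`.

```cpp
#include <algorithm>
#include <cmath>

// Original expression: treats coasting and axis reversal differently.
float jerk_original(float v_exit, float v_entry) {
  return (v_exit > v_entry)
      ? ((v_entry > 0.f || v_exit < 0.f) ? (v_exit - v_entry) : std::max(v_exit, -v_entry))
      : ((v_entry < 0.f || v_exit > 0.f) ? (v_entry - v_exit) : std::max(-v_exit, v_entry));
}

// Proposed expression: plain change in speed, regardless of direction.
float jerk_simple(float v_exit, float v_entry) {
  return std::fabs(v_exit - v_entry);
}
```

The coasting cases agree (10 to 4 gives 6 with both). For a reversal such as +5 to -3, the original expression yields 5 and the simple one yields 8. So, at least in this isolated reading, the simplified formula reports a larger jerk on reversals and would limit the junction speed more there, not less.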

Roxy-3D commented 6 years ago

At least, in tests around delta kinematics we couldn't optimize sqrt better than built-in. And

Agreed. But the difference is we only need a few digits of precision for the square root. And especially if we make sure the final calculations always round down the resulting speed.

thinkyhead commented 6 years ago

An integer fixed-point square root might also be faster… http://web.archive.org/web/20080303101624/http://c.snippets.org/snip_lister.php?fname=isqrt.c

Roxy-3D commented 6 years ago

Yeah!!!! That looks like the ticket! I changed the code somewhat to play with it...

#include "stdafx.h"
#include <conio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "snipmath.h"

#define BITSPERLONG 32
#define TOP2BITS(x) ((x & (3L << (BITSPERLONG-2))) >> (BITSPERLONG-2))

void usqrt(unsigned long x, struct int_sqrt *q)
{
    unsigned long a = 0L;                   /* accumulator      */
    unsigned long r = 0L;                   /* remainder        */
    unsigned long e = 0L;                   /* trial product    */

    int i;

    for (i = 0; i < BITSPERLONG; i++)   /* NOTE 1 */
    {
        r = (r << 2) + TOP2BITS(x); x <<= 2; /* NOTE 2 */
        a <<= 1;
        e = (a << 1) + 1;
        if (r >= e)
        {
            r -= e;
            a++;
        }
    }
//  memcpy(q, &a, sizeof(long));
    q->sqrt = a >> 16;
    q->frac = a & 0xffff;
}

int main(void)
{
    int i;
    unsigned long l = 0x3fed0169L;
    struct int_sqrt q;

    for (i = 0; i < 150; ++i)
    {
        usqrt(i, &q);
        printf("sqrt(%3d) = %2d, remainder = %2d\n",
            i, q.sqrt, q.frac);
    }
    usqrt(l, &q);
    printf("\nsqrt(%lX) = %X, remainder = %X\n", l, q.sqrt, q.frac);

    getch();
    return 0;
}

sqrt(118) = 10, remainder = 56543
sqrt(119) = 10, remainder = 59553
sqrt(120) = 10, remainder = 62550
sqrt(121) = 11, remainder = 0
sqrt(122) = 11, remainder = 2972
sqrt(123) = 11, remainder = 5933
sqrt(124) = 11, remainder = 8882
sqrt(125) = 11, remainder = 11818
sqrt(126) = 11, remainder = 14743
sqrt(127) = 11, remainder = 17657
sqrt(128) = 11, remainder = 20559
sqrt(129) = 11, remainder = 23449
sqrt(130) = 11, remainder = 26329
sqrt(131) = 11, remainder = 29197
sqrt(132) = 11, remainder = 32055
sqrt(133) = 11, remainder = 34902
sqrt(134) = 11, remainder = 37738
sqrt(135) = 11, remainder = 40563
sqrt(136) = 11, remainder = 43378
sqrt(137) = 11, remainder = 46183
sqrt(138) = 11, remainder = 48977
sqrt(139) = 11, remainder = 51762
sqrt(140) = 11, remainder = 54536
sqrt(141) = 11, remainder = 57300
sqrt(142) = 11, remainder = 60055
sqrt(143) = 11, remainder = 62800
sqrt(144) = 12, remainder = 0
sqrt(145) = 12, remainder = 2725
sqrt(146) = 12, remainder = 5442
sqrt(147) = 12, remainder = 8149
sqrt(148) = 12, remainder = 10847
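
For anyone wanting to try this without `stdafx.h`, `conio.h` or `snipmath.h`, here is a portable sketch of the same digit-by-digit method using fixed-width types. `usqrt_q16` is a hypothetical name. The 64-bit intermediates are an assumption made here so the shifted remainder cannot overflow for large inputs. The result is a 16.16 fixed-point value, so the "remainder" column in the output is really the fractional part in units of 1/65536.

```cpp
#include <cstdint>

// Square root of a 32-bit integer as a 16.16 fixed-point number
// (upper 16 bits: integer part, lower 16 bits: fraction), rounded down.
uint32_t usqrt_q16(uint32_t x) {
  uint64_t a = 0;  // accumulator (result so far)
  uint64_t r = 0;  // remainder
  for (int i = 0; i < 32; i++) {
    r = (r << 2) | (x >> 30);         // bring down the top two bits of x
    x <<= 2;
    a <<= 1;
    const uint64_t e = (a << 1) + 1;  // trial subtrahend
    if (r >= e) {
      r -= e;
      a++;
    }
  }
  return static_cast<uint32_t>(a);
}
```

Since the loop always runs 32 iterations with only shifts, adds and compares, its run time does not depend on the input. Whether it beats the built-in sqrt on AVR would still need measuring.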

Bob-the-Kuhn commented 6 years ago

I have not been able to reproduce this issue. Could someone post a configuration & a gcode file that has this issue?

Roxy-3D commented 6 years ago

Bob... I can't reproduce the exact issue either. But this will cause something very similar to happen: Assuming you have UBL running, do

G29 P0 (zero the mesh)
G29 Q1 T (create a diagonal step across the mesh)
G1 Z.5
G1 X0 Y100
G1 X200 Y100 (Do you see the pause or slow down at the step?)

thinkyhead commented 6 years ago

Could someone post a configuration & a gcode file

I'm assuming that the cause is something like Andreas proposed, where the change in Z direction at a mesh boundary is causing BLOCK_BIT_START_FROM_FULL_HALT to get set. I don't think it's a result of the planner getting starved or blocked.

Presumably to get this to manifest you have to first set up a mesh that has a high point between two low points (or vice-versa). Then, with leveling enabled, run some G-code that does moves crossing that line several times from several different points and angles, trying different feedrates.

I'd be curious to know whether the issue is more common when crossing in a straight line vs diagonally, or if it occurs more when crossing near confluences of the grid lines. Also, whether it mostly affects shorter moves (i.e., moves that start closer to the grid line).

@Roxy-3D The G29 Q1 test mesh is probably more extreme (1cm) than the usual case. What if the diagonal line is only 1-2mm offset instead of 9.99?

thinkyhead commented 6 years ago

Yeah!!!! That looks like the ticket! I changed the code somewhat to play with it...

Great! Next we should compare speeds to sqrt running on AVR. It might end up being possible to do delta inverse kinematics entirely in terms of integer/steps.

tcm0116 commented 6 years ago

I did some testing using a fairly recent version of bugfix-1.1.x, and have some interesting results.

Here's the grid generated by G29 using bilinear leveling:

Bilinear Leveling Grid:
      0      1      2
 0 +0.188 -0.057 -0.042
 1 +0.415 +0.060 -0.160
 2 +0.470 +0.078 -0.225
 3 +0.690 -0.002 -0.547

With leveling active, regardless of X, I experienced stops during the following movement commands at the maximum feedrate of 160 mm/s. What's interesting is that I could get it to stop at 95mm and 175mm during shorter movements, but it only stopped at 175mm during long movements. Another interesting finding is that even with ENABLE_LEVELING_FADE_HEIGHT and the fade height set to 10mm, stops were still happening with Z=10mm and Z=50mm. I even tried to set the Z jerk to 20, but it still stops.

Y90 -> Y100 - stops around 95mm
Y170 -> Y180 - stops around 175mm
Y0 -> Y270 - stops around 175mm

Here's my configuration file: Configuration.zip

I'll try and find some time to see if BLOCK_BIT_START_FROM_FULL_HALT is getting set, but it might not be for a few days.

tcm0116 commented 6 years ago

Quick update. I added some printouts for when BLOCK_BIT_START_FROM_FULL_HALT is being set. During the movements described above, it is being set twice for each movement in the else condition of if (moves_queued > 1 && previous_nominal_speed > 0.0001), meaning that we're not even computing the jerk. It looks like there might be an issue with previous_nominal_speed.

thinkyhead commented 6 years ago

Another interesting finding is that even with ENABLE_LEVELING_FADE_HEIGHT and the fade height set to 10mm, stops were still happening with Z=10mm and Z=50mm.

With today's patches, above the fade height there will no longer be segmented movement…

thinkyhead commented 6 years ago

Sounds like previous_nominal_speed is falling below 0.0001 somehow. With today's patches, it checks for a value below 0.000001 (closer to zero) instead. I wonder if that will make any difference.

Of course, it could also be due to moves_queued being only 1. It seems that once BLOCK_BIT_START_FROM_FULL_HALT has been set, it doesn't get cleared. So I assume it means that the block being prepared is too late to do anything about the exit speed of the previous block.

Bob-the-Kuhn commented 6 years ago

I didn't see anything unusual with Roxy's setup. The only time I saw the X step rate change was when the Z axis stepping changed (as expected).

Here's the Saleae logic analyzer capture. You can load this into the Saleae software if you want to see the details. UBL crossing diagonal.logicdata.zip

Channels:
X step (top)
Z step
Z direction
stepper ISR (toggles each time it's entered)
check for new block

X when Z step starts

X when Z direction changes

tcm0116 commented 6 years ago

I was incorrect about previous_nominal_speed. It's actually the moves_queued > 1 check. I think this needs to be changed to moves_queued > 0 because moves_queued is set after the current move has been removed from the queue, so checking for > 1 would actually mean that there are two movements queued after the current one. In the case of a move that consists of two segments only, this will result in BLOCK_BIT_START_FROM_FULL_HALT being set for both segments.
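
The effect of the threshold can be illustrated with a toy model. This is not the planner code: it only reproduces the quoted condition, assumes `moves_queued` counts the blocks already in the queue before the new one is added, and assumes nothing is consumed while the segments are being queued. `count_full_halts` is a hypothetical helper.

```cpp
// Count how many segments of one move would be flagged as starting from a
// full halt when queued back to back into an empty planner.
// `threshold` is the constant in `moves_queued > threshold`.
int count_full_halts(int segments, int threshold) {
  int halts = 0;
  float previous_nominal_speed = 0.0f;  // nothing has moved yet
  for (int moves_queued = 0; moves_queued < segments; ++moves_queued) {
    const bool can_join = moves_queued > threshold && previous_nominal_speed > 0.0001f;
    if (!can_join) ++halts;             // else-branch: start from full halt
    previous_nominal_speed = 160.0f;    // this segment's nominal speed (example value)
  }
  return halts;
}
```

In this model `count_full_halts(2, 1)` gives 2 and `count_full_halts(2, 0)` gives 1, which matches the flag being set for both segments with `> 1` and only for the first one with `> 0`.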

I've done a little testing with moves_queued > 0, and it does fix the issue of BLOCK_BIT_START_FROM_FULL_HALT being set for both movements. However, I'm still getting a stop between the segments, and it's somewhat violent. From what I can tell, safe_speed is limiting the final_rate of the first movement, which causes it to come close to a stop, but the second movement's entry_speed is at full speed. As such, it's decelerating at the end of the first movement and then trying to instantaneously reach full speed for the second movement. I'm going to try and investigate a bit further, but I'm almost out of time for the evening.

tcm0116 commented 6 years ago

Hmmmm. I think the second issue I ran into is because Planner::reverse_pass() only runs if movesplanned() > 3. Without the reverse pass, I don't think the entry and exit speeds will be harmonized.

I think I've just uncovered a flaw in the planner design when a movement consists of only 2 segments.

Roxy-3D commented 6 years ago

So I assume it means that the block being prepared is too late to do anything about the exit speed of the previous block.

I could be wrong. But all of the mesh based bed leveling systems have a stutter when you give them a big first move. The problem is any big move gets broken up into smaller segments. Any subsequent moves that get sent to the planner can have the calculations done to combine things. But the move in process can't be modified. And that causes a stutter...

thinkyhead commented 6 years ago

So maybe if we could give the planner a cue ahead of a set of moves that it should wait for one or two more blocks to arrive before it hands the first block over to the stepper ISR. Maybe as a general policy, if the planner is down to a single move, it should not allow it to go to the stepper ISR for a fraction of a second…

AnHardt commented 6 years ago

About Bob's diagrams (https://github.com/MarlinFirmware/Marlin/issues/8595#issuecomment-348073135):

What do we see? Let's begin with the second one. We see a deceleration to jerk-speed (stop), a change of direction, followed by an acceleration from jerk-speed (stop). Acceleration/deceleration and jerk are the same on both sides of the stop - I assume determined by Z. In the first diagram we see about the same. The differences are: In the first half we have no Z-steps. There is no change in direction of Z, but it begins to move. So there is an angle and deceleration to junction-speed is required. Junction-speed is Z-jerk because Z starts from 0. In the first half X determines jerk and deceleration - in the second half Z.

All is well here - as expected!

AnHardt commented 6 years ago

About the 'first move in planner buffer can't be connected' problem @tcm0116 is talking about: this is an old problem that we knew about, but forgot. Different strategies have been considered:

Wait until buffer is full. -> Will not print single moves.
When buffer empty, start delayed. -> Pauses in every 'blocking_move', really bad stuttering and slowdown when buffer runs dry.

A new idea could be:

When buffer empty split the first move in half. -> Unknown consequences! (But maybe.)

AnHardt commented 6 years ago

Here we have a Z-slow-down.

(image: speedchange)

Speed change in X seems to be a tiny bit less dramatic. Here again X seems to dominate acceleration and jerk in the right part.

For reference the other cases again.

(images: startz, changezdir)

tcm0116 commented 6 years ago

Why does there need to be 4 moves planned before Planner::reverse_pass() will run? Since we're only really concerned with the junction velocity of two connected movements, shouldn't you only need 2 moves planned?

AnHardt commented 6 years ago

I think: The first move is planned and immediately executed by the stepper interrupt - so it is blocked for further changes. While block one is stepping, the next line is produced. With only one block to work with, there is no optimisation possible. If block nr. 3 is produced and block one is still running, we now have 2 blocks (one junction) to optimize - and we should do that.

AnHardt commented 6 years ago

First block:
| _ |
|/ \|
or
|/\|

Add second block:
| _ | _ |
|/ \|/ \|
or
|/\|/\|

Add third block
| _ | _ | _ |
|/ \|/ \|/ \|
or
|/\|/\|/\|

Now we can optimize to
| _ | __|__ |
|/ \|/  |  \|
or
|  | /|\ |
|/\|/ | \|
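
The optimization in the last diagram can be sketched as a two-pass speed plan over the blocks that are still open to changes. This is a simplified model, not Marlin's planner: it assumes one constant acceleration, ignores jerk limits at junctions, and all names (`Block`, `reachable_speed`, `plan_entry_speeds`) are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Block {
  double length;   // mm
  double nominal;  // mm/s, cruise speed for this block
  double entry;    // mm/s, planned speed at the start of the block
};

// Highest speed reachable over `distance` starting from `v` at acceleration
// `accel` (equally: highest speed from which `v` can be reached by braking).
double reachable_speed(double v, double accel, double distance) {
  return std::sqrt(v * v + 2.0 * accel * distance);
}

// Plan entry speeds so the chain is entered at `start_speed` and ends at rest.
void plan_entry_speeds(std::vector<Block>& blocks, double accel, double start_speed) {
  if (blocks.empty()) return;
  blocks[0].entry = start_speed;
  // Reverse pass: from the last block back, cap each entry speed so the
  // block can still brake down to the entry speed of its successor.
  double exit_speed = 0.0;
  for (std::size_t i = blocks.size(); i-- > 1;) {
    const double junction_limit = std::min(blocks[i].nominal, blocks[i - 1].nominal);
    blocks[i].entry = std::min(junction_limit,
                               reachable_speed(exit_speed, accel, blocks[i].length));
    exit_speed = blocks[i].entry;
  }
  // Forward pass: cap each entry speed by what the previous block can
  // actually accelerate to.
  for (std::size_t i = 1; i < blocks.size(); ++i) {
    blocks[i].entry = std::min(blocks[i].entry,
        reachable_speed(blocks[i - 1].entry, accel, blocks[i - 1].length));
  }
}
```

Applied to the second and third blocks of the diagram (the first is already executing and locked), with 10 mm blocks at 100 mm/s and 1000 mm/s² the junction between them is planned at 100 mm/s instead of 0.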

tcm0116 commented 6 years ago

So, it sounds like this issue is independent of bed leveling. I suspect you could replicate it with bed leveling disabled by sending two G1 commands in rapid succession.

What about delaying the first movement after a stop for just long enough to let any additional movements be planned, if any?

AnHardt commented 6 years ago

See https://github.com/MarlinFirmware/Marlin/issues/8595#issuecomment-348138944

tcm0116 commented 6 years ago

So then, what if a flag was provided to buffer_line indicating that more moves are pending (such as during a segmented move), which would then delay the first move? In that case, the delay would only need to be until after the third move is buffered, which shouldn't be very long. This would resolve the stuttering issue if the buffer runs dry since non-segmented G0/G1 movements wouldn't be delayed.

tcm0116 commented 6 years ago

Although that doesn't directly solve my problem where only two segments are executed. For that to work, Planner::reverse_pass() would need to run after the second segment is buffered, and then the first segment could be released for execution.
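
A sketch of how such a flag might gate execution, covering the two-segment case as well. Everything here is hypothetical (`HoldingQueue`, `more_pending`, the `release_after` count). It only models when the first block becomes available to the stepper, not the planning itself, and it does not model blocks being consumed.

```cpp
// Toy model of a planner queue that can hold back execution of a freshly
// queued first block while the caller signals that more segments are coming.
class HoldingQueue {
 public:
  explicit HoldingQueue(int release_after) : release_after_(release_after) {}

  // Queue one block. `more_pending` is true while further segments of the
  // same move are still to be buffered.
  void buffer_block(bool more_pending) {
    ++queued_;
    if (queued_ == 1) held_ = more_pending;  // only a first block can be held
    // Release once enough blocks are queued to optimize junctions, or as
    // soon as the caller says nothing else is coming.
    if (!more_pending || queued_ >= release_after_) held_ = false;
  }

  // True when the stepper may start executing the oldest block.
  bool ready() const { return queued_ > 0 && !held_; }

 private:
  int release_after_;
  int queued_ = 0;
  bool held_ = false;
};
```

With `release_after` set to 3, a long segmented move starts after its third segment is buffered, a two-segment move starts as soon as its last segment arrives with `more_pending == false`, and an ordinary unsegmented move is never delayed.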

AnHardt commented 6 years ago

The first block problem is a heritage from grbl - it has been there forever. It just hasn't been that visible. When sending a move manually, it's hard to send a further move fast enough. When a g-code file begins after a break it usually sends long straights where we expect a slowdown at the end, or it begins with little moves, where the nozzle after the first move hasn't picked up enough speed to make the slowdown noticeable. New with this grid-levelling is a near full stop from full speed at a place where you can't see/expect an angle.

AnHardt commented 6 years ago

I suppose delaying the execution of the first buffer line until the second one is planned (not a fixed time), in combination with splitting the first move into two parts (when the buffer is empty) can solve the problem without disturbing delays. The second move could immediately be optimized with the second part of the first move. This 'just' has to be written and tested.

AnHardt commented 6 years ago

Normal first move:
|/\|
Split first move:
|/|\|
Run execution.
Add second move:
|/|\|/\|
optimize:
| |_|_ |
|/| | \|
or, if not at full speed:
| |/|\ |
|/| | \|

AnHardt commented 6 years ago

An open question is what to do if the second move arrives late - the first half of the first move is already used up.

|\|/\|

Will result in a stop. EDITED: Which is acceptable. Making pauses in between sending moves justifies breaks in the movement.

tcm0116 commented 6 years ago

I don't think there's anything that can be done about that. In my opinion, we should only perform this optimization for segmented movements where we know that enough segments will be planned in rapid succession, and not unsegmented moves.

AnHardt commented 6 years ago

From the conceptual standpoint it's advisable to clean that problem out at the deepest possible level. Else you have to deal with it again and again in every new segmented move.

Roxy-3D commented 6 years ago

A new idea could be: When buffer empty split the first move in half. -> Unknown consequences! (But maybe.)

This is a very clever idea...

I suppose delaying the execution of the first buffer line until the second one is planned (not a fixed time), in combination with splitting the first move into two parts (when the buffer is empty) can solve the problem without disturbing delays. The second move could immediately be optimized with the second part of the first move.

Yes. But here is a small modification to the idea. As the very first lines of Planner::buffer_line() we do this:

void Planner::_buffer_line(const float &a, const float &b, const float &c, const float &e, float fr_mm_s, const uint8_t extruder, bool no_break_up) {
  // Planner is idle and this is not already a half-move: queue the first half on its own
  if (!blocks_queued() && !no_break_up) {
    float aa, bb, cc, ee;
    aa = (a+current_position[X_AXIS])/2.0;
    bb = (b+current_position[Y_AXIS])/2.0;
    cc = (c+current_position[Z_AXIS])/2.0;
    ee = (e+current_position[E_AXIS])/2.0;
    _buffer_line(aa, bb, cc, ee, fr_mm_s, extruder, 1);
  }

That will cause a 'small delay' as the first half of the line is scheduled. And we may be able to use the no_break_up flag to indicate there is another 1/2 of the move coming down soon, so don't schedule this yet until it can be joined to the next move.

tcm0116 commented 6 years ago

We'd also have to pass a flag to _buffer_line to tell it to delay execution of the first segment until after the second segment is buffered and the optimization is performed.

thinkyhead commented 6 years ago

When buffer empty split the first move in half. -> Unknown consequences! (But maybe.) This is a very clever idea... But here is a small modification to the idea.

Yes, I like this approach too. Except, split the move into a longer move followed by a shorter move, so it takes more time for the first move to finish and thus gives the planner more time to get a third move to chain to the small second move.
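
The asymmetric split can be expressed as a single interpolation fraction. This sketch is independent of the planner. `Position`, `split_point` and the 0.75 fraction used in the note below are illustrative assumptions, not values proposed in this thread.

```cpp
#include <array>
#include <cstddef>

using Position = std::array<float, 4>;  // X, Y, Z, E in mm

// Point at `fraction` (0..1) of the way from `from` to `to`.
// 0.5 reproduces the midpoint split; larger values make the first
// part longer and the second part shorter.
Position split_point(const Position& from, const Position& to, float fraction) {
  Position p{};
  for (std::size_t i = 0; i < p.size(); ++i)
    p[i] = from[i] + (to[i] - from[i]) * fraction;
  return p;
}
```

A move from X0 to X100 split at 0.75 would be queued as X0 to X75 followed by X75 to X100, leaving the short second part free to be chained to whatever arrives next.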

thinkyhead commented 6 years ago

void Planner::_buffer_line(const float &a, const float &b, const float &c, const float &e, float fr_mm_s, const uint8_t extruder, bool no_break_up) {
  if (!blocks_queued() && !no_break_up) {
    float aa, bb, cc, ee;
    aa = (a+current_position[X_AXIS])/2.0;
    bb = (b+current_position[Y_AXIS])/2.0;
    cc = (c+current_position[Z_AXIS])/2.0;
    ee = (e+current_position[E_AXIS])/2.0;
    _buffer_line(aa, bb, cc, ee, fr_mm_s, extruder, 1);
  }

Unfortunately the overhead for this solution would be kind of high. The current_position is not in sync with planner._buffer_line. The move is from the previous planner.position (in steps units) to the new position — after converting to steps units.

So, we need to have a planner._buffer_steps function which contains most of _buffer_line, that takes the new move in steps units. The _buffer_line function will do the conversion to steps and then jump to _buffer_steps. This allows the first move to be split in half and queued in terms of steps with the least amount of added overhead.
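
A sketch of what splitting in step units could look like. The names (`to_steps`, `split_steps`) and the steps-per-mm value in the note below are hypothetical. The point is that the intermediate target is derived from the integer delta, so the two parts always add up to the original move exactly.

```cpp
#include <cmath>
#include <cstdint>

// Convert a position in mm to integer steps (rounded to nearest).
int32_t to_steps(float mm, float steps_per_mm) {
  return static_cast<int32_t>(std::lround(mm * steps_per_mm));
}

// Split a move from `from` to `to` (both in steps) into two parts.
// Returns the intermediate target; the second part ends at `to`.
int32_t split_steps(int32_t from, int32_t to) {
  return from + (to - from) / 2;  // integer division truncates toward zero
}
```

With 80 steps/mm, a 10 mm move is 800 steps and splits into 400 + 400. An odd delta such as 801 steps becomes 400 + 401, so no step is lost to rounding.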

tcm0116 commented 6 years ago

Planner has an internal position variable that truly is the current position.

MarlinFirmware / Marlin

Pausing at bilinear grid boundaries #8595