MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0

Pausing at bilinear grid boundaries #8595

Closed thinkyhead closed 6 years ago

thinkyhead commented 6 years ago

Continuing from #8573 —

@tcm0116 : My issue is definitely unrelated to this one. Mine is occurring at the bilinear grid boundaries as the printer is coming to a complete stop as it crosses the boundary. It seems to be a function of the amount of change in Z over the grid and the speed of the movement.

thinkyhead commented 6 years ago

See if #8608 has a positive effect … once it passes Travis, of course.

AnHardt commented 6 years ago

The PRs are still changing too fast. I'll have a look in the morning.

For now: asymmetric splitting doesn't feel right to me. Let's assume an ordinary first move. It doesn't reach its target speed, so it accelerates until the midpoint, then decelerates to zero (not optimized until now):

|   /\   |
|  /  \  |
| /    \ |
|/      \|
After symmetric splitting it will look like:
| /\ | /\ |
|/  \|/  \|
After optimizing this (which can only be done with the stepper interrupt waiting for it and a modified optimizer):
|   /|\   |
|  / | \  |
| /  |  \ |
|/   |   \|
An asymmetric (3/4) split move would look like:
|  /\  |  |
| /  \ |  |
|/    \|/\|
After optimizing:
|   /\ |  |
|  /  \|  |
| /    |\ |
|/     | \|
Up to now this is looking very similar. But now we have to start the stepping and can't
work with the first part anymore. Now we add the next block, same duration, still not
reaching target speed.
|   /|\   |   /\   |
|  / | \  |  /  \  |
| /  |  \ | /    \ |
|/   |   \|/      \|
optimized:
|    |   /|\       |
|    |  / | \      |
|    | /  |  \     |
|    |/   |   \    |
|   /|    |    \   |
|  / |    |     \  |
| /  |    |      \ |
|/   |    |       \|
vs. (3/4)
|   /\ |  |   /\   |
|  /  \|  |  /  \  |
| /    |\ | /    \ |
|/     | \|/      \|
optimized:
|      |  | /\     |
|      |  |/  \    |
|   /\ | /|    \   |
|  /  \|/ |     \  |
| /    |  |      \ |
|/     |  |       \|
reaching only 3/4 of top speed.

If stepping does not wait for the optimization of the split move, we gain nothing. Still (this time the first part of) the first move will end at zero speed.
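The triangle profiles sketched above can be put in numbers. A move of length d that must start and end at zero speed, accelerating at a, covers d/2 in each half, so v² = 2·a·(d/2) gives a peak of √(a·d). A minimal illustrative sketch (not Marlin planner code; the helper name is made up):

```cpp
#include <algorithm>
#include <cmath>

// Peak speed reachable by a move of `d` mm that starts and ends at zero
// speed with acceleration `a` mm/s^2: the triangle profile caps it at
// sqrt(a*d), or at the requested target speed if that is lower.
// (Illustrative helper, not part of the Marlin planner.)
float peak_speed(float a, float d, float v_target) {
  return std::min(v_target, std::sqrt(a * d));
}
```

Splitting the first move asymmetrically means the first part only has 3d/4 of travel, so its triangle peak drops to sqrt(a · 0.75 · d), which is the lower top speed the diagrams above illustrate.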

thinkyhead commented 6 years ago

I've pre-merged the general fixes from #8608 and #8611, and now those PRs only contain the planner first-move patch.

thinkyhead commented 6 years ago

I thought it would work out more like this:

First move is BUSY so can't chain. (Exit speed is 0 or max-jerk.)

|  __  ||  ____||____  |
| /  \ || /    ||    \ |
|/    \||/     ||     \|
  BUSY

So if we add 2 moves...

|  ____||_   |
| /    || \  |
|/     ||  \ |
  BUSY

...the next added move has a chance to modify the 2nd move and chain.

|  ____||____||____  |
| /    ||    ||    \ |
|/     ||    ||     \|
  BUSY
thinkyhead commented 6 years ago

Just made an important change: Don't call stepper.wake_up until both moves have been added.

AnHardt commented 6 years ago

You are also right! The difference is in the moves. Yours reach the wanted feedrate (for long enough); mine do not. I have to get up early. More tomorrow.

AnHardt commented 6 years ago

Now we have to check whether recalculate(), especially reverse_pass(), forward_pass(), and recalculate_trapezoids(), are working on the new extended range of blocks (likely dependent on whether the steppers are running or not).

Otherwise it's looking better and better.

tcm0116 commented 6 years ago

recalculate() only looks at the BUSY flag, which isn't set until the stepper class gets the block, which shouldn't happen until the steppers wake up. Something that might need to be considered is whether the steppers can be woken up for any other reason. It might be a better idea to set a flag in that first block to indicate that the block shouldn't be processed. The flag could then be cleared once recalculate() has processed the block.

thinkyhead commented 6 years ago

The difference is in the moves. Yours reach the wanted feedrate (for long enough); mine do not.

One way to "cheat" would be to have the first move reduce its feedrate so that it is guaranteed to take at least 50ms (for example). It would end up only applying to smaller, faster moves.

xyzlen ? min(fr_mm_s, xyzlen << 4) : fr_mm_s; // duration under 1/16th of a second

  // Always split the first move into one longer and one shorter move
  if (!blocks_queued()) {
    constexpr int32_t prop[] = { 3, 4 };
    #define _BETWEEN(A) position[A##_AXIS] + ((target[A##_AXIS] - position[A##_AXIS]) * prop[0]) / prop[1]
    const int32_t between[XYZE] = { _BETWEEN(X), _BETWEEN(Y), _BETWEEN(Z), _BETWEEN(E) },
                  xyzlen = SQRT(
                    sq((between[X_AXIS] - position[X_AXIS]) * steps_to_mm[X_AXIS]) +
                    sq((between[Y_AXIS] - position[Y_AXIS]) * steps_to_mm[Y_AXIS]) +
                    sq((between[Z_AXIS] - position[Z_AXIS]) * steps_to_mm[Z_AXIS])
                  );
    // Make sure this move takes no less than 0.0625s
    _buffer_steps(between, xyzlen ? min(fr_mm_s, xyzlen << 4) : fr_mm_s, extruder);
  }
  _buffer_steps(target, fr_mm_s, extruder);

  stepper.wake_up();
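The clamp in the snippet above can be read as: if the split's length would make it finish in under 1/16 s at the requested feedrate, slow it down. A minimal float-based sketch of the same idea (illustrative name; the patch itself works on integer step counts):

```cpp
#include <algorithm>

// Clamp the feedrate so a segment of `len_mm` takes at least 1/16 s:
// duration = len_mm / fr_mm_s >= 1/16  <=>  fr_mm_s <= len_mm * 16.
// A zero-length XYZ move (e.g. E-only) keeps the requested feedrate.
float min_duration_feedrate(float fr_mm_s, float len_mm) {
  return len_mm > 0.0f ? std::min(fr_mm_s, len_mm * 16.0f) : fr_mm_s;
}
```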
tcm0116 commented 6 years ago

What happens if it's a short move? A short move when the buffer is empty could also be problematic with this split first move concept.

thinkyhead commented 6 years ago

Well the thing that generates stutters is a starved planner. The very problem is that the only remaining move in the buffer finishes too quickly, leading to complete stops between segments on a delta. The buffer isn't supposed to get starved, especially when there are many rapid short moves happening, as on a delta. With deltas, except for very few instances, the planner buffer should be saturated.

The "minimum time" change I'm proposing for testing would not prevent the starvation that led to the buffer getting empty, so there would still be the stutter before the first new move. However, it would cause the re-start to give a little extra time for the planner buffer to gain some blocks and likely reduce stuttering.

As another option, instead of having a pure DONT_PROCESS flag on the blocks, we might add a counter where 0 means go ahead and process, otherwise decrement. This would effectively delay a block for some number of stepper ISRs. But this might be more "stuttery" than having it start earlier but go slightly slower, as above.

The patch above would only affect very short moves —under 1/16th of a second— and only in the case of an already-starved (or empty) planner buffer.

tcm0116 commented 6 years ago

I wonder if we're trying to solve a bigger problem than we really have. This is only really an issue with high-speed segmented travel movements when starting from a stop. As such, a solution for that problem is all that's needed.

Since it's always with segmented moves, we don't really need to segment the first movement further. A flag could be passed in the call to buffer_line to indicate that the line is the first of a series. The planner could go ahead and buffer that movement, and if there are no buffered movements, set the DONT_PROCESS flag, which would prevent it from being provided to the stepper class. As the planner is guaranteed to receive additional segments, the DONT_PROCESS flag can just remain set until the next segment is received and recalculate() has been run. This should provide a very minimal delay without additional stuttering in the event of a starved planner.

Roxy-3D commented 6 years ago

Well the thing that generates stutters is a starved planner. The very problem is that the first move in the buffer finishes too quickly, leading to complete stops between moves.

That is not what I'm seeing. When telling the printer to make a long move when it is idle.... There is a stutter at the first mesh line it crosses. That isn't because the planner is out of moves. It is because the second segment of the long move can't be joined to the first segment.

And in fact... with the Max7219 LEDs... you can see the planner has at least a half dozen moves in the queue by the time it gets to the first mesh line.

AnHardt commented 6 years ago

@Roxy-3D Right. That is exactly the problem I want to solve here. Against too-short buffer lines we have MIN_STEPS_PER_SEGMENT. Against not having enough "time" in the buffer we try to fight with SLOWDOWN and delayed screen updates. Maybe these concepts can be improved, but here and now I want to solve the other problem as well as possible.

@tcm0116 My thumbs-up here was a bit early. reverse_pass(), for example, doesn't look like it could work with only two buffer lines: if (movesplanned() > 3)

@thinkyhead

Well the thing that generates stutters is a starved planner. ...

Yes. A starved planner is also a reason for stuttering, but this problem here is independent of that. With the current code there is NO possibility to execute a first buffer line that doesn't end at zero speed. Completely independent of whether this is part of a segmented line or not, whether it's short or long, slow or fast. (Did you get the idea at all?)

I'll have a few hours this evening to look more detailed into recalculate().

tcm0116 commented 6 years ago

forward_pass() and recalculate_trapezoids() look like they'll run with only two buffered lines. I'll try to find some time to play with it and see how it really works.

thinkyhead commented 6 years ago

When telling the printer to make a long move when it is idle.... There is a stutter at the first mesh line it crosses. That isn't because the planner is out of moves. It is because the second segment of the long move can't be joined to the first segment.

Right. When the planner is empty (or runs out of moves, becoming empty) the last-executed move always comes to a full-stop. What we're addressing is that this also occurs on the first move whenever the buffer is empty, because the first block gets processed by the stepper ISR before the planner has a chance to chain another block. Once a move goes to the ISR it is unusable to do chaining, and thus the first move in any set of two is likely to stutter.

This isn't something we pay attention to very often, but run this G-code on a Cartesian:

G28 X Y
M420 S0
G4 S1
G1 X10 Y10
G1 X20 Y20

This will very often produce two moves instead of one smooth movement!

The reason we don't notice this during printing is that the planner is saturated most of the time. This is something that I've seen on SCARA robots - a weird "stutter" whenever starting a move, especially from full-extension. It actually happens on deltas, but it is usually imperceptible and we all ignore it because all moves on delta (except Z-only) are very small. But a delta might do it more obviously too:

G28
M420 S0
G4 S1
G1 Z200
G1 Z100
Roxy-3D commented 6 years ago

Would it make sense to add a parameter to Planner::buffer_line() to tell it not to start the move until told to do so? All of the mesh-based systems could, at the end of the movement routine, call Planner::buffer_line() and say "Do the buffered moves..."

thinkyhead commented 6 years ago

(Did you get the idea at all?)

My code addresses that specifically. Sorry for bringing up the point about starved buffer, but my patch aims to address starvation also.

thinkyhead commented 6 years ago

If the planner can't do a reverse-pass with fewer than three segments, then I can modify the code to add 3 segments instead of just 2. The Planner::recalculate function runs as part of adding a line to the buffer, so by adding one more segment we should get the reverse_pass "for free."

Roxy-3D commented 6 years ago

Are you saying it can join a line already in motion to the next line that was added later?

thinkyhead commented 6 years ago

By adding two moves before starting the ISR, the first move is chained to the second.

|  ____||_   |
| /    || \  |  stepper.wake_up()
|/     ||  \ |

The second move will not be blocked by the stepper ISR until the first block is done processing. Thus, the main loop has a chance to add a 3rd move, and a 4th, and so on.

|  ____||____||____  |
| /    ||    ||    \ |
|/     ||    ||     \|
  BUSY
|  ____||____||______||____  |
| /    ||    ||      ||    \ |
|/     ||    ||      ||     \|
  BUSY   BUSY

As long as that first move gives enough time for the planner to get another move in it, the print job loop will be able to keep it filled, forward-reverse passes are possible, and the speed will stay constant.

tcm0116 commented 6 years ago

then I can modify the code to add 3 segments instead of just 2

To @Roxy-3D's point, I think this should only be done if the move is not already segmented. Given the way delta movements are segmented, a very slow movement will result in very short segments, which is already causing issues. Attempting to then further segment the first segment just won't work. For leveled movements that are already segmented, additional segmentation is unnecessary (unless there aren't enough segments).

thinkyhead commented 6 years ago

Attempting to then further segment the first segment just won't work.

I think it might. The segmentation here is essentially harmless, and the min-time concept should provide just enough "magic" to cure the stuttering that can occur at too-fast delta speeds.

thinkyhead commented 6 years ago

result in very short segments, which is already causing issues

Actually, no, the problem is that any first segment fed to the empty planner comes to a full stop, whether long or short doesn't matter.

As for starvation, that can be caused by "very short segments" being fed into the planner, or the parser/sd/serial being delayed by something like a screen update. It is not related to the size of the segments in the blocks that are fed to the ISR. The issue is entirely at the higher (planner) level (fed from the main loop).

We did a bunch to address starvation by throttling/optimizing graphical display updates, and that's the highest level where starvation can be addressed.

AnHardt commented 6 years ago

@thinkyhead Now I'm convinced you got it.

Splitting into three segments seems to be unnecessary. I hope I'm right here. "Next is previous in reverse order" -> confusing!

@@ -259,30 +259,28 @@ void Planner::reverse_pass_kernel(block_t* const current, const block_t *next) {
  * recalculate() needs to go over the current plan twice.
  * Once in reverse and once forward. This implements the reverse pass.
  */
 void Planner::reverse_pass() {

-  if (movesplanned() > 3) {
+  // Make a local copy of block_buffer_tail, because the interrupt can alter it
+  uint8_t tail = block_buffer_tail;

+  if (movesplanned() > 3) {
     block_t* block[3] = { NULL, NULL, NULL };
-
-    // Make a local copy of block_buffer_tail, because the interrupt can alter it
-    // Is a critical section REALLY needed for a single byte change?
-    //CRITICAL_SECTION_START;
-    uint8_t tail = block_buffer_tail;
-    //CRITICAL_SECTION_END
-
     uint8_t b = BLOCK_MOD(block_buffer_head - 3);
     while (b != tail) {
       if (block[0] && TEST(block[0]->flag, BLOCK_BIT_START_FROM_FULL_HALT)) break;
       b = prev_block_index(b);
       block[2] = block[1];
       block[1] = block[0];
       block[0] = &block_buffer[b];
       reverse_pass_kernel(block[1], block[2]);
     }
   }
+  else if ((movesplanned() == 3) && (!TEST((&block_buffer[tail])->flag, BLOCK_BIT_BUSY))) {
+  reverse_pass_kernel(&block_buffer[next_block_index(tail)], &block_buffer[tail]);
+  }
 }
+ // EDITED 

 // The kernel called by recalculate() when scanning the plan from first to last entry.
 void Planner::forward_pass_kernel(const block_t* previous, block_t* const current) {
   if (!previous) return;

block_buffer_head is already increased?

tcm0116 commented 6 years ago
else if ((movesplanned() == 3) && (!TEST((&block_buffer[tail])->flag, BLOCK_BIT_BUSY))) {
  reverse_pass_kernel(&block_buffer[next_block_index(tail)], &block_buffer[tail]);
}

This seems a bit dangerous since the movement is in process and could possibly finish before this finishes executing.

thinkyhead commented 6 years ago

I sort of like the idea of having a "hold for more segments" flag, for those instances where we know there will be several segments added. That extra flag would tell Planner::_buffer_line not to split the first move in two, but instead, just don't call stepper.wake_up() unless there are >2 moves in the planner. The flag would be set when adding segments and cleared when done.

I have a patch for that concept which I can post shortly.


Planner::reverse_pass block_buffer_head is already increased?

The head gets incremented after calculate_trapezoid_for_block but before recalculate. The last lines of _buffer_block are…

  calculate_trapezoid_for_block(block, block->entry_speed / block->nominal_speed, safe_speed / block->nominal_speed);
  block_buffer_head = next_buffer_head;
  COPY(position, target);
  recalculate();

This seems a bit dangerous since the movement is in process and could possibly finish before this finishes executing.

It might be, but also it seems like it wouldn't be effective. The tail block is the oldest one in the queue and during most of its lifetime it is BUSY. The very last chance to modify the tail block is when the stepper ISR calls get_current_block to start processing it. The ISR keeps the block BUSY while doing the steps, and finally increments the tail pointer when it finishes it. On the next ISR call it will start to process the new tail. That small space between ISRs is the only window where the tail won't be BUSY.
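For reference, the tail/head bookkeeping being discussed relies on power-of-two ring-buffer index math along these lines. This is a simplified sketch whose names mirror Marlin's planner, not the actual planner code:

```cpp
#include <cstdint>

constexpr uint8_t BLOCK_BUFFER_SIZE = 16;  // must be a power of two
constexpr uint8_t BLOCK_MOD(uint8_t n) { return n & (BLOCK_BUFFER_SIZE - 1); }
constexpr uint8_t next_block_index(uint8_t i) { return BLOCK_MOD(i + 1); }
constexpr uint8_t prev_block_index(uint8_t i) { return BLOCK_MOD(i - 1); }

// Moves queued between tail (oldest block, consumed by the stepper ISR)
// and head (next free slot, filled by the planner). One slot stays free,
// so at most BLOCK_BUFFER_SIZE - 1 moves are ever planned.
constexpr uint8_t movesplanned(uint8_t head, uint8_t tail) {
  return BLOCK_MOD(head - tail);
}
```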

thinkyhead commented 6 years ago

a "hold for more segments" flag, for those instances where we know there will be several segments added

Here's what that looks like… https://github.com/MarlinFirmware/Marlin/pull/8608/commits/6f5e763f16f38d48767b92936b12da886e44cc76

tcm0116 commented 6 years ago

Why not make doing_segmented_move a parameter to buffer_line that defaults to false? This would eliminate all of the times you have to manually set it to false.

And, as a redundancy to not calling stepper.wake_up(), you could return NULL in get_current_block if (doing_segmented_move && movesplanned() <= 2).

Roxy-3D commented 6 years ago

Why not make doing_segmented_move a parameter to buffer_line that defaults to false? This would eliminate all of the times you have to manually set it to false.

Actually... It does better than that. If that extra parameter defaults to false that allows various pieces of code to control the behavior. As it turns out... only the very last move of UBL's motion routine changes. Everything else would get joined together.

      FINAL_MOVE:
      ...
      if (isnan(z0)) z0 = 0.0;

      planner._buffer_line(end[X_AXIS], end[Y_AXIS], end[Z_AXIS] + z0, end[E_AXIS], feed_rate, extruder);

      if (g26_debug_flag)
        debug_current_and_destination(PSTR("FINAL_MOVE in ubl.line_to_destination()"));

      set_current_from_destination();
      return;
    }

(Actually... there is one other place that changes. If the queue is full in the planner and a line can't be added, we probably want to kick off the move. Otherwise it could get stuck and wait forever.)

AnHardt commented 6 years ago

@tcm0116

else if ((movesplanned() == 3) && (!TEST((&block_buffer[tail])->flag, BLOCK_BIT_BUSY))) {
  reverse_pass_kernel(&block_buffer[next_block_index(tail)], &block_buffer[tail]);
}

This seems a bit dangerous since the movement is in process and could possibly finish before this finishes executing.

I'm relatively sure !TEST((&block_buffer[tail])->flag, BLOCK_BIT_BUSY) states "tail is not stepped yet" or, in your words, "movement is not in progress". And it stays that way until we start stepping here:

    _buffer_steps(between, fr_mm_s, extruder); 
  } 
  _buffer_steps(target, fr_mm_s, extruder); 

  stepper.wake_up(); // !!!!!!!!!!!

} // buffer_line() 

when we've finished planning.

thinkyhead commented 6 years ago

Why not make doing_segmented_move a parameter to buffer_line that defaults to false? This would eliminate all of the times you have to manually set it to false.

Because that adds more overhead than having a global flag. The less any new code impacts performance the better.

thinkyhead commented 6 years ago

I'm relatively sure !TEST((&block_buffer[tail])->flag, BLOCK_BIT_BUSY) states "tail is not stepped yet" or, in your words, "movement is not in progress". And it stays that way until we start stepping here:

No, the BUSY flag is set at the moment the block is first grabbed inside the ISR. So, you have to prevent the stepper ISR from being called before you test the BUSY flag and you can only re-enable it when done modifying the block.

/**
 * The current block. NULL if the buffer is empty.
 * This also marks the block as busy.
 * WARNING: Called from Stepper ISR context!
 */
static block_t* get_current_block() {
  if (blocks_queued()) {
    block_t* block = &block_buffer[block_buffer_tail];
    #if ENABLED(ULTRA_LCD)
      block_buffer_runtime_us -= block->segment_time_us; // We can't be sure how long an active block will take, so don't count it.
    #endif
    SBI(block->flag, BLOCK_BIT_BUSY);
    return block;
  }
  else {
    #if ENABLED(ULTRA_LCD)
      clear_block_buffer_runtime(); // paranoia. Buffer is empty now - so reset accumulated time to zero.
    #endif
    return NULL;
  }
}
thinkyhead commented 6 years ago

IOW, it seems like this is needed to prevent the block from being grabbed and made BUSY after the bit is tested, but before reverse_pass_kernel gets called…

else if (movesplanned() == 3) {
  CRITICAL_SECTION_START;
    const uint8_t tail = block_buffer_tail;
    if (!TEST(block_buffer[tail].flag, BLOCK_BIT_BUSY))
      reverse_pass_kernel(block_buffer[next_block_index(tail)], block_buffer[tail]);
  CRITICAL_SECTION_END;
}

I'm not entirely up on how the stepper ISR manages speed through the move, but I think that once it starts processing a block, it could be bad to change the speed ramps… But maybe the block processor is more tolerant?

Or, is reverse_pass_kernel only changing the next block in the queue, making it safe even if the block starts?

thinkyhead commented 6 years ago

I note that the order of the blocks here is reversed from usual…

reverse_pass_kernel(block_buffer[next_block_index(tail)], block_buffer[tail]);

Should it be this instead?

reverse_pass_kernel(block_buffer[tail], block_buffer[next_block_index(tail)]);

I assume that next_block_index(tail) is the next block that would be processed after the next-or-active (tail) block.

Anyway, we shouldn't forget that block_buffer_tail is hard to catch in flight before the tail block goes BUSY. The tail block has a very short non-BUSY lifespan between ISRs, so will reverse_pass definitely be called at the appropriate time? We need 100% determinism. So, to review how the tail pointer works:

Can the idea be summed up as: Blocks placed in an empty buffer shouldn't be taken up by the ISR until they've been marked as having all their junctions worked out?


In terms of total processing, it probably would end up a bit less if it were possible to add several blocks to the planner before running the trapezoid generator on the whole group, instead of how it works now where it apparently always does a forward and reverse pass on each new junction. Putting aside the extra processing involved in creating these extra blocks, my little scheme of splitting up the first move into 2-3 smaller blocks does seem to already ("economically") fit the bill.

AnHardt commented 6 years ago

After finding

@@ -1234,11 +1234,11 @@ void Planner::_buffer_steps(const int32_t target[XYZE], float fr_mm_s, const uin
         safe_speed = maxj;
       }
     }
   }

-  if (moves_queued > 1 && !UNEAR_ZERO(previous_nominal_speed)) {
+  if (moves_queued && !UNEAR_ZERO(previous_nominal_speed)) {
     // Estimate a maximum velocity allowed at a joint of two successive segments.
     // If this maximum velocity allowed is lower than the minimum of the entry / exit safe velocities,
     // then the machine is not coasting anymore and the safe entry / exit velocities shall be used.

     // The junction velocity will be shared between successive segments. Limit the junction velocity to their minimum.

and disabling the stepper interrupt for planning the split move

@@ -1381,24 +1381,22 @@ void Planner::_buffer_line(const float &a, const float &b, const float &c, const

   // DRYRUN ignores all temperature constraints and assures that the extruder is instantly satisfied
   if (DEBUGGING(DRYRUN))
     position[E_AXIS] = target[E_AXIS];

-  // Always split the first move into one longer and one shorter move
+  // Always split the first move into two
   if (!blocks_queued()) {
-    constexpr int32_t prop[] = { 3, 4 };
-    #define _BETWEEN(A) position[A##_AXIS] + ((target[A##_AXIS] - position[A##_AXIS]) * prop[0]) / prop[1]
-    const int32_t between[XYZE] = { _BETWEEN(X), _BETWEEN(Y), _BETWEEN(Z), _BETWEEN(E) },
-                  xyzlen = SQRT(
-                    sq((between[X_AXIS] - position[X_AXIS]) * steps_to_mm[X_AXIS]) +
-                    sq((between[Y_AXIS] - position[Y_AXIS]) * steps_to_mm[Y_AXIS]) +
-                    sq((between[Z_AXIS] - position[Z_AXIS]) * steps_to_mm[Z_AXIS])
-                  );
-    // Make sure this move takes no less than 0.0625s
-    _buffer_steps(between, xyzlen ? min(fr_mm_s, xyzlen << 4) : fr_mm_s, extruder);
+    DISABLE_STEPPER_DRIVER_INTERRUPT(); // There is no block to step anyway.
+    #define _BETWEEN(A) position[A##_AXIS] + ((target[A##_AXIS] - position[A##_AXIS]) >> 1)
+    const int32_t between[XYZE] = { _BETWEEN(X), _BETWEEN(Y), _BETWEEN(Z), _BETWEEN(E) };
+    _buffer_steps(between, fr_mm_s, extruder);
+    _buffer_steps(target, fr_mm_s, extruder);
+    ENABLE_STEPPER_DRIVER_INTERRUPT();
+  }
+  else {
+    _buffer_steps(target, fr_mm_s, extruder);
   }
-  _buffer_steps(target, fr_mm_s, extruder);

   stepper.wake_up();

 } // _buffer_line()

it started working reliably and as expected - even when my changes in reverse_pass() are reverted.
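The split in the diff above reduces to halving each axis delta in integer steps. A small sketch of that computation (illustrative names; the diff's `>> 1` floors negative deltas rather than truncating toward zero, a difference of at most one step):

```cpp
#include <cstdint>

// Compute the point halfway between `position` and `target` (both in
// steps, XYZE order), as the diff's _BETWEEN(A) macro does with
// (delta >> 1). Arithmetic right shift of a negative int32_t is the
// behavior of common compilers (gcc/clang), not a C++ guarantee pre-C++20.
void split_at_midpoint(const int32_t position[4], const int32_t target[4],
                       int32_t between[4]) {
  for (int axis = 0; axis < 4; ++axis)
    between[axis] = position[axis] + ((target[axis] - position[axis]) >> 1);
}
```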

Grbl added an additional variable storing the index up to which reverse passes, and from which forward passes and recalculations, are meaningful.

thinkyhead commented 6 years ago

After finding …

That first change is interesting! Have we been overly-cautious up to now?

and disabling the stepper interrupt for planning the split move

That makes good sense. And now it's simple! Maximizes the chance of having two long-enough moves that they won't be auto-joined or dropped.

Grbl added an additional variable storing the index up to which reverse passes, and from which forward passes and recalculations, are meaningful.

Interesting! Would we gain anything by following their example?

thinkyhead commented 6 years ago

@AnHardt — So, seeing as we fell back to your original idea, and it works well, I have no qualms about dropping the doing_segmented_move flag and assuming that there are no other complications. Essentially, it just needs the extra block, plus permission to chain them, plus a guarantee that the stepper ISR (even in slowest mode) won't steal the first block as soon as it's added. I think we have a winner. And the simpler the better. I will merge the simplified PRs shortly.

thinkyhead commented 6 years ago

Merged the very simplified #8608 and #8611. Please test and see if these help. In the process of following up on this issue (and #8573) some other bugs around the planner were discovered and fixed, old obsolete code was stripped, and some consolidation was done. Hopefully printing results will be better on several fronts, in addition to these fresh patches.

AnHardt commented 6 years ago

Have we been overly-cautious up to now?

No. When we know we have a first block ending at zero speed, and we know we can't alter it, it makes perfect sense to always start the second block at zero speed. For a split first move this assumption simply is not valid anymore.

oldmcg commented 6 years ago

When I was measuring the performance of UBL delta segmentation on an 8-bit CPU, I added some code to time each segment being added to the planner queue.

Here is output from one run using a feedrate of 50 mm/s and delta_segments_per_second of 200, using long 70 mm linear moves with only 4 direction reversals (real angle changes). The times are the time it takes to add a move to the planner, ignoring times when the planner is already full (depth 15) and waiting for the steppers to catch up.

DeltaSegmentsPerSecond:      200
LastSegmentFeedrate:      50.000mm/s (0.833mm/m)
LastSegmentLength:         0.250mm
TimePerSegmentAtRate:      5.000ms
AvgDeltaKinetmatics:       0.321ms,   2240 samples
AvgBufferLineAtDepth[ 0]:  1.680ms,      2 samples
AvgBufferLineAtDepth[ 1]:  2.240ms,      2 samples
AvgBufferLineAtDepth[ 2]:  2.624ms,      2 samples
AvgBufferLineAtDepth[ 3]:  2.686ms,      2 samples
AvgBufferLineAtDepth[ 4]:  2.644ms,      2 samples
AvgBufferLineAtDepth[ 5]:  3.777ms,     20 samples
AvgBufferLineAtDepth[ 6]:  4.022ms,   1164 samples
AvgBufferLineAtDepth[ 7]:  3.957ms,     93 samples
AvgBufferLineAtDepth[ 8]:  3.978ms,     65 samples
AvgBufferLineAtDepth[ 9]:  4.022ms,     70 samples
AvgBufferLineAtDepth[10]:  4.025ms,     88 samples
AvgBufferLineAtDepth[11]:  4.071ms,     60 samples
AvgBufferLineAtDepth[12]:  4.061ms,     58 samples
AvgBufferLineAtDepth[13]:  4.157ms,     60 samples
AvgBufferLineAtDepth[14]:  4.301ms,    128 samples

AvgBufferLineNotFull:      4.032ms,   1816 samples
AvgBufferLineFull:         5.201ms,    424 samples
FullToNotRatio:            0.233

Note how quickly the planner fills up to a depth of 6, and then the time to queue a move to the planner increases to the point (around 4 ms) where we can't really queue moves fast enough to stay ahead of the steppers. The delta sqrt computations were frequently blamed, but they only account for around 0.3 ms per segment. The real computational overhead is the planner queuing many segmented moves.

For a many-segmented move in a single linear direction, it doesn't make a lot of sense to do angle-based planner calculations. Even on a delta where an X/Y linear move can result in a tower stepper slowly coming to stop and then changing direction, the movement is fluid enough that I suspect all planner angle-changing computations could be simply skipped. Maybe the linear acceleration computations could also be streamlined for segmented moves.

I think adding a planner queuing function for subsequent segmented moves in the same linear direction as the previous segment, one that skips the bulk of the planner's direction/speed-change computations, would greatly improve the ability to run at higher segments-per-second without stuttering.

tcm0116 commented 6 years ago

Even on a delta where an X/Y linear move can result in a tower stepper slowly coming to stop and then changing direction, the movement is fluid enough that I suspect all planner angle-changing computations could be simply skipped.

This is an interesting thought. However, there would still need to be at least some sort of detection of the change in direction for a Delta machine. Otherwise, the planner wouldn't know to decelerate for the change in direction. But, to your point, perhaps on Delta machines, we could replace the jerk computation with a simple check for the change in direction.

oldmcg commented 6 years ago

As the tower approaches the direction change due to linear x/y motion of head, it slows to almost no linear movement before changing direction and then smoothly accelerates the other way. This is with a constant speed X/Y linear motion.

I suspect that the X/Y feedrate necessary to cause a tower direction change to require acceleration limitation would be very large. So much so that I think it's worth having a compile-time option to ignore it completely, try it to see how many segments per second it will go without stuttering, and then see what X/Y feedrate it would take to cause excessive jerk in a tower direction change.

Then the limit could be placed on max x/y feedrate instead of recomputing at every single buffered segment.
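A sketch of that trade-off, with hypothetical names and an assumed limit value (this is not an existing Marlin option): clamp the requested Cartesian feedrate once, rather than recomputing tower jerk for every buffered segment. The safe value would have to be found empirically per machine.

```cpp
#include <cassert>

// DELTA_MAX_SAFE_XY_FEEDRATE is an assumed, machine-specific value below
// which tower direction changes are presumed never to exceed the
// acceleration limits.
constexpr float DELTA_MAX_SAFE_XY_FEEDRATE = 200.0f; // mm/s, assumed

float clamp_xy_feedrate(float requested_mm_s) {
  return requested_mm_s > DELTA_MAX_SAFE_XY_FEEDRATE
           ? DELTA_MAX_SAFE_XY_FEEDRATE : requested_mm_s;
}
```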

thinkyhead commented 6 years ago

The real computational overhead is the planner queuing many segmented moves.

The process of adding a single segment does seem to involve a lot of processing! I'm sure there are various opportunities to optimize, especially once we know where the greatest overhead comes from.

Note how quickly the planner fills up to a depth of 6, and then the time to queue a move to the planner increases to a point (around 4ms)

Interesting. The first move is cheapest at 1.68ms, then the next 4 are around 2.5ms, then the 5th jumps up to 3.7ms, then the rest take around 4ms. I assume a lot of that time is taken up by the trapezoid generator. It looks like there is extra junction scanning, forward and reverse, that keeps costing more until you have about 6 blocks in the planner.

where [we] can't really queue moves fast enough to stay ahead of the steppers.

Meaning the individual segments are taking about 4ms each to finish, I guess… Of course this is going to be variable depending on config settings like segments-per-second. If the segments take too little time the planner will get starved, and if they take much longer it starts to limit segments-per-second.
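The arithmetic behind that limit, as a quick check: at N segments per second, each segment represents 1000/N ms of motion, so with a ~4 ms cost to queue one segment the planner can only keep ahead of the steppers up to roughly 250 segments per second.

```cpp
#include <cassert>

// Starvation threshold: the buffer can only stay ahead of the steppers
// while each segment's motion lasts longer than the time it takes to
// queue the next one.
int max_sustainable_segments_per_second(float queue_cost_ms) {
  return (int)(1000.0f / queue_cost_ms);
}
```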


Optimization Opportunities

It would be good to find out what the most costly aspect of adding a new move to the planner is. Is it the junction handling? If so, are there calculations that can be cached in the blocks so they don't need to be repeated?

The delta sqrt computations … only account for around 0.3ms per segment

If the delta calculations don't cost too much that's good. My guess is that the biggest cost probably comes from scanning through the block buffers, which aren't efficiently cached by the CPU. It might cost less time in the junction calculations if that information was stored in its own buffer separate from the blocks.
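As an illustration only (the field names are guesses, not the real block layout): the forward/reverse junction passes touch just a few floats per block, so keeping those in a small parallel array would let them fit in cache instead of forcing a scan over the full block structs.

```cpp
#include <cassert>

#define BLOCK_BUFFER_SIZE 16  // typical Marlin buffer depth

// Hypothetical structure-of-arrays companion to the block buffer,
// holding only what the junction passes need.
struct JunctionInfo {
  float entry_speed_sqr;      // squared speed entering this block
  float max_entry_speed_sqr;  // junction-limited ceiling for entry speed
  float nominal_speed_sqr;    // squared cruise speed of the block
};

// One compact array, scanned by the forward/reverse passes, instead of
// touching every (much larger) block struct.
JunctionInfo junction_buffer[BLOCK_BUFFER_SIZE];
```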


I see that every segment added to the planner (with XY movement) does a costly length calculation:

block->millimeters = SQRT(
  #if CORE_IS_XY
    sq(delta_mm[X_HEAD]) + sq(delta_mm[Y_HEAD]) + sq(delta_mm[Z_AXIS])
  #elif CORE_IS_XZ
    sq(delta_mm[X_HEAD]) + sq(delta_mm[Y_AXIS]) + sq(delta_mm[Z_HEAD])
  #elif CORE_IS_YZ
    sq(delta_mm[X_AXIS]) + sq(delta_mm[Y_HEAD]) + sq(delta_mm[Z_HEAD])
  #else
    sq(delta_mm[X_AXIS]) + sq(delta_mm[Y_AXIS]) + sq(delta_mm[Z_AXIS])
  #endif
);

This calculation can be skipped by having the caller include the mm length of the move in the block when it is already known (as it is with segmented moves). It might even make sense to pass in the delta_mm values, since these can be invariant when doing segmented straight lines.
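A minimal sketch of that idea (hypothetical signature, not the actual Marlin planner API): accept an optional precomputed length and only fall back to the SQRT when none is supplied.

```cpp
#include <cassert>
#include <cmath>

struct BlockSketch { float millimeters; };  // stand-in for Marlin's block_t

// Hypothetical: a caller doing segmented moves already knows each
// segment's length, so it passes known_mm and the sqrt is skipped.
void fill_block_length(BlockSketch &block,
                       float dx, float dy, float dz,
                       float known_mm = 0.0f) {
  block.millimeters = known_mm > 0.0f
    ? known_mm                                  // segmented move: length known
    : std::sqrt(dx * dx + dy * dy + dz * dz);   // generic move: compute it
}
```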

thinkyhead commented 6 years ago

see what X/Y feedrate it would take to cause excessive jerk in tower direction change.

Inertia in the effector has to be taken into account too. Even when the tower carriages are all moving smoothly, fast direction changes in the effector may still lead to excessive jerkiness and some loss of accuracy at the end of the nozzle.

thinkyhead commented 6 years ago

Apparently the changes I committed weren't so safe after all, and are causing weird issues. I'll revert them for now until we figure out how to do this "split first planner move" patch in a way that doesn't lead to any weird side-effects.

thinkyhead commented 6 years ago

One of these changes might solve the issue. Please try each one and see if it helps:

- if (moves_queued && !UNEAR_ZERO(previous_nominal_speed)) {
+ if (moves_queued > 1 && !UNEAR_ZERO(previous_nominal_speed)) {

   // Always split the first move in two so it can chain
   if (!blocks_queued()) {
+    previous_nominal_speed = 0;
     DISABLE_STEPPER_DRIVER_INTERRUPT();

-   #define _BETWEEN(A) (position[A##_AXIS] + target[A##_AXIS]) >> 1
+   #define _BETWEEN(A) (position[A##_AXIS] + target[A##_AXIS]) / 2
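The second change above is not cosmetic when positions can be negative: integer division truncates toward zero, while an arithmetic right shift (the usual behavior for signed values on two's-complement targets, though implementation-defined before C++20) rounds toward negative infinity. A quick illustration:

```cpp
#include <cassert>

long midpoint_div(long a, long b)   { return (a + b) / 2; }
long midpoint_shift(long a, long b) { return (a + b) >> 1; }
// For an odd negative sum the two midpoints differ by one step,
// which is exactly the kind of off-by-one that bites at a split point.
```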

Since I've reverted the changes in bugfix, you can start with one of these branches:

thinkyhead commented 6 years ago

I've applied patches from @hg42 to those branches, and they are now posted as PRs. They are linked just above this comment.

AnHardt commented 6 years ago

When buffer empty split the first move in half. -> Unknown consequences! (But maybe.)

One of the unknown, or rather not-thought-about, consequences is that every move after a stepper.synchronize() gets split. This might disturb homing and probing when the stop/probe is triggered during the first part while the other part is still in the buffer. So if that breaks, think about this first.

Roxy-3D commented 6 years ago

This might disturb homing and probing. So if that breaks think about this first.

Homing and probing are a 'special case'. In general, these moves involve only Z-axis movement. So perhaps it makes sense not to split them? In other words, the actual probe operation cannot cross a mesh cell boundary; it only moves the Z axis. So there is no purpose in splitting that movement, nor in trying to join it with a subsequent movement.
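That special case is easy to detect; a minimal sketch (hypothetical helper, not Marlin code):

```cpp
#include <cassert>

// A probing/homing plunge moves only the Z axis, so the
// "split the first move" logic could simply be skipped for it.
bool is_z_only_move(long dx_steps, long dy_steps, long dz_steps) {
  return dx_steps == 0 && dy_steps == 0 && dz_steps != 0;
}
```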