MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0
16.36k stars 19.26k forks source link

[BUG] 120Hz pulse at the beginning of each move/Classic Jerk not functional (?) #20150

Closed holgin closed 1 year ago

holgin commented 4 years ago

Bug Description

I've been trying to write a small piece of software, which in part reads the STEP/DIR signals generated by Marlin. To properly test that, I bought myself a logic analyzer and I have found something weird.

No matter that are the acceleration, jerk or speed settings, if I have S_CURVE_ACCELERATION or ADAPTIVE_STEP_SMOOTHING enabled or whatever - one, single pulse is always generated 8.33ms (=120Hz) before the rest, "normal" pulses are generated.

Below you can find some pictures: NASS_SCA_1000mms2_10mms_60mms_120Hz_1,157kHz_2 Series of 10mm moves

NASS_SCA_1000mms2_0mms_60mms_120Hz_1,157kHz_1 Closeup of one of those moves

I have tested Jerk set to 0, 4 and 10mm/s with accelerations 300 and 1000 mm/s2.

I'm running Marlin Bugfix from 08.11.2020, but the same thing was happening on 2.0.3.

I did not try it with Junction Deviation as it is not working very well for me (extruder).

My board is a custom one, based on the LPC1768 uC and stepper motor drivers are TMC2208.

On the other hand, I have noticed that the frequency of subsequent pulses does not change with different Jerk values, which is weird, because I would expect them to immediately jump to frequency which results from Steps/mm and Jerk values.

I've talked about this issue with @sjasonsmith on discord and he said he has seen these weird pulses on a STM32 board.

Configuration Files

config_files.zip

Logs

Measurements from the logic analyzer are also attached. You can open them with Saleae Logic 1.2.18 software. Naming of the files: [Adaptive_step_smoothing] [S_Curve_Acceleration] [acceleration] [jerk] [speed] logs.zip

I can do more tests/measurements if needed.

holgin commented 3 years ago

So... no hope of getting into it any time soon? I'm willing to do whatever tests and measurements are needed for proper debugging.

qwewer0 commented 3 years ago

While I don't fully understand it, but I still think this should kept open.

qwewer0 commented 3 years ago

Keeping this open.

qwewer0 commented 3 years ago

Lets keep this open.

holgin commented 3 years ago

From my point of view due to lack of interest this thread can be closed, I switched to Klipper firmware some time ago and I will not be able to test/reproduce anything.

thinkyhead commented 3 years ago

For the sake of every Marlin user who may have the same anomaly and cannot switch to Klipper, hopefully this can be reproduced and get some attention soon from someone with a scope.

Note that posts in this issue queue are callouts to the world at large for someone to lend a hand if they can. It is not an issue of "no interest" in the problem, but simply, due to the large volume of issues posted here, many issues do not get attention for a long time, so it is wise to take advantage of Discord and other forums in recruiting help.

descipher commented 3 years ago

From my point of view due to lack of interest this thread can be closed, I switched to Klipper firmware some time ago and I will not be able to test/reproduce anything.

It would certainly not be from a lack of interest. The number of people with the skills and equipment to resolve this type of anomaly are very finite. In addition to that they would need to be aware of the report.

I happen to have both a LCP1768 and a STM32 based board, I will take a look to see if the odd pulse still exists.

descipher commented 3 years ago

Checked all my STM32 (MKS), LCP1768 (MKS) and ATMega2560 (RAMPS1.4) based boards for this anomaly. The reported issue does not exist on any of them running bugfix-2.0.x (2.0.9.2). With or without S_CURVE_ACCELERATION defined. It is possible that there was some other issue present at the time, regardless of that it cannot be repeated today. Please close.

thisiskeithb commented 3 years ago

Could have been a quirk with this user’s custom board as well.

sjasonsmith commented 3 years ago

I have observed it in the past when I was working on step-related issues, so it wasn’t just a quirk for just that board. That was before they even filed this bug, though. I think closing this acknowledges the reality that (1) nobody has looked at it and likely never will and (2) it apparently isn’t causing problems large enough for anybody to notice.

thisiskeithb commented 3 years ago

I think closing this acknowledges the reality that (1) nobody has looked at it and likely never will

@descipher Looked at it and commented before I closed it, but feel free to reopen if you’re still seeing these issues.

descipher commented 3 years ago

I would not spend to much time here, it was reported that "Classic Jerk" is not working however the issue that was raised is a pulse anomaly when S_CURVE_ACCELERATION is enabled. So which is it? You cant do both at the same time. The reporter is no longer willing to participate so we are conflicted. I saw no issue with S_CURVE_ACCELERATION on or off. Thus it works without any pulse anomaly.

holgin commented 3 years ago

Well call it lack of interest or just lack of resorces, there was no response so I closed this topic back then. This pulse in itself is not a problem, 99.9% of users will never notice it, but it suggests that there is (was) a hidden issue with step generation. I may try to flash new Marlin in the future to check if the issue still exists although the real issue was never identified.

holgin commented 3 years ago

The issue was present no matter if SCURVE or Step Smoothing was enabled or not and the jerk issue I found by accident.

sjasonsmith commented 3 years ago

Sorry @descipher, I didn't mean to minimize your efforts here. I read the replies on my phone and misread the chain a bit.

Suffice it to say that this has sat here for a year with nobody taking action on it until @descipher attempted to reproduce. I know this issue was real in the past, and suspect there is probably still an issue in the right conditions. I also agree that it isn't worth keeping open if you can't readily reproduce it.

descipher commented 3 years ago

No worries, the issue can still exist we just don't know what conditions cause it or if a code change has since cleared it.

descipher commented 3 years ago

Thanks for chiming in @holgin, hopefully the issue is fixed by a code change that has transpired.

descipher commented 3 years ago

Either way, we can look at it further if there is a discovered arrangement that triggers the issue.

ManuelMcLure commented 3 years ago

I would not spend to much time here, it was reported that "Classic Jerk" is not working however the issue that was raised is a pulse anomaly when S_CURVE_ACCELERATION is enabled. So which is it? You cant do both at the same time. The reporter is no longer willing to participate so we are conflicted. I saw no issue with S_CURVE_ACCELERATION on or off. Thus it works without any pulse anomaly.

There is no reason you can't use S_CURVE_ACCELERATION and Classic Jerk together - I have been using this for months. I think you're confusing this with the requirement to set EXPERIMENTAL_SCURVE if you want to use Linear Advance with S_CURVE_ACCELERATION.

descipher commented 3 years ago

I would think that Bezier calculation would be conflicted by Classic Jerk, yes you can enabled them but is it compatible? The stepper process is unaware that Classic Jerk has adjusted the rate and since Bezier is calculated in the stepper process this can conflict with the expected result. Then again it is possible that it corrects it .. lol

ManuelMcLure commented 3 years ago

S_CURVE_ACCELERATION was actually introduced into Marlin (as BEZIER_JERK_CONTROL) a month before Junction Deviation was added.

sjasonsmith commented 3 years ago

I actually reproduced this in the simulator. I'm going to re-open this and see if I can figure out what is going on.

image

descipher commented 3 years ago

Rechecked with a new Hantek LA.

X step pin - ATMEGA2560

image

X step pin - STM32F103VET

image

So this is confirmed on real hardware.

sjasonsmith commented 3 years ago

@descipher if you have ideas to investigate further don’t let me get on your way! I have a bit more time this weekend then will be back to work. I am wondering if this is possibly even correct behavior… it looks odd but maybe that is how jerk/acceleration constraints are met?

descipher commented 3 years ago

The fact that it is present on various hardware/emulation targets means we should be looking in the stepper common code. The probe outputs are from default configs in branch bugfix-2.0.x. I am working on the STM32 maple pwm issue first, no reason we can't all look for what's causing the premature pulse now that we know it's real.

sjasonsmith commented 3 years ago

I've done some more testing with the simulator.

I've found that the "stray" pulse is always the same time before the rest of the pulse train. It is not influenced by speed, acceleration, or jerk. I see it whether or not I have classic jerk enabled.

Here you see the stray pulse followed by the normal acceleration: image

The one situation where I do NOT see this, is when movement is slow enough that no acceleration curve is used: image

If even a small amount of acceleration exists between the first two pulses, the first two are always the same time apart, equating to 1.5 mm/s. image

sjasonsmith commented 3 years ago

Well, I found where it is coming from. Understanding whether it is proper behavior or not is a harder question.

void Planner::calculate_trapezoid_for_block(block_t * const block, const_float_t entry_factor, const_float_t exit_factor) {

  uint32_t initial_rate = CEIL(block->nominal_rate * entry_factor),
           final_rate = CEIL(block->nominal_rate * exit_factor); // (steps per second)

  // Limit minimal step rate (Otherwise the timer will overflow.)
  NOLESS(initial_rate, uint32_t(MINIMAL_STEP_RATE));
  NOLESS(final_rate, uint32_t(MINIMAL_STEP_RATE));

In the stepper interrupt code, the timing between the first two steps is controlled by initial_rate, which in this case is set to MINIMAL_STEP_RATE=120. It then does the expected normal acceleration on all the following steps.

I'm not familiar with the planning code or what influences the entry_factor. In my debugging the entry_factor was really small (something like 0.0075), which caused the calculated initial_rate (5) to be less than the minimum (120).

sjasonsmith commented 3 years ago

I need to figure out whether these calculated acceleration/plateau numbers correctly correspond with this pulse train, or if there is an "off-by-one" error and we should be using our acceleration parameters between the first two pulses:

image

image

Right now I'm thinking that initial_rate shouldn't actually be the rate used between the first and second pulses. It is the "incoming" rate from the previous block, and acceleration should begin immediately from that rate.

descipher commented 3 years ago

I did correct a step calculation issue in my laser PR that still not merged. It related to this in some ways. see this in the current code: https://github.com/MarlinFirmware/Marlin/blob/1874787b6bb840156e4ebb721ae6ee6b6438b97a/Marlin/src/module/planner.cpp#L825 I had to change the deceleration calc or the planned step counts would be be 1 off. https://github.com/MarlinFirmware/Marlin/blob/b8671811056724f4597c6e9d24506eef39aed680/Marlin/src/module/planner.cpp#L830 Here the decel point is not block->decelerate_after = acceleration_steps + plateau_steps it should be block->decelerate_after = decelerate_steps + plateau_steps;.

e.g. with actuals of accel = 10, plateau = 20, decel = 9 we would get a total step count of 39 but when you do the calc based on the existing code its a total step count of 40

sjasonsmith commented 3 years ago

@descipher I just looked at your changes and left a comment on that PR. The new change doesn't seem right to me, but probably work in some cases. We could very well have several "off by one" errors in this code...

sjasonsmith commented 3 years ago

OK, the current behavior is definitely incorrect. The step generation code assumes that we are actually accelerating during that entire 8.3ms between the first two steps, which causes a huge acceleration spike on the third step, followed by an early arrival at our coasting speed.

With the current code, we don't accelerate or decelerate at all between the first two steps, we only maintain the entry speed.

I've modified the code to actually calculate the step timing according to the desired acceleration. Using that I was able to generate plots comparing the current code to my "hopefully more mathematically correct" modifications. I think the charts speak for themselves.

NOTE 1: This is currently only a prototype. I am probably doing too much math in the stepper ISR, and more work will be needed to solve this properly.

NOTE 2: I'm comparing trapezoid acceleration here, but the same problem exists for S-Curve.

NOTE 3: The timing of velocity reaching zero is an estimate. The Pulses stop at the last non-zero velocity, and I've estimated an extra timestamp to allow me to estimate the acceleration as we come to a stop. This is just a guess, and it looks like we still might be stopping a little too abruptly. We may need to start decelerating one step sooner than even my new changes.

image

sjasonsmith commented 3 years ago

The charts above are from a G0 X0.4 F2000 command, using otherwise default settings in the simulator example.

holgin commented 3 years ago

I just wanted to say that I'm happy that this is being worked on and it's good to hear that my initial observations were correct!

descipher commented 3 years ago

Definitely concur with your findings, the issue I see is the first step has already occurred in pulse_phase_isr however block_phase_isr has not calculated the initial_rate interval value on entry so the interval is incorrect for the next step. We need to calculate the interval for step 1 of the block

e.g.

      if (step_events_completed == 1)  // Calculate initial_rate interval timer value
        acceleration_time = calc_timer_interval(current_block->initial_rate, &steps_per_isr);

image

I still see more issues in the output. The deceleration in this capture is not right, checking further.

descipher commented 3 years ago

I have improved the accel to decel scaling. More testing required and there are some additional quirks to address. This is only one target HAL, others need to be examined to validate the various CPU timer configurations behave consistently. image

G0F4000X1 = 80 Steps = 1mm accel = 40 Steps, decel = 39 Steps

sjasonsmith commented 2 years ago

Just a quick update for anyone watching this. @descipher and I have been working together on this and sharing ideas. We have a few ongoing experiments which show promise. There is a lot of intricacy in these bits of code so progress is slow.

So far we have only explored fixing trapezoidal acceleration. Additional work will be needed to apply similar changes to S-Curve.

thinkyhead commented 2 years ago

That initial 120 now labeled as MINIMAL_STEP_RATE has been in Marlin for a very long time, so I'm not surprised if it became obsolete along the way and is now just a vestigial thing. It is so old that its original purpose is just about lost to obscurity.

descipher commented 2 years ago

That initial 120 now labeled as MINIMAL_STEP_RATE has been in Marlin for a very long time, so I'm not surprised if it became obsolete along the way and is now just a vestigial thing. It is so old that its original purpose is just about lost to obscurity.

Based on my tests if the step rate value is too small issues will surface, such as the timer continuously overflows or the accel formulas fail with 0.

thinkyhead commented 1 year ago

Is this issue still relevant to the latest code?

descipher commented 1 year ago

Unfortunately I think it is, will verify this weekend and update.

sjasonsmith commented 1 year ago

This might have been fixed with some linear advance rework last year? Was that @tombrazier?

tombrazier commented 1 year ago

Yes, I think I did fix it without realising this issue had already been raised. I certainly remember seeing it when using the simulator. Looking back, I think it was probably commit https://github.com/MarlinFirmware/Marlin/pull/24533/commits/aaf7d0e9012f0caab23568b301ba150298cf1f98 in PR #24533. I'd have to look at the code again to understand why that fixed it, though.

descipher commented 1 year ago

image

This is fixed, confirmed with LA.

sjasonsmith commented 1 year ago

Thanks for double-checking @descipher. I’ll close it out!

github-actions[bot] commented 1 year ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.