Traumflug / Teacup_Firmware

Firmware for RepRap and other 3D printers
http://forums.reprap.org/read.php?147
GNU General Public License v2.0
312 stars 199 forks source link

Fix for dda.c using wrong rampup_steps values #23

Open Cyberwizzard opened 12 years ago

Cyberwizzard commented 12 years ago

I am working on the movement code in dda.c and I discovered that the rampup_steps value (and rampdown) was completely wrong. After working my way back to where it was calculated, I got stuck on the calculation in the original source.

So I replaced that one with my own, with comments explaining how I got to it (and as such, why its correct).

I'm new to github so I didn't manage to figure out how to send a pull request for that single commit so here is a link to the changeset (note that the line numbers differ from your current master): https://github.com/Cyberwizzard/Teacup_Firmware/commit/4860a1aec953300e3ddb5164c9e13098e562a3f0

Cyberwizzard commented 11 years ago

Status update: I now have fully functional look-ahead for moves on the X-Y plane (with or without extruding during the move). This speeds up printing arcs and can be toggled at runtime. A limitation (resulting from the ramping algorithm, see below) is that both X and Y require the same number of steps per m.

The reason for only supporting X-Y moves is because Teacup has in incorrect implementation of Bresinghams algorithm in combination with the linear (ramping) acceleration code: the ramping code mentions how the use of a single 'c' parameter means the value is fixed and does not have to be recalculated on each move. This is true if every axis would require the same number of steps; and they do not. As a result, the ramping values are based on the X axis usually do not work for the Z axis, which requires a lot more steps.

Additionally, Bresinghams algorithm should step on the axis with the most steps in order to get the correct resolution; currently it looks like this is also fixed on the X axis.

I was working on a solution where I kept track of the leading axis for Bresingham; this will work and results in variable ramping parameters depending on which axis is (or is not) active. This results in a lot of logic which needs to be evaluated for every move. It also feels like I am trying to match up two incompatible things.

As such, perhaps ramping acceleration could be modified: I am aware that acceleration is currently working but even in the code the actual ramp length (number of steps to reach target speed) calculations are commented out as they are "incorrect". And as long as every move can involve different axis, the length of the ramp (in steps, based on the steps per m) can vary.

The problem with this approach is that if a move does not end in zero speed (full stop), the ramping algorithm can not join 2 or move moves together (you would need 2 different 'c' values and probably approximate transitions).

An alternative (and much more attractive imho) is to use a macro to obtain the axis with the highest number of steps per mm, making that the leading axis for the ramping algorithm and Bresingham. Since Teacup should be able to keep up with this resolution, moving that single axis or all at once should result in the same number of interrupts. Combining this with Bresingham should yield the solution that Teacup has almost implemented at the moment.

This would allow step based move joining for all axis (if desired) - like my current look-ahead also uses.

I am aware that you derived a solution where acceleration is not determined in number of steps but in time. While that may work I suspect this will yield an approximation of the ramp length (in fixed point or floating point logic) which may result in minor deviations in the number of steps used for acceleration.

I will try to implement my own suggestion in a branch to test if it is indeed the simple fix I expect it to be and test if the functional behavior is preserved.

Traumflug commented 11 years ago

That's excellent news!

the ramping code mentions how the use of a single 'c' parameter means the value is fixed and does not have to be recalculated on each move.

So far this worked, because every move decelerated to a stop. For non-zero speeds between moves your observation is right, though. It can be fixed by either moving c back into the struct for the individual move or by rewriting c in dda_start().

I am aware that you derived a solution where acceleration is not determined in number of steps but in time. While that may work I suspect this will yield an approximation of the ramp length (in fixed point or floating point logic) which may result in minor deviations in the number of steps used for acceleration.

These days I'm pretty convinced it would be a good idea to move acceleration calculations out of dda_step() into something running repeatedly every one or two milliseconds. This is how EMC2 and GRBL (but not Marlin) do this. In clock.c we have a timer running something every 2 milliseconds already.

The big advantage here is, dda_step() would become much smaller, more importantly, get rid of the time consuming 32-bit division. Teacup can currently do something like 15'000 steps/second, or 1'300 CPU cycles per step. An acceleration calculation every millisecond would have 20'000 CPU cycles available. Plenty of time. At the same time, the maximum possible step rate should at least double.

A smaller advantage is, acceleration calculations can be based on time, not on the previous step:

speed = acceleration * time since movement start value for setTimer() = step time = speed / steps_per_mm

Pretty straightforward.

A drawback would be, at step rates above one step per 2 milliseconds, acceleration becomes stair-stepped. A thing we could well live with, IMHO. Also a bit tricky might be the jump from one move to the next, as this is no longer synchronous with acceleration calculations.

Last not least, these 20'000 CPU cycles would allow to re-calculate the Bresenham values while moving, giving a chance to move real curves/circles/beziers.

All that said, I'm very happy there's a reasonable look-ahead now. Where should I start integrating this into the central Teacup?

triffid commented 11 years ago

Teacup does not explicitly keep track of the leading axis. The axis with the most steps always leads due to the block at https://github.com/triffid/Teacup_Firmware/blob/master/dda.c#L135 where total_steps is calculated.

eg if Z has the most steps, then total_steps == z_delta which causes it to step every cycle because z_counter + total_steps - z_delta is always <= 0. All other axes will have smaller deltas so they'll step less often. And yes, teacup supports moving all 4 axes at once and the bresenham is handled correctly.

Effectively, we create a virtual axis to lead with the same number of steps as the axis with the most steps... If the X axis always leads, then circles would print hilariously poorly.

I can't comment on the acceleration, it was being restructured a lot while my involvement in Teacup was waning. I think the best approach is to have an XY acceleration for the motion vector, then map onto the virtual leading axis for consumption by the ramping algorithm.

Soupcup has some code for applying per-axis acceleration limits correctly however it might result in diagonal moves having higher acceleration, haven't looked at it in quite a while.. tacking an XY vector accel into the mix would sort it

as for the rest, sounds lovely :)

Cyberwizzard commented 11 years ago

I tried looking into the acceleration issue (or rather possible fix like the one I suggested) but lack of time prevented me from making any progress there.

My changes are present in my Teacup fork at https://github.com/Cyberwizzard/Teacup_Firmware .

I cleaned my code up a while back so now most of the logic is in dda_lookahead.* and only some minor adjustments in the the rest of the code is needed. I used M141, M142 and M143 to control the look-head at runtime; perhaps another command code is more suitable but for testing this was fine.

I had some code in there to do emergency stops when for example I triggered end stops during prints or when the lookahead screwed up; most of that is gone although the method to do an emergency stop from sanity tests is still there.

Some of the modifications include the ability to keep track of entry and exit speeds on each segment, some movement information to determine the forces for each move (simply based on directions and speeds) and the ability for the queue to pass the previous move/segment to the look-ahead code. Whenever calculations take too much time for 2 segments (for example when printing tiny segments), the two moves are simply not joined.

I have printed for over 30 hours with lookahead enabled by default and found no issues with the prints I made: curved sections look a bit smoother as the toolhead will now be able to actually print arcs without stopping. Tuning the jerk parameter is of course important: I found that a setting of 5 to 10 yields the best results on my printer.

drf5n commented 11 years ago

A patch for Cyberwizzard's lookahead is attached to http://forums.reprap.org/read.php?147,33082,185673#msg-185673 -- I dropped some of the emergency stop code, the axis tracking, and moved some stuff into the !defined(LOOKUP) compilation.

Traumflug commented 11 years ago

This is basically a copy of http://forums.reprap.org/read.php?147,33082,194490#msg-194490

Finally I found the time to move this lookahead into the Gen7 branch here. It's turned off by default, but can be enabled by just uncommenting the feature in config.h. In case you have an older config.h already, you have to copy a number of #defines in the ACCELERATION section from one of the templates.

Two parts not moved (so far):

My testing showed me a clear improvement. Hard corners are moved by decelerating to zero as before, smooth corners maintain speed. What I'm not so sure about is, wether decelerating partially works, too. My suspicion is, corners are either moved full speed or full stop. Tried this with (very low) ACCELERATION = 10, but this apparently caused overflows; movements go to a crawl. Also, I can't see a difference between JERK = 10 and JERK = 1.

Cyberwizzard commented 11 years ago

The jerk parameter should give a noticeable change, especially on sawtooth profiles: for my printer at 9 it rattles the whole table, at 6 it blazes through without taking the table top with it.

About the overflow: I noticed that as well. I printed a couple of things with thin (1.5 mm or so) walls and something went wrong. During one print I injected the command to turn it off again and after a minute it went back to full speed. I blamed it on the atomic barriers but I guess thats not it.

I fixed a race condition where a move would finish before the look-ahead was done and it missed that the join would become invalid. It could be that or I thought that I might have missed a situation where multiple moves would queue up simultaneously instead of one at a time.

Traumflug commented 11 years ago

In addition to your recent fix of acceleration step calculation I've just put an in-vivo calculation in as part of another commit:

https://github.com/triffid/Teacup_Firmware/commit/0b4e6cb547cb7dbe6dc7f651ba09faa4d2e86484

After ramping up (detected by move_state.c < dda->c_min), it's looked wether the given number of acceleration steps is much longer. If yes, the numbers are cut down to the just found result. As F_start or F_end aren't taken into account, this assumes symmetrical ramps. The detection shouldn't trigger on movements with properly calculated values.

drf5n commented 11 years ago

A bit off-topic:

In porting to non-AVR, I'm confused while tyring to track the interrupt status through the several nested functions' "save_reg=SREG; cli();... SREG=save_reg;" code. It looks like the intention is to protect some critical code within these function, but some cases the calling function was in a critical section, if(save_reg&(1<<7)), and the interrupt must not be re-enabled.

How intricate does this structure need to be?

Dave

On Mar 27, 2013, at 11:11 AM, Traumflug notifications@github.com wrote:

In addition to your recent fix of acceleration step calculation I've just put an in-vivo calculation in as part of another commit:

0b4e6cb

After ramping up (detected by move_state.c < dda->c_min), it's looked wether the given number of acceleration steps is much longer. If yes, the numbers are cut down to the just found result. As F_start or F_end aren't taken into account, this assumes symmetrical ramps. The detection shouldn't trigger on movements with properly calculated values.

— Reply to this email directly or view it on GitHub.

Dr. David Forrest drf@vims.edu 804-684-7900w 757-968-5509h 804-413-2882c Virginia Institute of Marine Science Route 1208, Greate Road PO Box 1346 Gloucester Point, VA, 23062-1346

Traumflug commented 11 years ago

SREG is the status register, where the interrupt lock bit is stored. cli() locks interrupts. Restoring SREG also restores interrupt status. In short: it's a wrapper for operations which need to be atomic.

This MEMORY_BARRIER() thing is something you probably have to care about on other platforms, too. It stops the compiler optimizer from re-shuffling the code. Needed when order of execution matters.

drf5n commented 11 years ago

On Mar 27, 2013, at 9:37 PM, Traumflug notifications@github.com wrote:

SREG is the status register, where the interrupt lock bit is stored. cli() locks interrupts. Restoring SREG also restores interrupt status. In short: it's a wrapper for operations which need to be atomic.

This MEMORY_BARRIER() thing is something you probably have to care about on other platforms, too. It stops the compiler optimizer from re-shuffling the code. Needed when order of execution matters.

Within the ISRs, it seems that the interrupt status bit is already cleared before calling the routine, and the final reti after all the register-popping sets the bit. SREG retores the interrupt status outside of an ISR, but inside it just savess the arithmetic status.

The part that I find confusing is tracing the interrupt status through TIMER1_COMPA_vect() { queue_step() setTimer(), temp_achieved(), dda_step(){setTimer()},next_move()}

On the mk20dx128, I can make the clock timer independent of the stepper timer, but I'm not sure of all the implications.

Dave

Traumflug commented 11 years ago

Within the ISRs, it seems that the interrupt status bit is already cleared before calling the routine, and the final reti after all the register-popping sets the bit.

That's true. Clearing the interrupt flag before returning from the interrupt might give better results. For an example, see STEP_INTERRUPT_INTERRUPTIBLE in config.h and dda.c.

On the ATmega, step and clock interrupts are mostly independent, too. Both run on the same counter, but trigger on different, independent thresholds. Clock is timer 1B, Step is timer 1A. Clock repeats always at the same time, so there are no substantial calculations required, just one line early in the interrupt routine.

Step timing is pretty complex. This timer can count 16 bit in hardware, only, and is extended in software to allow up to 32 bit delays. Also, this timer is prepared for delays down to zero - which isn't physically possible, but is coded to get as close as possible. Delay 0 is executed as soon as possible and the inevitable delay is subtracted from the next delay. This is pretty close to four independent timers and used by ACCELERATION_TEMPORAL.

If your MCU can do 32 bit timers in hardware - great, then you can simplify this code a lot.

Traumflug commented 11 years ago

BTW., I think most other RepRap firmwares set an divider of 8 for the step interrupt, so timing is not entirely as accurate as with Teacup, but the 32 bit extension can be avoided. They then get maximum delays of (1 / F_CPU) * 65535 * 8 = 0.026 s = 26 ms. Translates to a minimum feedrate of 38 steps/second or 288 mm/min on a half-stepping, belt driven axis. With Teacup, I've done successful movements at F0.1 (for EDM) already.

Earlier versions of Teacup had an implementation which set the divider according to the requested delay, see timer.c about two years back.

drf5n commented 11 years ago

So Teacup's step-timing capabilities are 1/F_CPU * 2^32 = 268s by 1/F_CPU=0.625us. That would translate to a comparable minimum feedrate of 0.0037 steps/sec, and with a 0.126mm/step machine: 0.00047 mm/sec or 0.028mm/min.

On the maximum rate side, I guess the 1/F_CPU = 0.625uS timer resolution translates into accuracy of the computation-limited step frequency. With the 32 bit timer, it is possible to do the ~15000 steps/second moves with 0.1% frequency accuracy, while the /8 divider would limit the frequency accuracy to 0.8%.

I'm cross-posting this to the forum: http://forums.reprap.org/read.php?147,33082,195698,page=47#msg-195698