Ramping calculation - Githubissues

ryannining commented 6 years ago

I try to create mini code to check the ramping code of the teacup, compared to other firmware code. And then measure time each step needed.

The c0p and int_inv_sqrt found on theacup dda core are fast, but i found its not reached correct speed. Was i write wrong code ? Indeed fast inverse squareroot is faster and have wrong end F value too (99)

Part of code i learn from teacup dda.c (ramp up from 0 to 100 in 2500 steps)

  long m = micros();
  long ts = totalstep;
  ta = ts;
  uint32_t c0p = (uint32_t)(16000000.f / sqrt((float)stepmm * acx / 2000.));

  while (totalstep--) {
    f = dl + micros(); //dummy to pass optimization
    ta --;
    if (ta > 0)dl = uint32_t(c0p * int_inv_sqrt(ta)) >> 13; // else dl = c0p;
  }
  Serial.println(f - micros());//dummy to pass optimization
  f = 16000000 / dl;
  m = micros() - m;
  Serial.print("Final F:");
  Serial.println(f);

Result on Arduino Mega

Std Sqrt loop:

Totalstep:1250
TA:10000
ACC:200.00
Target F:100.00
128
Final F:100
Time/step (us):76

Fast InvSqrt loop:

Totalstep:1250
TA:10000
ACC:8
Target F:100.00
181
Final F:99
Time/step (us):25

Teacup F loop:

Totalstep:1250
TA:10000
ACC:8
Target F:100.00
58380
Final F:274
Time/step (us):67

I think using fast inverse square is lots faster (25us vs 67us)

float InvSqrt(float x) {

  int32_t* i = (int32_t*)&x;           // store floating-point bits in integer
  *i = 0x5f335a86  - (*i >> 1);    // initial guess for Newton's method
  return x;
}

Wurstnase commented 6 years ago

The quake algorithm is a bit different:

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the fuck? 
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}

Please check also the units teacup is using. All parts take care about overflow. E.g. units for motors are in steps per meter. Not mm. Maybe you have rounding issues.

Traumflug commented 6 years ago

I try to create mini code to check the ramping code of the teacup, compared to other firmware code.

Looks like you run this in Arduino IDE. This includes running background tasks.

Doing such measurements in SimulAVR is more accurate and repeatable to the CPU clock. It also runs the unmodified firmware binary, so you also get all the optimizations as well. All the required tools are in the repo, you just have to build SimulAVR nearby: http://reprap.org/wiki/Teacup_Firmware#Teacup_in_SimulAVR

That said, small deviations like F99 vs. F100 are accepted inaccuracies. Doesn't matter as long as the print head path keeps being accurate. The printer just runs 1% slower, barely noticeable.

ryannining commented 6 years ago

Yea, but i am on windows now, still confuse about the simulavr.

for the quake invsqrt i already test, without the newton method is sufficient for simple ramp calculation.

The test code only compare the heart of the ramp algorithm core. For arduino, i dont know if there is background task. But if there is, the comparison stil valid i think. Because the code is fairly simple.

float stepdiv = 1000000.f / stepmm;
  while (totalstep--) {

    f = dl + micros();
    ta += acx;
    if (ta > 1)dl = stepdiv * InvSqrt(ta); else dl = 1000000;
  }

and

  uint32_t c0p = (uint32_t)(16000000.f / sqrt((float)stepmm * acx / 2000.));

  while (totalstep--) {
    f = dl + micros();
    ta --;
    if (ta > 0)dl = uint32_t(c0p * int_inv_sqrt(ta)) >> 13; // else dl = c0p;
  }

ryannining commented 6 years ago

f = dl + micros();

is just dummy code to prevent compiler over optimize.

Wurstnase commented 6 years ago

stepdiv is the same as c0p? How does your algorithm works?

How do you convert the dl in your InvSqrt to the timer value?

Wurstnase commented 6 years ago

A small test in the simulator. Clocks measured in dda.c at the two places where the sqrt is calculated.

IntSqrt

dda->c = (uint32_t)(pgm_read_dword(&c0_P[dda->fast_axis]) *
                    InvSqrt(dda->n)) >> 1;

    FLASH  : 18442 bytes
    RAM    :  1931 bytes
    EEPROM :    32 bytes

smooth-curves.gcode statistics:
LED on occurences: 2274.
LED on time minimum: 34 clock cycles.
LED on time maximum: 490 clock cycles.
LED on time average: 36.8971 clock cycles.

dda_clock() in timer_avr.c

LED on occurences: 3001.
LED on time minimum: 22 clock cycles.
LED on time maximum: 997 clock cycles.
LED on time average: 419.318 clock cycles.

invsqrt

int_inv_sqrt

dda->c = (pgm_read_dword(&c0_P[dda->fast_axis]) *
          int_inv_sqrt(dda->n)) >> 13;

    FLASH  : 17896 bytes
    RAM    :  1931 bytes
    EEPROM :    32 bytes

smooth-curves.gcode statistics:
LED on occurences: 2245.
LED on time minimum: 795 clock cycles.
LED on time maximum: 1310 clock cycles.
LED on time average: 826.84 clock cycles.

dda_clock() in timer_avr.c

LED on occurences: 3001.
LED on time minimum: 22 clock cycles.
LED on time maximum: 1568 clock cycles.
LED on time average: 866.64 clock cycles.

int_inv_sqrt

Wurstnase commented 6 years ago

Looks pretty good. Unfortunately little bit more flash.

I add a atomic section to ignore the interrupts. Strange that the times increase. Probably some issues with the simulator?

int_inv_sqrt:

LED on occurences: 2214.
LED on time minimum: 934 clock cycles.
LED on time maximum: 951 clock cycles.
LED on time average: 939.778 clock cycles.

InvSqrt:

LED on occurences: 2213.
LED on time minimum: 305 clock cycles.
LED on time maximum: 391 clock cycles.
LED on time average: 346.601 clock cycles.

Traumflug commented 6 years ago

LED on time minimum: 22 clock cycles.

This smells like something going wrong. 22 clocks isn't sufficient to do something meaningful. And the second picture shows a velocity spike close to the right edge of the picture.

Wurstnase commented 6 years ago

22 clocks makes sense when:

  if (dda == NULL)
    return;

In the spike are 3 more steps. hmmm...

s.......s.......s.s.s.s.s.......s.......s

ryannining commented 6 years ago

wow, simulavr is really great !! i must learn that tools.

ryannining commented 6 years ago

stepdiv is the same as c0p? How does your algorithm works?

How do you convert the dl in your InvSqrt to the timer value?

Stepdiv if is = 1000000 us / steppermm to reduce calculation in main loop, if steppermm is 1 then stepdiv is just 1 second (1000000 us)

then dl = delay = stepdiv/sqrt(ta)

ta = value that contain f squared increment by acceleration in loop

using and keeping the F squared is used by grbl and used for back and forward planner. they use to compute ramp length too

#define ramplen(oo,v0,v1,a ,stepmm) oo=(v1*v1-v0*v0)* stepmm/(2*a);

keeping v0 = start F v1 = end F

ryannining commented 6 years ago

does the spike because dda->c is not calculated every bresenham step ?

Wurstnase commented 6 years ago

No. Maybe c_min is wrong. I need to check this.

Wurstnase commented 6 years ago

~It is some rounding problems.~

In this spike come some parts together. rampup_steps = 65 rampdown_steps = 66 move_step_no = 65

rampup_steps < move_step_no == False rampdown_steps >= move_step_no == False

So algorithm think that this will cruise and set the dda->c to dda->c_min. But this is wrong in this case.

Fix: https://github.com/Traumflug/Teacup_Firmware/commit/6f46e95b772227e460

Wurstnase commented 6 years ago

fix_rampup

The two downspikes in the end is bresenham. Looks more wrong than it is. Just one step less in this move.

Traumflug / Teacup_Firmware

Ramping calculation #296