Open ryannining opened 6 years ago
The quake algorithm is a bit different:
float Q_rsqrt( float number )
{
long i;
float x2, y;
const float threehalfs = 1.5F;
x2 = number * 0.5F;
y = number;
i = * ( long * ) &y; // evil floating point bit level hacking
i = 0x5f3759df - ( i >> 1 ); // what the fuck?
y = * ( float * ) &i;
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
return y;
}
Please check also the units teacup is using. All parts take care about overflow. E.g. units for motors are in steps per meter. Not mm. Maybe you have rounding issues.
I try to create mini code to check the ramping code of the teacup, compared to other firmware code.
Looks like you run this in Arduino IDE. This includes running background tasks.
Doing such measurements in SimulAVR is more accurate and repeatable to the CPU clock. It also runs the unmodified firmware binary, so you also get all the optimizations as well. All the required tools are in the repo, you just have to build SimulAVR nearby: http://reprap.org/wiki/Teacup_Firmware#Teacup_in_SimulAVR
That said, small deviations like F99 vs. F100 are accepted inaccuracies. Doesn't matter as long as the print head path keeps being accurate. The printer just runs 1% slower, barely noticeable.
Yea, but i am on windows now, still confuse about the simulavr.
for the quake invsqrt i already test, without the newton method is sufficient for simple ramp calculation.
The test code only compare the heart of the ramp algorithm core. For arduino, i dont know if there is background task. But if there is, the comparison stil valid i think. Because the code is fairly simple.
float stepdiv = 1000000.f / stepmm;
while (totalstep--) {
f = dl + micros();
ta += acx;
if (ta > 1)dl = stepdiv * InvSqrt(ta); else dl = 1000000;
}
and
uint32_t c0p = (uint32_t)(16000000.f / sqrt((float)stepmm * acx / 2000.));
while (totalstep--) {
f = dl + micros();
ta --;
if (ta > 0)dl = uint32_t(c0p * int_inv_sqrt(ta)) >> 13; // else dl = c0p;
}
f = dl + micros();
is just dummy code to prevent compiler over optimize.
stepdiv is the same as c0p? How does your algorithm works?
How do you convert the dl
in your InvSqrt
to the timer value?
A small test in the simulator. Clocks measured in dda.c at the two places where the sqrt is calculated.
IntSqrt
dda->c = (uint32_t)(pgm_read_dword(&c0_P[dda->fast_axis]) *
InvSqrt(dda->n)) >> 1;
FLASH : 18442 bytes
RAM : 1931 bytes
EEPROM : 32 bytes
smooth-curves.gcode statistics:
LED on occurences: 2274.
LED on time minimum: 34 clock cycles.
LED on time maximum: 490 clock cycles.
LED on time average: 36.8971 clock cycles.
dda_clock() in timer_avr.c
LED on occurences: 3001.
LED on time minimum: 22 clock cycles.
LED on time maximum: 997 clock cycles.
LED on time average: 419.318 clock cycles.
int_inv_sqrt
dda->c = (pgm_read_dword(&c0_P[dda->fast_axis]) *
int_inv_sqrt(dda->n)) >> 13;
FLASH : 17896 bytes
RAM : 1931 bytes
EEPROM : 32 bytes
smooth-curves.gcode statistics:
LED on occurences: 2245.
LED on time minimum: 795 clock cycles.
LED on time maximum: 1310 clock cycles.
LED on time average: 826.84 clock cycles.
dda_clock() in timer_avr.c
LED on occurences: 3001.
LED on time minimum: 22 clock cycles.
LED on time maximum: 1568 clock cycles.
LED on time average: 866.64 clock cycles.
Looks pretty good. Unfortunately little bit more flash.
I add a atomic section to ignore the interrupts. Strange that the times increase. Probably some issues with the simulator?
int_inv_sqrt:
LED on occurences: 2214.
LED on time minimum: 934 clock cycles.
LED on time maximum: 951 clock cycles.
LED on time average: 939.778 clock cycles.
InvSqrt:
LED on occurences: 2213.
LED on time minimum: 305 clock cycles.
LED on time maximum: 391 clock cycles.
LED on time average: 346.601 clock cycles.
LED on time minimum: 22 clock cycles.
This smells like something going wrong. 22 clocks isn't sufficient to do something meaningful. And the second picture shows a velocity spike close to the right edge of the picture.
22 clocks makes sense when:
if (dda == NULL)
return;
In the spike are 3 more steps. hmmm...
s.......s.......s.s.s.s.s.......s.......s
wow, simulavr is really great !! i must learn that tools.
stepdiv is the same as c0p? How does your algorithm works?
How do you convert the dl in your InvSqrt to the timer value?
Stepdiv if is = 1000000 us / steppermm to reduce calculation in main loop, if steppermm is 1 then stepdiv is just 1 second (1000000 us)
then dl = delay = stepdiv/sqrt(ta)
ta = value that contain f squared increment by acceleration in loop
using and keeping the F squared is used by grbl and used for back and forward planner. they use to compute ramp length too
#define ramplen(oo,v0,v1,a ,stepmm) oo=(v1*v1-v0*v0)* stepmm/(2*a);
keeping v0 = start F v1 = end F
does the spike because dda->c is not calculated every bresenham step ?
No. Maybe c_min is wrong. I need to check this.
~It is some rounding problems.~
In this spike come some parts together. rampup_steps = 65 rampdown_steps = 66 move_step_no = 65
rampup_steps < move_step_no == False rampdown_steps >= move_step_no == False
So algorithm think that this will cruise and set the dda->c to dda->c_min. But this is wrong in this case.
Fix: https://github.com/Traumflug/Teacup_Firmware/commit/6f46e95b772227e460
The two downspikes in the end is bresenham. Looks more wrong than it is. Just one step less in this move.
complete calculation code
I try to create mini code to check the ramping code of the teacup, compared to other firmware code. And then measure time each step needed.
The c0p and int_inv_sqrt found on theacup dda core are fast, but i found its not reached correct speed. Was i write wrong code ? Indeed fast inverse squareroot is faster and have wrong end F value too (99)
Part of code i learn from teacup dda.c (ramp up from 0 to 100 in 2500 steps)
Result on Arduino Mega
I think using fast inverse square is lots faster (25us vs 67us)