DeBesten / open9x

Automatically exported from code.google.com/p/open9x
0 stars 0 forks source link

perMain is slow #157

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Which board (stock / gruvin9x / sky9x) are you using?
stock
What is your open9x FW version?
R1563
What is your open9x EEPROM version?

What steps will reproduce the problem?

When I switched from ER9x to Open9x, I feel the servos' response is not as fast 
as before.

So, I used a DSO to monitor PPM output, and use the GEAR switch as a triger 
input to capture the PPM wave to find the delay between GEAR and PPM.

Comparing with ER9x, it does be slower.

Then, I try to use STAT (long pressing DOWN key) to check the g_timeMainMax.

It is ~5ms v.s. ~10ms between Er9x & Open9x.

Then I tried to find out why the perMain() loop is slower.

I add the following code to count delay time each time, i.e.

// 0~12ms counters 
uint16_t g_timeList[12]={0,0,0,0,0,0,0,0,0,0,0,0};

...

    t0 = getTmr16KHz() - t0;
    if (t0 > g_timeMainMax) g_timeMainMax = t0;

{
    uint16_t t = t0/16; // unit convert to ms

    // max idx=11
    if (t >= 11)  // if over 11ms, count them as [11]
        t=11;

    g_timeList[t]++;
}

In statistics_views.cpp, I dump all counts. i.e.

  //lcd_putsLeft(5*FH, STR_FREESTACKMINB);
  //lcd_outdezAtt(14*FW, 5*FH, stack_free(), UNSIGN) ;
  {
      int a;
      for (a=0; a <12; a++) {
          uint16_t row = a/4;
          uint16_t col = a%4;
        lcd_outdezAtt(FW*5*(col+1), (5+row)*FH, g_timeList[a], PREC2);
      }
  }

I found that 90% loops takes about 10ms, only 10% about 3ms.
(max ~10ms)in Open9x

Comparing with ER9x (last CVN checkout) , about 66% within 3ms,
33% within 6ms. (max ~5.5ms)

The model setting is "Simple 4CH template".

I did prefer the overall feeling & code style of Open9x.

If the issue could be solved, Open9x will be perfect for me.

Thank you!

Original issue reported on code.google.com by myfo...@gmail.com on 10 Nov 2012 at 8:12

GoogleCodeExporter commented 8 years ago
gruvin9x is not ARM based!
Have a look to this thread:
http://9xforums.com/forum/viewtopic.php?f=55&t=2049

Original comment by bson...@gmail.com on 16 Nov 2012 at 10:15

GoogleCodeExporter commented 8 years ago
ISR(TIMER5_COMPA_vect, ISR_NOBLOCK) update mix only every 20ms ?

Original comment by myfo...@gmail.com on 16 Nov 2012 at 10:16

GoogleCodeExporter commented 8 years ago
Ok... I'd found other OCR5A initialization within pulses_avr.

Original comment by myfo...@gmail.com on 16 Nov 2012 at 10:25

GoogleCodeExporter commented 8 years ago
doMixerCalculations() within ISR, is race condition possible ?

e.g. deleting / modifying mix setting, even switching to another model,
and interrupted by timer5 ISR

Original comment by myfo...@gmail.com on 16 Nov 2012 at 10:51

GoogleCodeExporter commented 8 years ago
I had a DX6i before and its countdown timer seems to be driven by ISR.
If you switch to another model during the timer is running,
occasionally you mess up its eeprom.

Sadly, I didn't know about it until I got hurt by the spinning propeller.
:-(

Original comment by myfo...@gmail.com on 16 Nov 2012 at 11:02

GoogleCodeExporter commented 8 years ago
Yes race conditions were possible there, but I think I took care of them, have 
a look to where pauseMixerCalculations() / resumeMixerCalculations() are called!

Original comment by bson...@gmail.com on 16 Nov 2012 at 11:15

GoogleCodeExporter commented 8 years ago
We cannot take care race condition well in such a long procedure.

e.g. Just pressing trim button when flying

If value increases from 0x00FF  to 0x0100, while 0x00 (MSB) is changed and IRQ 
occurs, 
the value becomes 0x01FF. 

The same with LSB is changed first, or dword or any data structure, 
the problem is the same.

To solve all cases, lots of critical sections must be created & reset.

Another big issue is stack consumption estimation.

If IRQ raised within a very deep function call, doMixerCalculations() in IRQ 
consumes more! 
(and not predicable, it depends on what / how users programmed the mix)
The worst case is nested IRQ in the mean time.

Both problems make bugs hard to be traced & reported & reproduced.

Normally, we are requested to keep isolation between IRQ & main thread.
Limit the data and functions shared between them.

Use critical section / semaphore if interface is necessary.

---

I'm working on the s/w & f/w for system architecture for tens of years, majored 
in 
telecommunication applications (e.g. telcom switch / cellular phone / GPS / ..).
Also left some code within linux kernel & driver in the very beginning.

But seems I'm too serious. It's just a TX not an missile. :-P

Original comment by myfo...@gmail.com on 16 Nov 2012 at 3:59

GoogleCodeExporter commented 8 years ago
I know, this is the rule, but... I really think that this solution needs to be 
tried (as it has already been implemented on gruvin9x)...

Do you see one missing critical section? Trims? Where do you see a 16bits 
value? The trim value itself? It is set and used in the same mixer interrupt!

BTW It costs 470 extra bytes flash and this is something not wanted on stock 
with M64 CPU!

Original comment by bson...@gmail.com on 16 Nov 2012 at 4:16

GoogleCodeExporter commented 8 years ago
I only use an example to illustrate the problem.
I had found that example is not proper.

The case above is 
  if (s_current_protocol < PROTO_NONE) {
    doMixerCalculations(tmr10ms, tick10ms);
    checkTrims();
  }
Luckily,all within TIMER5.

But the point is the same. My previous example is an example of the 
architecture problem.

Unless I had traced ALL the nested routines, sub-routines and variables used 
within
the IQR, plus other higher priority ISRs, I cannot make sure if something may 
go wrong.

It really is a maintain problem.

How can we ensure every data used by menus, UIs, alerts or interrupts will not
break the rule? :-(

If all ISRs were short & clear. We can easily classify what is shared and 
needs to be protected.

The only problem here we would like to solve is doMix timing.
We is trading single thread reliable with unprotected multi-thread just for 
unnecessary UI redraw.

I think comment 45 should solve the problem.

I had measured
 doMix takes < 3ms
 g_menuStack[g_menuStackPtr](evt) takes about another 3~7ms

Just don't call g_menuStack[g_menuStackPtr](evt) between (setupPulses - 6ms) to 
(setupPulses),
quickly g_menuStack() loops again and be allowed to run on next time.

If PPM width is greater than 10ms, everything is ok.

Original comment by myfo...@gmail.com on 16 Nov 2012 at 5:02

GoogleCodeExporter commented 8 years ago
I fully agree about the maintainability point, but I had no choice on gruvin9x 
board, we have to deal on this board with SD read / write which can be very 
slow, and unless we install an RTOS, I can't see any better solution...

Therefore this rule has to be checked everywhere now, and I thought it would be 
valuable to reuse this work for the stock board as well, do you see?

That's said, now imagine a complex mix, with slow transitions between phases 
(including many mixes), you can have a mixer calculation which is also very 
long > 15ms. It's good to know that we have a solution to have the mixes 
calculation ready just in time before setupPulses are called, don't you think 
so?

But again it costs flash, that's not wanted (unless we create an option for 
that). So I will attach my patch to the issue as a possible candidate, but for 
sure it will remain there ;)

Original comment by bson...@gmail.com on 16 Nov 2012 at 5:12

GoogleCodeExporter commented 8 years ago
I 100% agree with you. I just repeat what books tell me. :-P

You had pointed out another question I didn't think about it, 
what if extreme long mix happens.

The following points are only philosophic questions, not answers:

(1) If mix is extreme slow, then we need a faster MCU.
Comparing with the PPM 18ms, a >15ms mix has no need just before setPulse.
(If you always be late, don't care when to take bus. You're late already.  ^_^)

If insist on mix before pulse, reserve time slot for doMix() in main thread 
still work. 
Just calculate time used on previous call, and prevent from call other routines 
before setupPulse.
Using ISR or main, the problem is the same. tDoMix > tPPM is not allowed.
i.e. If doMix > 18ms, redraw always be interrupted, and will be very very slow.

(2) I had implemented SD (via SPI bus) applications without RTOS before.
If you have enough buffer to cache about 512b * N, you can try to flash it
by blocks arrangement with ISR supported. 

(3) Forgive me, but to be honest, if I need to choose between extra 18ms or 
extreme
long & deep ISR, I prefer extra 18ms. I even don't know how to write test cases 
for it.  :-P

Original comment by myfo...@gmail.com on 16 Nov 2012 at 5:50

GoogleCodeExporter commented 8 years ago
(1) You can be very late if you are not sync-ed!

Agree on the note, it will be worth another patch attached to the issue ;)

(2) I am afraid that we will have to rewrite the FatFS code ... Not me ;)

(3) True. We do have some google tests, but not good for that kind of stuff ...

Original comment by bson...@gmail.com on 16 Nov 2012 at 6:45

GoogleCodeExporter commented 8 years ago
I have  an idea not too complex.

Assume: (all constant will be variable)

tp = estimate next setupPulse happens (updated by ISR)
duration_draw = 4 (assume 4ms, confirmed already)
duration_mix = Dm (calculated by last mix)

for (;;) {
 if (t + 5 >= tp) {
   doMix();
   doRedraw(); // after doMix, it is early enough to draw
 }
}

The idea is, we do mix as close as t, then redraw screen immediately.
Other time do nothing, just wait. (More redraw or mix is useless)

It helps to sync with PPM.

1) if 4 + Dm < 18ms, we will get 1 mix & 1 redraw per 18ms.

2) if 4 + Dm > 18ms, we still get 1:1 mix & redraw. the mix is as close PPM as 
possible. Cannot be better.

Original comment by myfo...@gmail.com on 16 Nov 2012 at 8:08

GoogleCodeExporter commented 8 years ago
sorry typo

for (;;) {
 if (t + Dm + 1 >= tp) {
   doMix();
   doRedraw(); // after doMix, it is early enough to draw
 }
}

Original comment by myfo...@gmail.com on 16 Nov 2012 at 8:13

GoogleCodeExporter commented 8 years ago
Perfect, except that duration_draw is not constant at all!

Original comment by bson...@gmail.com on 16 Nov 2012 at 8:22

GoogleCodeExporter commented 8 years ago
It 's not so important.

duration_draw is not used.

Just keep calling it as early as possible after doMix().

In fact, we have nothing to do until next ppm happens. :-P

In other words, we just use the 'unused' time slot for painting.

Original comment by myfo...@gmail.com on 16 Nov 2012 at 8:30

GoogleCodeExporter commented 8 years ago
The duration_draw is only used for indicating what will happens if extreme long 
mix + extreme long redraw.

Just cannot send the most recent mix before next ppm.
It is well-known. ^_^

We cannot request fast enough redraw & ppm update while them exceed 18ms. 

I assumed 4ms is for the most cases, even hardly 1 takes 10ms will only delay 
next 1 ppm.
It doesn't hurt.

Again, redraw almost happens when PPM begin, there is no other better choice.

Original comment by myfo...@gmail.com on 16 Nov 2012 at 8:37

GoogleCodeExporter commented 8 years ago
Yes I fully agree. I want to finish the 2nd implementation (even if if will be 
trashed). I am sure that the last solution is the right one. We will see if the 
flash agrees...

Original comment by bson...@gmail.com on 16 Nov 2012 at 9:08

GoogleCodeExporter commented 8 years ago
So... 2nd implementation with an interrupt. Still not flashed...

Original comment by bson...@gmail.com on 17 Nov 2012 at 1:41

Attachments:

GoogleCodeExporter commented 8 years ago
Flashed, it seems not bad at all!

Original comment by bson...@gmail.com on 17 Nov 2012 at 3:15

GoogleCodeExporter commented 8 years ago
Had tested my idea.

Simple & Fast..

(Concept only! Not tested with DSO Or Logic Analyzer yet. More conditional 
definition for PCB & EMU needed.)

Original comment by myfo...@gmail.com on 17 Nov 2012 at 6:37

Attachments:

GoogleCodeExporter commented 8 years ago
be aware of:

1. if no ppm output, e.g., throttle protection,...

2. uint8_t and uint16_t overflow

Original comment by myfo...@gmail.com on 17 Nov 2012 at 7:10

GoogleCodeExporter commented 8 years ago
May I ask why to end just in time before setupPulses ?
Mix duration is not constant, depends on flight phases and active mixes, each 
time we will change duration making it longer on next PPM Pulse we will run out 
of time.
But maybe I have misunderstood the code.

Original comment by romolo.m...@gmail.com on 17 Nov 2012 at 8:06

GoogleCodeExporter commented 8 years ago
I calculate the duration of last mixing, then do mixing in advance.

e.g. 

PPM frame length = 18ms
Duration of LAST mixing = 5ms

Last PPM started at t0
Then, wait till t0+12ms  (that's, 18-5-1)

So, mixing is being scheduled to be excuted at t0+12, and completed at t0+12+5 
as expected. Then, "Duration of Last mixing" is re-calculated again.

Fortunately 1s. (t0+18) later, next PPM starts!

If not, e.g., you changed some switchs, the duration of mixing may be changed.
The only effect is we may not take the next bus on time.

But just only 1 time. Duration will be re-calculated and compensated, and be on 
time again.

Original comment by myfo...@gmail.com on 17 Nov 2012 at 9:19

GoogleCodeExporter commented 8 years ago
But why to lost one bus ???
Then we can start calculations asap , and do not repeat them until setup
pulses as been started once...
there are a lot of things that may change mixer calculations durations (ie
delays and slow down, physical switches, logical switches), that means a
lot of bus lost...
It's the "just in time" logic that I do not understand in this application.

Original comment by romolo.m...@gmail.com on 17 Nov 2012 at 9:46

GoogleCodeExporter commented 8 years ago
There is no difference.

The chance is 50% : 50%.

If we do extra mixing test, it is possible that waste time & miss the next ppm.
e.g. if the new mix is shorter then before.

For a 18ppm PPM, and a simple mixing + redrawing take about 9ms, only 9ms is 
'unused' time slot.

I don't think it is worth wasting flash & ram to do it.

Missing PPM mostly occurs when you are unluckily moving stick just before PPM 
setting up.

Comparing with it, we seldom change mixing setting during flight.

At least, the max latency is not changed, the possibility of max latency is not 
changed. That's why I do nothing about it.

Original comment by myfo...@gmail.com on 18 Nov 2012 at 6:41

GoogleCodeExporter commented 8 years ago
Another issue must be taken into consideration.
The mixing duration only could be estimated after it is executed.

When we change a switch, it likely the redrawing duration is much longer.
If we do the extra 1 mixing to get the new duration, both mixing + redrawing 
waste more time.

The chance of missing next PPM is higher.

Original comment by myfo...@gmail.com on 18 Nov 2012 at 6:50

GoogleCodeExporter commented 8 years ago
I see. I still didn't test, but it should work. How much flash is used?

The discussion is now about the latency introduced when the mixing duration 
increases. It will then also apply to the gruvin9x board.

A - "Normal" way
----------------
1) In a loop, we calculate the mixes, then we redraw the menus only if 10ms 
since last redraw have been expired

2) With simple mixes, we are regularly late (should be around 10%, when the 
menus are drawn).

3) With complex mixes (above 10ms), we are almost always late, we draw the 
menus each time we run the mixer

B - "Patched" way
-----------------
1) We run the mixer at T=NextPulsesTime-LastMixerDuration, then we draw the 
menus

2) The menus are drawn every 20ms instead of 10ms. Is it visible?

3) There is a 1ms margin, we are late as soon as a lot of mixes are introduced 
(due to a switch movement).

4) When mixes don't change we have an excellent stick to pulses latency

Original comment by bson...@gmail.com on 18 Nov 2012 at 8:22

GoogleCodeExporter commented 8 years ago
Yes, I perfer patched way. It should not increase any flash consumption. ( < 10 
bytes I guess). I'd tested the ~20ms redraw has no difference with 10ms.

The only issue is how much time in advance? 1ms or 2ms? or others?
Do you have some complex model mixing setup & timing that we could evaluate?

That's if we choose 2ms, then,

mix1 change to mix2
if ABS(duration_of_mix1 - duration_of_mix2) <= 2, then just be on time.

The only drawback is avg latency increases 1ms.

--

Another issue, I forgot to mention. If SD card is installed, the original way 
is much more slower. Since the implementation of writeLogs() is improper.

For such a low-end platform, we should avoid SPI i/o with SD card.
Suggested procedures as follows:

1. Open file

    // Pre-allocate FAT cluster for speedy write
    filesize = File1.fsize;

    if (filesize < 102400) {
        res = f_lseek(&File1, 102400);
        res = f_sync(&File1);

        f_lseek(&File1, 0);
        filesize = 102400;
    }

    // Seeking to first unused record
        ...

The point is, SD card is large enough (e.g. 8GB or etc).
We allocate a bigger file, e.g. 100K, and sync, this make FAT updated and no 
more access is necessary for next 100K bytes written.

Then, just do f_write!

each f_write only modify buffer in the RAM.

If buffer is full, it will be flushed into SD, only 512bytes write is involved.
Normally only 1 SPI transaction involved. Addressing time of SD is limited.

No addition FAT load / modification / write involved.

Another issue is too many xx_printf() used. 
printf() involved lots of conditional test & string manipulate, is very very 
slow.

Original comment by myfo...@gmail.com on 18 Nov 2012 at 9:43

GoogleCodeExporter commented 8 years ago
The latency in real use will always be 22ms...
Or are you able to sincronize your fingers with permain while using
switches ?
And btw are you really able to see a latency of 1/50 of seconds while
flying ?

Original comment by romolo.m...@gmail.com on 18 Nov 2012 at 10:11

GoogleCodeExporter commented 8 years ago
If there is no difference, there is no 9ms / 18ms spec. in servo.

In real, the latency may be over 50ms.

stick -> mix -> ppm -> tx module -> rx module -> servo frame rate

We can do nothing about some parts, others we can.

Original comment by myfo...@gmail.com on 18 Nov 2012 at 10:20

GoogleCodeExporter commented 8 years ago
Right. I have nothing against one more milli for the margin, 2ms is really a 
huge margin, which should fix all problems unless you use a slow transition 
between phases (and in this case, there is no latency problem as we want to 
decrease the speed of transitions!)

Would you provide to full patch, taking care of the overflow? What about 
removing the mixer duration inside the pulses_avr (as done for gruvin9x board)?

Original comment by bson...@gmail.com on 18 Nov 2012 at 12:33

GoogleCodeExporter commented 8 years ago
I've no full patch yet, since some parts (e.g. DSM2_PPM, DSM2_SERIAL, ...) I 
had not traced yet.

We shall make sure the patch is proper arranged for all protocols.

In addition, I have no idea how gruvin9x or arm based board works. ;-)

Original comment by myfo...@gmail.com on 18 Nov 2012 at 2:29

GoogleCodeExporter commented 8 years ago
I will take care of DSM2 and gruvin9x, don't worry, I would only like to see 
how you will take care of the overflow problem.

Original comment by bson...@gmail.com on 18 Nov 2012 at 2:33

GoogleCodeExporter commented 8 years ago
To prevent overflow, a common way is using 4 bytes counter.
Just don't let it overflow.

Alternately, if we know PPM is running, then the counter is updated about 20ms.
Don't care about overflow since 

  uint16_t tStarted = (getTmr16KHz() - tPPMSetup) / 8;      // unit: 0.5ms fox

the tStarted is always be positive and be the delta t, even if getTmr16KHz() < 
tPPMSetup (overflow occurs, e.g. tPPMSetup=0xFFFF, getTmr16KHz()=0x0001)

This is ONLY true if getTmr16KHz() not greater than tPPMSetup again. 
Luckily, if setupPulses ISR is running periodically, it always be ture. ^_^

How about if PPM is not running? just 

  if (tStarted < tPpmFrame-lastMixerDuration-2/*1ms*/ && g_bPPMisRunning)
    return;

That's, since PPM is stopped, always do update. 
Nothing is important than updating screen in this moment. ^_^

Original comment by myfo...@gmail.com on 18 Nov 2012 at 3:39

GoogleCodeExporter commented 8 years ago
Get confused? Let have an example...

int main(int argc, char* argv[])
{
    WORD a = 1;
    WORD b = 0xFFFF;
    WORD c = a - b;

    if (a > b) {
        printf("a>b\n");
    }

    if (a-b > 0) {
        printf("a-b>0\n");
    }

    if (c > 0) {
        printf("c>0 (c=%d)\n", c);
    }

    return 0;
}

Original comment by myfo...@gmail.com on 18 Nov 2012 at 3:58

GoogleCodeExporter commented 8 years ago
typedef unsigned short WORD;

Original comment by myfo...@gmail.com on 18 Nov 2012 at 4:00

GoogleCodeExporter commented 8 years ago
Of course the overflow is not there. I would prefer 
(tPpmFrame-lastMixerDuration-2) working as int8_t. 

Original comment by bson...@gmail.com on 18 Nov 2012 at 5:38

GoogleCodeExporter commented 8 years ago
Also tick10ms given to doMixerCalculations(tmr10ms, tick10ms) will be 
completely wrong!

Original comment by bson...@gmail.com on 18 Nov 2012 at 6:09

GoogleCodeExporter commented 8 years ago
I agree.

If for any case, (tPpmFrame-lastMixerDuration-2) < 0 or something wrong,
it cannot be too big. (max=127=63.5ms)

tStarted will be greater than it quickly, the perMain may not be blocked.

(seems that g_bPPMisRunning may be skipped.)

Original comment by myfo...@gmail.com on 18 Nov 2012 at 6:17

GoogleCodeExporter commented 8 years ago
tmr10ms, tick10ms should be moved from caller to callee

Original comment by myfo...@gmail.com on 18 Nov 2012 at 6:25

GoogleCodeExporter commented 8 years ago
Next version... Should be around 1.5ms between mixer end and setupPulses

Original comment by bson...@gmail.com on 18 Nov 2012 at 8:58

Attachments:

GoogleCodeExporter commented 8 years ago
Good job!! Much better than mine... Some macro were added, looks great!!

nextMixerTime is protected with cli() also.

Only one line I don't understand:

  uint16_t t0 = getTmr16KHz();
  if (t0 < nextMixerTime)
    return;

What happens if nextMixerTime near 0xFFFF?

Original comment by myfo...@gmail.com on 19 Nov 2012 at 12:35

GoogleCodeExporter commented 8 years ago
Right we will miss one train. Should be:

int16_t delta = t0 - nextMixerTime;
if (t0 < 0) return;

Original comment by bson...@gmail.com on 19 Nov 2012 at 6:22

GoogleCodeExporter commented 8 years ago
thumbs-up +1,000  ^_^

I think this should be the final optimized solution. (except a small typo: 
delta < 0)

Do you think it is also good for gruvin9x?

The LCD refresh rate is 50 fps in this patch, more than good. (even a movie 
only need 30 fps).

Original comment by myfo...@gmail.com on 19 Nov 2012 at 7:14

GoogleCodeExporter commented 8 years ago
It would be good for gruvin9x if we had an acceptable solution for SD!

Original comment by bson...@gmail.com on 19 Nov 2012 at 7:22

GoogleCodeExporter commented 8 years ago
I had modified the code according to your patch, and will verify the timing of 
ppm & doMix() by a DSO tomorrow. 

Since I have no gruvin9x board on hand, I can not test SD access timing for you.
If you have any previous test data for reference, maybe I can do something... 

Original comment by myfo...@gmail.com on 19 Nov 2012 at 9:29

GoogleCodeExporter commented 8 years ago
First candidate for commit ...

Original comment by bson...@gmail.com on 19 Nov 2012 at 9:47

Attachments:

GoogleCodeExporter commented 8 years ago
In rc1,

PROTO_PXX & DSM2_SERIAL seems has no chance to run SCHEDULE_MIXER()

Will doMixerCalculations() be enable?

Original comment by myfo...@gmail.com on 19 Nov 2012 at 10:07

GoogleCodeExporter commented 8 years ago
DSM2_SERIAL doesn't exist on stock! For PXX you are right.

Original comment by bson...@gmail.com on 19 Nov 2012 at 10:11