DimonSE / open9x

Automatically exported from code.google.com/p/open9x
0 stars 0 forks source link

perMain is slow #157

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Which board (stock / gruvin9x / sky9x) are you using?
stock
What is your open9x FW version?
R1563
What is your open9x EEPROM version?

What steps will reproduce the problem?

When I switched from ER9x to Open9x, I feel the servos' response is not as fast 
as before.

So, I used a DSO to monitor PPM output, and use the GEAR switch as a triger 
input to capture the PPM wave to find the delay between GEAR and PPM.

Comparing with ER9x, it does be slower.

Then, I try to use STAT (long pressing DOWN key) to check the g_timeMainMax.

It is ~5ms v.s. ~10ms between Er9x & Open9x.

Then I tried to find out why the perMain() loop is slower.

I add the following code to count delay time each time, i.e.

// 0~12ms counters 
uint16_t g_timeList[12]={0,0,0,0,0,0,0,0,0,0,0,0};

...

    t0 = getTmr16KHz() - t0;
    if (t0 > g_timeMainMax) g_timeMainMax = t0;

{
    uint16_t t = t0/16; // unit convert to ms

    // max idx=11
    if (t >= 11)  // if over 11ms, count them as [11]
        t=11;

    g_timeList[t]++;
}

In statistics_views.cpp, I dump all counts. i.e.

  //lcd_putsLeft(5*FH, STR_FREESTACKMINB);
  //lcd_outdezAtt(14*FW, 5*FH, stack_free(), UNSIGN) ;
  {
      int a;
      for (a=0; a <12; a++) {
          uint16_t row = a/4;
          uint16_t col = a%4;
        lcd_outdezAtt(FW*5*(col+1), (5+row)*FH, g_timeList[a], PREC2);
      }
  }

I found that 90% loops takes about 10ms, only 10% about 3ms.
(max ~10ms)in Open9x

Comparing with ER9x (last CVN checkout) , about 66% within 3ms,
33% within 6ms. (max ~5.5ms)

The model setting is "Simple 4CH template".

I did prefer the overall feeling & code style of Open9x.

If the issue could be solved, Open9x will be perfect for me.

Thank you!

Original issue reported on code.google.com by myfo...@gmail.com on 10 Nov 2012 at 8:12

GoogleCodeExporter commented 9 years ago
It should be 10% around 10ms and 90% around 3ms

Original comment by bson...@gmail.com on 10 Nov 2012 at 8:19

GoogleCodeExporter commented 9 years ago
What are your compile options?

Original comment by bson...@gmail.com on 10 Nov 2012 at 8:20

GoogleCodeExporter commented 9 years ago
Also would you do it on an empty model? I will check on the HW simulator here

Original comment by bson...@gmail.com on 10 Nov 2012 at 8:35

GoogleCodeExporter commented 9 years ago
I propose that we try with only make, without any option, unless you use 
telemetry?

Original comment by bson...@gmail.com on 10 Nov 2012 at 8:37

GoogleCodeExporter commented 9 years ago
Did you comment out the stack probe? The stack is greater on open9x than on 
er9x!

Original comment by bson...@gmail.com on 10 Nov 2012 at 8:55

GoogleCodeExporter commented 9 years ago
Yes, I just svn checkout and 'make' only.
Makefile is not modified. (STD config)

I comment out Stack indication for extra 3 lines for display 
[count 0~1ms] [count 1~2ms] [count 2~3ms] [count 3~4ms]
[count 4~5ms] ...
[count 8~9ms] [count 9~10ms] [count 10~11ms] [count > 11ms]

The stack free in Open9x is bigger (about 830bytes) comparing with ER9x (220 
bytes).

After I installed the new compiled firmware, the eeprom is wrong and must be 
re-formated.

So, the data should be empty. Then, I use template to new a 'Simple 4CH' model.

The procedure is the same for ER9x.

I guess (just wonder), maybe the code behind
  if (!tick10ms) return; //make sure the rest happen only every 10ms.
takes too much time, so it must be executed almost every time.

e.g. if it takes 7ms + 3ms doMixerCalculations() then all procedures must be run
every time. Since the criteria is 10ms, the condition is almost ture every time.

Original comment by myfo...@gmail.com on 10 Nov 2012 at 9:34

GoogleCodeExporter commented 9 years ago
Before I start the test, I download only binary (ie. 
open9x-stock-heli-templates-en.hex).

The data captured by DSO is based on open9x-stock-heli-templates-en.hex

Original comment by myfo...@gmail.com on 10 Nov 2012 at 9:38

GoogleCodeExporter commented 9 years ago
Forgot to mention, the following code is added to clear counter too.

    case EVT_KEY_FIRST(KEY_MENU):
      ...
      g_timeMainMax    = 0;

      {
          int a;
          for (a=0; a<12; a++)
              g_timeList[a]=0;
      }

Original comment by myfo...@gmail.com on 10 Nov 2012 at 9:43

GoogleCodeExporter commented 9 years ago
Would you try to comment this line:
(in stats view)
lcd_status_line();

Original comment by bson...@gmail.com on 11 Nov 2012 at 10:01

GoogleCodeExporter commented 9 years ago
After commenting out lcd_status_line();

It seems to be better.

about 60% within 2~3ms
about 40% within 7~8ms

Max = 7.31ms

Ok! The latency shown on statistics is made by lcd redraw.

Mmmm.... Normally, we should NOT stay within statistics view when flying a 
plane.

So, I did a real test to simulate a real fight procedure as follows:

1. Turn on radio

2. Goto statistics view and clear counter, then back to main display immediately

3. Push throttle and let timer run

4. After 1 minute, I press LONG DOWN to go to statictics

5. Take a snap reading immediately.

The reading shows the accumulated counters between 0~11ms of previous 1 minute.

The result is...

70% within 2~3ms
30% within 6~7ms

It seems not too bad! ^_^

So, I suppose the latency is not an issue, and could be closed.
Thanks a lot!!

Original comment by myfo...@gmail.com on 11 Nov 2012 at 1:33

GoogleCodeExporter commented 9 years ago
I will close the issue once lcd_status_line() has been made quicker!

Original comment by bson...@gmail.com on 11 Nov 2012 at 2:48

GoogleCodeExporter commented 9 years ago
My thought is... Since LCD access is slow, is it necessary to redraw every 10ms 
?

If we could divide other sustaining subroutines to be distributed, then the max 
tmain could be the smallest. (~2ms)

Comparing with max 7~8ms, the improvement is noticeable.

e.g.

...
doMixerCalculations(tmr10ms, tick10ms);
...

if (!tick10ms) return;

...

// Divide all others into 10 parts, and do it one by one per 10ms. (only for 
example)
static uint16_t uStage = 0;

switch (++uStage % 10) {
  case 0: drawStatusLine();
  break;

  case 1: refreshDisplay();
  break;

  case 2: checkOOO();
  break;

  case 3: checkXXX();
  break;

  ...
}

Original comment by myfo...@gmail.com on 12 Nov 2012 at 7:35

GoogleCodeExporter commented 9 years ago
We need a 10ms refresh. But your idea is something possible. But perhaps it 
will cost less flash to call doMixerCalculations just before drawStatusLine 
(and change the g_timeMainMax calculation code of course).

Did you measeure how much does refreshDisplay cost? Less than 1ms?

Original comment by bson...@gmail.com on 13 Nov 2012 at 9:26

GoogleCodeExporter commented 9 years ago
I will try to do some test with DSO..

e.g. triggle a unused GPIO before & after a function call

Then, we can get the very accurate timing of a function.

(to be continued...)

Original comment by myfo...@gmail.com on 13 Nov 2012 at 9:54

GoogleCodeExporter commented 9 years ago
>But perhaps it will cost less flash to call doMixerCalculations just before 
drawStatusLine (and change the g_timeMainMax calculation code of course).

I think currently g_timeMainMax calculation is correct.
It represents the max latency between two doMixerCalculations() called.

The calling sequence is something like:

doMixerCalculations() -> doMixerCalculations() -> 
drawStatusLine()/refreshDisplay()... -> doMixerCalculations() -> 
doMixerCalculations() -> drawStatusLine()/refreshDisplay() -> ...

(assume 66% : 33%)

In my previous test, doMixerCalculations() takes about only 2~3ms,
but if code below "if (!tick10ms) return;" is excuted, the latency grows up to 
7ms.

That's, when this part involved, next doMixerCalculations() will be 7ms later.
In the mean time, if the gimbal/switch is changed, the output stii not be 
updated.

I proposed in comment 12 is - not run the code below 'if (!tick10ms)' in the 
same time. Slice them into serveral parts, then each part will not take such 
long time.

Original comment by myfo...@gmail.com on 13 Nov 2012 at 2:26

GoogleCodeExporter commented 9 years ago
I am afraid that the time is 99% the menu which is redrawn. That's why I asked 
to measure the refreshDisplay function!

Original comment by bson...@gmail.com on 13 Nov 2012 at 2:33

GoogleCodeExporter commented 9 years ago
I see. ^_^

In my experience, LCD redrawing  mainly contains 2 parts.

(1) Paint on memory buffer 

(2) Move from memory buffer to LCD buffer via h/w interface. Normally be slow.
(parallel interface 8 bits in this case, maybe not too slow)

It is better to measure how long does each one take.

In your comment 13, you said "we need a 10ms refresh."
What's the reason?

I suppose update LCD every 100ms is good enough.

(Forgive me, I didn't survey all the code completely...)

Original comment by myfo...@gmail.com on 13 Nov 2012 at 3:08

GoogleCodeExporter commented 9 years ago
Since I'm at home and no DSO in hand, I did some f/w modification to test some 
cases...

(1) Draw one line per 10ms. e.g.

  if (!tick10ms) return;
  ...
  //refreshDisplay();
  refreshDisplayY(tmr10ms & 0x07);  // Trick, just draw one line between 0..7

No speed up. 
It seems that LCD drawing is not the bottle neck, even I update 1 line per 10ms.

(2) Call functions other than doMixerCalculations every 50ms, e.g.,

void perMain()
{
  static tmr10ms_t lastTMR;
  tmr10ms_t tmr10ms = get_tmr10ms()/5;  // => every 50ms

This does help!! Over 90% within 2~3ms.
But as you know, the servo status shown on LCD is not smooth.

Now I know why you said "we need 10ms refresh"  @_@"

Using a DSO to measure all major functions' timing is necessary before we can
optimize the latency.

BTW, thanks for your effort on this issue & flash saving.

I had knew the flash saving is a difficult and time consuming work.
I hate it on my job! 

Original comment by myfo...@gmail.com on 13 Nov 2012 at 6:49

GoogleCodeExporter commented 9 years ago
After tracing some code, I'd found that latency comes from unnecessary painting 
on screen buffer.

With the following modification, we can get total latency about 97% within 2~3ms
2% within 1~2ms, 1% within 6~7ms. 

Comparing with the result reported first, it improved significantly!!

void perMain()
{
...
//  lcd_clear();    // FOX comment out
...
}

void menuMainView(uint8_t event)
{
...
  switch(event)
  {
    ...
  }

  // FOX added begin...
  static uint8_t s_u8Count = 0;

  if ((++s_u8Count & 0x0F) && event==0 && _event==0)
      return;

  lcd_clear();
  // FOX added end...

  ...
}

The idea is - if there is no incoming new event, there is no need for updating 
screen. Only 160ms redraw interval is enough.

I had tested the servo monitor on main_view and thought the response is 
acceptable.

BTW, since lcd_clear() is removed in perMain(), other view/menu should make some
similar modification too.

Original comment by myfo...@gmail.com on 14 Nov 2012 at 4:51

GoogleCodeExporter commented 9 years ago
That's definitely an interesting idea.

Original comment by bson...@gmail.com on 14 Nov 2012 at 5:12

GoogleCodeExporter commented 9 years ago
May I ask you to provide me the full patch, I think you are reaching the 
solution.

A couple of random thoughts:

1) Why not doing it inside perMain? It could even reduce the latency between a 
key press and the event taken into account in the menus!

2) Be careful to BLINK chars, to the TIMERS display, to the keys repeat usage, 
but I think you got it

3) It is only needed on PCBSTD board. At least for now...

I let you work on a whole patch that I will review and let you commit directly 
as soon as validated by some other developers, Ok?

Thanks for the idea!

Original comment by bson...@gmail.com on 14 Nov 2012 at 6:05

GoogleCodeExporter commented 9 years ago
In fact, I didn't do full patch on the source.
We hardly stay within views other than mainView.

1) You got it! Maybe we could check the event within perMain, but I don't trace
all code. I don't know if some global warning/variables will be affected or not.

Current implementation mixes (input)/(condition test)/(redraw) within 1 
callback.
I can't split them easily. 

If we may divide each to independent callback function pointer, than the upper 
main loop could drive correct functions on precise timing when necessary.

The speed will be increased again. 

Anyway, there must be some historical reason & reused code.
Less modification is less bugs. ^_^

2) Blinking & timers look nice (in my opinion). 0.16s updating interval seems 
to be good enough. Other UI behavior shall generates events, and no problem.

3) Yes. At64 is not so fast. ;-)

Since I just look into the code for 2 days, I'm afraid be a trouble maker.
I'll take some real fights test tomorrow.

Original comment by myfo...@gmail.com on 14 Nov 2012 at 7:06

GoogleCodeExporter commented 9 years ago
Please give a quick try inside perMain. I have in mind that it will be a very 
good change. 

And don't be too afraid, your patch will be reviewed and discussed by several 
people before being commited into the trunk! 

Original comment by bson...@gmail.com on 14 Nov 2012 at 7:26

GoogleCodeExporter commented 9 years ago
According to your great suggestion, I simplified the modification.

Only changing perMain() is enough. See below...

Hope it increases only several flash bytes ^_^

  ...
  const char *warn = s_warning;

  // Modified by FOX 2012/11/15 begin
#if 0
  g_menuStack[g_menuStackPtr](warn ? 0 : evt);
#else
  // Skip redraw when no event coming. Redraw every 160ms anyway...
  if (warn) {
      // Original code sequence, always update if warning
      lcd_clear();
      g_menuStack[g_menuStackPtr](0);
  }
  else if (evt != 0 || (tmr10ms & 0x0F)==0 ) {
      // Update screen buffer only if new events, or every 160ms
      lcd_clear();
      g_menuStack[g_menuStackPtr](evt);
  }
#endif
  // Modified by FOX 2012/11/15 end

  if (warn) displayWarning(evt);
  drawStatusLine();
  ...

Original comment by myfo...@gmail.com on 14 Nov 2012 at 7:39

GoogleCodeExporter commented 9 years ago
Also comment out previous lcd_clear() is necessary, otherwise ...

...
//  lcd_clear();    //FOX

#if defined(PCBSKY9X) && defined(SIMU)
  checkTrims();
#endif

....

Original comment by myfo...@gmail.com on 14 Nov 2012 at 7:41

GoogleCodeExporter commented 9 years ago
Take your time, you will attach an svn diff to the issue, it makes reviews 
easier!

Original comment by bson...@gmail.com on 14 Nov 2012 at 7:44

GoogleCodeExporter commented 9 years ago
I had made several test flight today.
It seems to be better.

Only an minor defect that when I switch between different views,
sometimes I can see the fading effect.

I take a look at the code, and find a question within all views.

e.g.

    case EVT_KEY_STATISTICS:
      chainMenu(menuStatisticsView);
      killEvents(event);
      break;
    ...
    }

    // still draw main view here.

When I check the chainMenu(), I find it calls 

   (*newMenu)(EVT_ENTRY);

That becomes, (1)drawing statistics view, return (2) drawing main view (3) 
refreshDisplay()

It seems that after calling chainMenu() shall return; instead of break;

Original comment by myfo...@gmail.com on 15 Nov 2012 at 6:28

GoogleCodeExporter commented 9 years ago
Yes, I have modified a lot of these historical bugs, but for sure some remain, 
I didn't systematically check!

Please include this fix inside your patch, thanks!

Original comment by bson...@gmail.com on 15 Nov 2012 at 6:48

GoogleCodeExporter commented 9 years ago
How can I commit my patch?

I'll do more test, and commit later.

Original comment by myfo...@gmail.com on 15 Nov 2012 at 8:09

GoogleCodeExporter commented 9 years ago
We should review it carefully before commiting it to trunk! Would you attach 
the patch file to the issue?

cd src
svn diff . > fox.patch

Thanks!

Original comment by bson...@gmail.com on 15 Nov 2012 at 8:52

GoogleCodeExporter commented 9 years ago
Here it is...

If you are curious about the latency distribution, you may enable 
"TEST_LATENCY = YES" in Makefile, and read it within statistic view.

I sample 10000 times and stop, so you may operate your tx as usual for a while,
then go to statistic view to read the counters.

(For saving flash & data, it may be disabled by default.)

Original comment by myfo...@gmail.com on 15 Nov 2012 at 11:51

Attachments:

GoogleCodeExporter commented 9 years ago
It's indeed very good! Let-me attached a simplified (without the TEST_LATENCY) 
patch  with some improvements and some more #ifdef and then we will ask for 
more code reviews.

Original comment by bson...@gmail.com on 15 Nov 2012 at 1:25

GoogleCodeExporter commented 9 years ago
Did I forget anything? Still not flashed on my Tx...

Original comment by bson...@gmail.com on 15 Nov 2012 at 6:45

Attachments:

GoogleCodeExporter commented 9 years ago
Flashed on my Tx. Not bad, but the bar graphs are slower now. What do you think?

Original comment by bson...@gmail.com on 15 Nov 2012 at 9:39

GoogleCodeExporter commented 9 years ago
What about having a parameter from 1 to 16 in general settings to slow down
refresh ?

Original comment by romolo.m...@gmail.com on 15 Nov 2012 at 9:45

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Romolo you let me an idea of dynamic refresh!
longest refresh when usually flying - on main screen and telemetry view.
shortest refresh on setup - radio setup and model setup screens.

Original comment by gbir...@gmail.com on 15 Nov 2012 at 10:37

GoogleCodeExporter commented 9 years ago
I knew servo bar is slower, but in my opinion the servo bar is used for initial 
setup for maiden fight.

Is it worth for some more code (flash) just for human reading? 
The response of servo is my first priority.

If it is, then a global variable e.g. extern uint8_t g_bRefreshRate = 16 could 
be set.

When we need a fast refresh, e.g. within a special view, we can speed up via 
g_bRefreshRate = 2, and reset it when exiting the menu.

Of course, since having no common destructor for each menu class, we must 
remember to do it manually.

Original comment by myfo...@gmail.com on 15 Nov 2012 at 10:47

GoogleCodeExporter commented 9 years ago
To saving flash, we may just put g_bRefreshRate = 16; within the main_view() 
(or etc.)

put g_bRefreshRate = 1; within chainMenu()

That's, default refresh rate is 1 per 10ms, and 160ms within main_view

Only extra 4 bytes needed.

Original comment by myfo...@gmail.com on 15 Nov 2012 at 10:56

GoogleCodeExporter commented 9 years ago
Telemetry bars are slow too...

Original comment by romolo.m...@gmail.com on 15 Nov 2012 at 11:39

GoogleCodeExporter commented 9 years ago
Here you are... 

Currently, I use 10ms for all other views except main_view.

In main_view, only e_outputBars enjoys its 10ms refresh. ^_^

You may enable the TEST_LATENCY to check the impressed difference.

(To be honest, I don't think this is better, it makes confusion for end-user.
The TX has different latency if he stays within e_outputBars)

Original comment by myfo...@gmail.com on 15 Nov 2012 at 11:41

Attachments:

GoogleCodeExporter commented 9 years ago
Just enjoyed my frights and back here... :-D

For telemetry, I will setup my FrSky module later and check about it...

Anything, my first image is 0.16 refresh rate for telemetry reading should not 
be too slow.

We use tx for flying models not for watching mpeg, right? ^_^

Original comment by myfo...@gmail.com on 16 Nov 2012 at 2:09

GoogleCodeExporter commented 9 years ago
Another hint about latency...

If you're using PPM module like FrSky or etc, the latency is much more 
important.

e.g. Assume you set PPM 18ms

If latency is 2ms, then the max latency arrived the module should be 20ms.
But, about 11% (i.e. 2/18) chance that you wait extra 18ms.  

If latency is 9ms, then the max latency arrived the module should be 27ms.
But, about 50% (i.e. 9/18) chance that you wait extra 18ms. 

The reason is, if you miss this bus, you must wait the next one come 18ms later.

Original comment by myfo...@gmail.com on 16 Nov 2012 at 5:32

GoogleCodeExporter commented 9 years ago
I had another idea during the night: when we execute perMain, we know how far 
is the next setupPulses! Do you get it?

Original comment by bson...@gmail.com on 16 Nov 2012 at 6:14

GoogleCodeExporter commented 9 years ago
Excellent!!

if (tTimeToNextPulse > tMaxMain) {
  Redraw();
  RefreshScren();
}

Original comment by myfo...@gmail.com on 16 Nov 2012 at 8:25

GoogleCodeExporter commented 9 years ago
For 18ms PPM, it ensure that every 18ms has at least one redraw(). 

Comparing with original 10ms, almost be the same.

Original comment by myfo...@gmail.com on 16 Nov 2012 at 8:35

GoogleCodeExporter commented 9 years ago
Must take care of every short of PPM setting.
e.g PPM length = 10ms and etc.

Also, PXX / DSM2 and etc.

Original comment by myfo...@gmail.com on 16 Nov 2012 at 8:41

GoogleCodeExporter commented 9 years ago
As done on gruvin9x board

Original comment by bson...@gmail.com on 16 Nov 2012 at 8:45

GoogleCodeExporter commented 9 years ago
I am even thinking right now to use TIMER2, I just discovered it's not used at 
all! The code is ready for that, exactly my gruvin9x implementation!

Original comment by bson...@gmail.com on 16 Nov 2012 at 9:08

GoogleCodeExporter commented 9 years ago
ARM v.s. avr, Speed & stack 
Had better to keep timer ISR as short as possible

Original comment by myfo...@gmail.com on 16 Nov 2012 at 10:09