Degraded XCSoar performance on high zoom levels

kedder commented 4 years ago

This issue is originally reported by @bomilkar on #71, but taken out as a separate one here.

When XCSoar is used at very high zoom levels (500km across the map), it may perform very poorly: it does not react to control inputs, or reacts with significant delay, and the refresh rate drops (to one frame per several seconds). In some extreme cases a reboot is required to regain control.

What is known so far about the issue:

It seems to be triggered by a very large airspace file. When airspace rendering is disabled in XCSoar, the performance regression on high zoom levels is not apparent. A smaller airspace file (covering a smaller area, or containing fewer objects) also has less impact on performance.
The issue manifests itself only when XCSoar receives NMEA input and constantly has to update the screen.
CPU usage goes to 100%.
XCSoar will recover from this condition if the user manages to zoom the map in or remove the NMEA input (e.g. by physically disconnecting the attached devices).

Reproducing

Reproducing the issue is fairly easy:

Download the sample XCSoar datadir. It contains the big map of Central Europe and an airspace file covering most of Central Europe as well.
Run XCSoar and start a replay of the 2020-05-18_09-30.nmea file.
Zoom the map out to the maximum 500km range.

General slowdown, CPU usage growth and input lag will be apparent.
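
To put a number on the slowdown rather than judge it by feel, one option is to sample the CPU time of the XCSoar process from /proc. This is only a rough sketch, assuming a Linux system and that the process ID of XCSoar is already known; the function names are made up for illustration and are not part of XCSoar or OpenVario.

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so only the part after the last ')' is split.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s):
    """Convert a tick delta over an interval into percent of one CPU core."""
    return 100.0 * (ticks_after - ticks_before) / ticks_per_s / interval_s


def sample_cpu(pid, interval_s=1.0):
    """Measure CPU usage of one process over interval_s seconds (Linux only)."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = cpu_ticks(f.read())
    return cpu_percent(before, after, interval_s, ticks_per_s)
```

Calling sample_cpu(pid) a few times at the 250km range and again at 500km should give figures that can be compared between builds. A multi-threaded process can report more than 100%, since the value is relative to a single core.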

Workarounds

The best workaround so far is to use a smaller airspace file.

Possible solutions

Patch XCSoar to limit the allowed zoom level to lower values (e.g. 250km). Not ideal, because that would affect users who are not experiencing the issue. OTOH, it is easy to implement.

bomilkar commented 4 years ago

I did use a smaller airspace file, too. It didn't make a difference. Hence the size of the airspace file is probably not the (only) issue. Airspace_de_2020-05-26.txt

bomilkar commented 4 years ago

I reviewed "journalctl --system" to see if there is any message during the "500km-zoom-trap". Nothing there.

CptFrikadel commented 4 years ago

Is there a way to send NMEA sentences to xcsoar from the ov? So basically a sensord mock up?

bomilkar commented 4 years ago

Do you want to run it on the Cubie? Or on your Linux workstation?

CptFrikadel commented 4 years ago

On the Cubie preferably

bomilkar commented 4 years ago

Does the NMEA log file have GPRMZ sentences with real fixes?

CptFrikadel commented 4 years ago

Yes, they are the log files produced by xcsoar

bomilkar commented 4 years ago

Try this: https://www.dropbox.com/s/bvwt3giufj01rqj/sensord_mock?dl=0

sensord_mock -i nmea.log
sensord_mock --help

This is nothing official!

bomilkar commented 4 years ago

Make sure to stop and/or disable sensord (systemctl stop sensord). sensord_mock sends on the same port, 4352.

kedder commented 4 years ago

The easiest way is to use XCSoar to replay a pre-recorded NMEA log file. One can also forward a port with ssh, like this:

ssh -L 4353:localhost:4353 <your-openvario-ip>

This will open a listening port 4353 on localhost (the Linux workstation). Everything sent to that port will be forwarded to the XCSoar instance running on the OpenVario.
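
With the tunnel in place, a small script on the workstation can feed a recorded log into the forwarded port. This is a minimal sketch, assuming XCSoar accepts plain NMEA text over TCP on port 4353; the function names, the pacing delay and the filter on lines starting with "$" are illustrative choices, not anything XCSoar or sensord prescribes.

```python
import socket
import time


def nmea_payload(lines):
    """Yield each NMEA sentence as CRLF-terminated bytes, skipping other lines."""
    for line in lines:
        sentence = line.strip()
        # Assumption: only lines starting with '$' are NMEA sentences.
        if sentence.startswith("$"):
            yield (sentence + "\r\n").encode("ascii", errors="replace")


def replay(path, host="localhost", port=4353, delay=0.1):
    """Send a recorded NMEA log to host:port, pausing between sentences."""
    with open(path, "r", errors="replace") as log:
        with socket.create_connection((host, port)) as sock:
            for chunk in nmea_payload(log):
                sock.sendall(chunk)
                # Fixed pacing; a real replay would follow the log timestamps.
                time.sleep(delay)
```

For example, replay("2020-05-18_09-30.nmea") would stream the sample log through the tunnel. The fixed delay keeps the sketch short; it does not reproduce the original timing of the flight.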

CptFrikadel commented 4 years ago

Thanks! I'll report when I find something..

CptFrikadel commented 4 years ago

This may be related to XCSoar/XCSoar#379 which has since been fixed. A regression, that I unfortunately introduced trying to fix something else (I'm really sorry), was causing excessive redraws.

Can somebody verify that this does fix the issue on OV as well? I won't have access to my OV until at least next week.

bomilkar commented 4 years ago

"Can somebody verify that this does fix the issue on OV as well? I won't have access to my OV until at least next week."

I can build a new OV image. But how do I know the issue has been fixed? OV's build process is quite convoluted (in my humble opinion) so I don't know which commits are included. How do I tell (from the source code) that it picked up the fixes?

CptFrikadel commented 4 years ago

The xcsoar-testing recipe pulls from the master branch, so it will automatically include the latest commit.

Here is a package/executable should you not want to build an entire new image:

bomilkar commented 4 years ago

I did a repo sync before bitbake. So it should have picked up the master branch in its current state. But the 500km zoom level is still "sticky". "Sticky" meaning: once it zooms out to that level, there is no way back unless I stop the GPS source. (I have a simple patch for this that limits the zoom level to 250km.) However, it feels less sluggish. But that's hard to quantify and I might be wrong. I didn't see any "lima 1c40000.gpu: gp soft reset time out" messages in the journal for over 3 hours of uptime. That's good, but may have a different reason.

CptFrikadel commented 4 years ago

"I did a repo sync before bitbake"

FYI, bitbake does a fetch operation during the build process where it pulls the sources from the URI specified in the recipe. So a repo sync shouldn't have been necessary. You can also build individual recipes using bitbake so that you don't have to build the entire image every time (provided dependencies are still correct).

"But the 500km zoom level is still 'sticky'."

So it might not all be my fault.. :smile:

"I didn't see any 'lima 1c40000.gpu: gp soft reset time out' messages in the journal for over 3 hours of uptime."

I have been using the stable XCSoar version on my OV for the past few weeks, and did not really feel a noticeable difference in sluggishness compared with the testing version (that still had the bug), but, like you said, that is hard to quantify. I also did not see any lima errors in the journal. Does getting the map "stuck" also not generate a lima error?

bomilkar commented 4 years ago

"Does getting the map 'stuck' also not generate a lima error?"

No lima errors whatsoever in 16 hours uptime. Even with the map stuck. Entirely unrelated.

"Map stuck" depends on how busy the map is. For my replay I use a couple of NMEA logs from flights from Unterwössen together with ALPS_HighRes.xcm and the complete airspace structure and waypoints. Zooming out to 500km on a course where the xcm file doesn't fill the entire map area may not get it "stuck". (It doesn't make a difference if terrain is on or off.) When it is "stuck", keystrokes (up arrow) are ignored and forgotten. The map display is still moving. CPU percentage peaks at ~80% and then drops to ~50%.

What I also find remarkable: there are times when lima errors occur several times per second until I stop XCSoar. And then again no lima errors for hours and days with the identical sd card. Very strange!
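
One way to pin down when these bursts happen is to count the lima timeout messages per minute in the journal. A rough sketch, assuming the default journalctl output where each line starts with a "Mon DD HH:MM:SS" timestamp; the helper names are hypothetical.

```python
import subprocess
from collections import Counter

# Message text as quoted earlier in this thread.
LIMA_MARKER = "gp soft reset time out"


def lima_timeouts_per_minute(lines):
    """Count lima soft-reset timeouts, keyed by the 'Mon DD HH:MM' prefix."""
    counts = Counter()
    for line in lines:
        if "lima" in line and LIMA_MARKER in line:
            # Assumption: the first 12 characters are 'Mon DD HH:MM'.
            counts[line[:12]] += 1
    return counts


def read_journal():
    """Return the system journal as a list of text lines."""
    result = subprocess.run(
        ["journalctl", "--system", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Running lima_timeouts_per_minute(read_journal()) on the device gives a per-minute tally that can be lined up against what XCSoar was doing at the time.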

linuxianer99 commented 2 years ago

@kedder : Can we close this issue ??

kedder commented 2 years ago

I don't think it is fixed, but no one is actually working on it AFAIK. The only workaround we've found so far is "make your airspace file smaller". So, dunno, we can declare this issue as "expired" if it is annoying to have a stale issue on the list.

MaxKellermann commented 2 years ago

This is not an OpenVario issue, and this should have been reported to XCSoar. If a large airspace file causes performance problems, these should be optimized in XCSoar. There's nothing OpenVario can do.

linuxianer99 commented 2 years ago

Not Openvario relevant ...

kedder commented 2 years ago

It might be OV related or not. I think it was reported as a regression after we switched to the mainline kernel and graphics stack (with the OSS "lima" drivers). On the old linux-3.4 with the proprietary mali driver there was no performance issue. OTOH, XCSoar was much, much older on the old image as well (I don't remember which version, but I think it was early 6.x or even 5.x). So it is not very clear where the regression is coming from.

Openvario / meta-openvario