egui_plots: simplifying (complex) lines with Ramer–Douglas–Peucker might save a lot of triangulation/overdraw time

abey79 commented 7 months ago

A HN commenter suggested looking at the Ramer–Douglas–Peucker polyline simplification algorithm. For very big lines that generate lots of triangulation/overdraw load, that might be a very neat optimisation. We'd need to run it with e.g. 0.1px tolerance, and cache the result against the zoom level.

lokegustafsson commented 7 months ago

I found something like this necessary (chunking the time series in 1px-equivalents and replacing each chunk with its max value) for my hobby project linux task manager (https://github.com/lokegustafsson/pi/blob/master/crates/pi/src/system/time_series.rs). It seems like a very reasonable thing to have built into egui_plot.

willtemperley commented 7 months ago

I'm the HN commenter - FWIW this is the Swift code I'm using. I've found radial distance gives very similar results and seems faster.

https://gist.github.com/willtemperley/abd73fa097486bc1aac3a88d9d6d5458

mkalte666 commented 7 months ago

I think just simplifying to max values in the range might be inaccurate when looking at lines with high frequency contents; you already get significant aliasing with that.

I'm still a fan of using a user stored data format as plot input that "mipmaps" the data using an actual low pass filter

Instead of flat mapping you could do iterative addition and single pole filtering if you split the thing into multiple flat arrays

abey79 commented 7 months ago

@mkalte666 I agree than min/max aggregation works only for time series and is not a very good general-purpose approach. The RDP algorithm on the other-hand preserves high frequency content so I think it would work much better. Both approach could be combined with a mipmap that use RDP to produce lower-res version of the data.

mkalte666 commented 7 months ago

Depending on what you want to do, High frequency content needs to be filtered out instead of preserved, but now that i think about it, thst really depends on the use case - aliasing might be wanted or needs to be filtered out.

Hmm.

abey79 commented 7 months ago

I guess your proposed avenue of having some kind of Series trait to abstract over the actual data that is accepted by egui_plot would allow some user-controlled flexibility in how data is filtered/decimated before display.

lokegustafsson commented 6 months ago

In my use case the signal was already sufficiently well low-pass filtered that max-aggregration was sufficient. Having a Series trait seems useful, although most callsites probably could have egui_plot do low-pass-filtering internally.

enomado commented 5 months ago

journal.pone.0094694.pdf

NChechulin commented 5 months ago

Hi! Just tried plotting around 2.5M points on egui 0.27.2, and it is basically unusable. Is there anyone currently working on this algorithm? If no, I could try, but I have zero understanding of egui internals, and might need some guidance along the way.

NChechulin commented 5 months ago

Offtopic

Also, I have noticed that performance does not get any better as I zoom into the plot. A neat trick would be to infer the x_start and x_end --- the leftmost and rightmost points on the X axis. Then we can use binary search to find all the points such that x_start <= x <= x_end and draw only them.

If someone is working on that, please help me find an issue (i havent found one).

P.S. If you think this comment does not belong to the conversation, please, feel free to delete it.

abey79 commented 5 months ago

@NChechulin "brute force" render of 2.5M points gets you in "custom renderer" territory (been there, done that).

I'm not really sure egui is the right place for that amount of drawn data. The original issue was meant more as an (immediate-mode) optimisation when data is ~10-100x what can possibly visualised with the available pixels. For 1000x+, you'll probably need to do probably-cached aggregation type of things. I would certainly be great to have some support for that in egui plot, but that sounds complicated and somewhat at odds with the IM paradigm. FWIW, we are in that latter case at Rerun, and we currently use at least the zoom level, probably the min max boundaries as well (can't remember) to drive the data query and aggregation.

As you say, though, there are likely a myriad of low hanging fruits to make things at least a bit better.

mkalte666 commented 5 months ago

Hi! Just tried plotting around 2.5M points on egui 0.27.2, and it is basically unusable. Is there anyone currently working on this algorithm? If no, I could try, but I have zero understanding of egui internals, and might need some guidance along the way.

I have done > 10 million points in a static plot and its doable.. but not without work. I mean, if you wrote your own gpu based renderer (and thats what @abey79 basically is suggesting, its probably very much doable. but you want this in egui.

I cannot share the whole code (again, work project), but i have shared the core of how i solve this problem: I mip-map the data beforehand.

The basic idea is to look at the current plot bounds, and provide a chunk of data that is just a bit more (or less, depending on your taste) than those bounds, and then select an appropriate filter level and chunk from your pre-mip-mapped data. This seems weirdly much at first, but in my experience is incredibly fast. (gpus do that for texture filtering, and while on cpu, this is just 1d flat data that is easily addressed into :D).

You will face challanges such as initially filling the plot bounds, actually generating your mipmap using a proper filter function (see this issue for one choice), etc etc. But it works without much custom rendering. I think the whole codebase around that is less than 200 lines, and its not clever code.

I will try to clear with my employer if i can publish this open source, ping me in a month if i forget.

NChechulin commented 4 months ago

Hi everyone! I made a crate with 1-dimensional mipmapper inspired by @mkalte666's code. It is hosted on github and is available on crates.io as well.

The key differences are:

Now it is generic
It uses Vec<Vec<_>> for internal data storage. It shouldn't introduce much memory or speed overhead.
It does not discard data
We don't get annoying 0's in the end of each mipmap
It currently aggregates 2 points by averaging their value. New ways are to be added soon.

It is still called mipmap, because I did not come up with a better name.

P.S. I might forget to change this comment after I release a new version, so if you're interested, check the crates readme, some things might've changed.

enomado commented 1 month ago

I still think all you need is journal.pone.0094694.pdf

Ramer–Douglas–Peucker is very expensive algorithm. it is good for random cartography geometry.

if you have sorted time series data you can just downsample chunks each frame, even without caching. its ok even with 1-to-100 point.

I even tesselate lines in my own - by adding min-max trapezoids to mesh

emilk / egui_plot

egui_plots: simplifying (complex) lines with Ramer–Douglas–Peucker might save a lot of triangulation/overdraw time #22