JuliaPlots / StatsPlots.jl

Statistical plotting recipes for Plots.jl

marginalkde ~100x slower than a scatter plot #504

Open BioTurboNick opened 2 years ago

BioTurboNick commented 2 years ago

I was hoping to use something like marginalkde to better display dense scatterplot data, but it takes 100x as long as a scatter plot to generate.

The slow part is entirely the pdf call for each x/y.

Barring performance improvements to pdf, is there a way to reduce the resolution so it can be calculated faster?

BioTurboNick commented 2 years ago

using StatsPlots, BenchmarkTools  # scatter and @btime

x = rand(1234)
y = x + rand(1234)
@btime scatter(x, y)
  245.500 μs (1237 allocations: 90.60 KiB)

Looking at npoints:

32x32: 226.594 ms (201146 allocations: 57.29 MiB)
64x64: 374.709 ms (204848 allocations: 158.35 MiB)
128x128: 828.861 ms (204848 allocations: 534.07 MiB)
256x256 (the default): 2.513 s (204848 allocations: 1.93 GiB)

I think 64x64 strikes a good balance of speed and quality.

This could be exposed as a parameter giving the exponent of the power of two to use. The default could be 6, and 7 or 8 (the current default) would be reasonable for higher quality.
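A minimal sketch of the exponent idea, assuming a hypothetical helper named grid_npoints (it is not part of StatsPlots). It only turns the exponent into a square grid size of the kind discussed above:

```julia
# Hypothetical helper, not part of StatsPlots: map an exponent to a square grid size.
function grid_npoints(exponent::Integer)
    exponent >= 1 || throw(ArgumentError("exponent must be at least 1"))
    n = 2^exponent
    return (n, n)
end

# Show how quickly the number of grid cells grows with the exponent.
for k in 5:8
    n, _ = grid_npoints(k)
    println("exponent $k -> $(n)x$(n) grid, $(n^2) cells")
end
```

Each step in the exponent quadruples the number of grid cells, which may be part of why the allocations above grow so quickly.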

BioTurboNick commented 2 years ago

Then again, even 128 has some issues with real-world data. It occurs to me this might be due to far outliers stretching the range the grid has to cover, leading to lower resolution in the denser parts.

BioTurboNick commented 2 years ago

Trimming the upper and lower 1% of points helped. Maybe that can also be a parameter?
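Roughly what I mean by trimming, as a sketch. The helper name trim_outliers and its p keyword are made up for illustration, and it trims each coordinate separately, which is only one possible reading of "upper and lower 1%":

```julia
using Statistics

# Hypothetical helper: keep only points whose x and y both lie within the
# central quantile range [p, 1 - p] of their own coordinate.
function trim_outliers(x::AbstractVector, y::AbstractVector; p::Real = 0.01)
    length(x) == length(y) || throw(DimensionMismatch("x and y must have equal length"))
    xlo, xhi = quantile(x, [p, 1 - p])
    ylo, yhi = quantile(y, [p, 1 - p])
    keep = (x .>= xlo) .& (x .<= xhi) .& (y .>= ylo) .& (y .<= yhi)
    return x[keep], y[keep]
end

x = rand(1234)
y = x + rand(1234)
xt, yt = trim_outliers(x, y)   # trimmed copies; the inputs are left untouched
```

The trimmed vectors can then be plotted in place of the originals, so the grid only has to cover the dense region.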

BioTurboNick commented 2 years ago

Waaait a second - the pdf function is only being used to select the levels. But the contour function can already do that internally. Is there any benefit to this? Quite expensive for just that task.

Contour alone: (plot image)

Current implementation: (plot image)

BioTurboNick commented 2 years ago

"levels are evenly-spaced in the cumulative probability mass" is what the documentation says. Maybe that's importantly different from what GR does internally. Not sure what the pros and cons would be.