JuliaPlots / StatsPlots.jl

Statistical plotting recipes for Plots.jl

marginalkde ~100x slower than a scatter plot #504

Open BioTurboNick opened 2 years ago

BioTurboNick commented 2 years ago

I was hoping to use something like marginalkde to better display dense scatterplot data, but it takes 100x as long as a scatter plot to generate.

The slow part is entirely the pdf call for each x/y.

Barring performance improvements to pdf, is there a way to reduce the resolution so it can be calculated faster?

BioTurboNick commented 2 years ago
using Plots, BenchmarkTools  # scatter comes from Plots, @btime from BenchmarkTools
x = rand(1234)
y = x + rand(1234)
@btime scatter(x, y)
  245.500 μs (1237 allocations: 90.60 KiB)

image

Looking at npoints:

32x32: 226.594 ms (201146 allocations: 57.29 MiB) [image]
64x64: 374.709 ms (204848 allocations: 158.35 MiB) [image]
128x128: 828.861 ms (204848 allocations: 534.07 MiB) [image]
256x256, the default: 2.513 s (204848 allocations: 1.93 GiB) [image]

I think 64x64 strikes a good balance between speed and plot quality.

This could be exposed as a parameter giving the exponent of the power of two to use. The default could be 6 (64x64); 7 or 8 (the current default, 256x256) would be reasonable for higher quality.
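For illustration, here is a minimal sketch of what such a parameter might look like when calling KernelDensity.jl directly. It assumes the 2D density behind marginalkde comes from `kde`, which accepts an `npoints` tuple for bivariate data; the name `exponent` is hypothetical and is not an existing StatsPlots option.

```julia
using KernelDensity

x = rand(1234)
y = x + rand(1234)

# Hypothetical parameter: grid side length is 2^exponent.
# 8 corresponds to the 256x256 default; 6 gives the 64x64 grid discussed above.
exponent = 6
n = 2^exponent

# Bivariate KDE evaluated on an n-by-n grid instead of the default 256x256.
k = kde((x, y); npoints = (n, n))
```

Keeping the side length a power of two matches what the KernelDensity.jl documentation recommends for its FFT-based implementation.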

BioTurboNick commented 2 years ago

Then again, even 128 has some issues with real-world data, but it occurs to me this might be due to far outliers stretching the grid, which leaves lower resolution in the denser parts. [image]

BioTurboNick commented 2 years ago

Trimming the upper and lower 1% of points helped. Maybe that can also be a parameter?
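A rough sketch of the kind of trimming meant here, applied before the data is passed to the plot. The helper name `trim_outliers` and the `frac` keyword are hypothetical, and trimming each axis independently is an assumption; it is one of several reasonable ways to drop the extreme 1%.

```julia
using Statistics

# Hypothetical helper: keep only the points whose x and y both fall
# between the `frac` and `1 - frac` quantiles of their own axis.
function trim_outliers(x, y; frac = 0.01)
    xlo, xhi = quantile(x, [frac, 1 - frac])
    ylo, yhi = quantile(y, [frac, 1 - frac])
    keep = (x .>= xlo) .& (x .<= xhi) .& (y .>= ylo) .& (y .<= yhi)
    return x[keep], y[keep]
end

x = rand(1234)
y = x + rand(1234)
xt, yt = trim_outliers(x, y)   # xt, yt can then be plotted in place of x, y
```

Because the two axes are trimmed independently, somewhat more than 2% of the points can be removed in total.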

BioTurboNick commented 2 years ago

Waaait a second - the pdf function is only being used to select the levels. But the contour function can already do that internally. Is there any benefit to this? Quite expensive for just that task.

Contour alone: image

Current implementation: image

BioTurboNick commented 2 years ago

"levels are evenly-spaced in the cumulative probability mass" is what the documentation says. Maybe that's importantly different from what GR does internally. Not sure what the pros and cons would be.
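One way to get levels that are evenly spaced in cumulative probability mass without calling pdf for every data point is to work from the gridded density alone. The sketch below is not the current StatsPlots implementation; `mass_levels` is a hypothetical helper, and it assumes a uniform grid so that each cell's mass is proportional to its density value.

```julia
# Hypothetical helper: density thresholds whose enclosed regions contain
# 1/nlevels, 2/nlevels, ... of the total mass on the grid.
function mass_levels(density::AbstractMatrix, nlevels::Integer)
    d = sort(vec(density); rev = true)   # highest-density cells first
    cum = cumsum(d) ./ sum(d)            # fraction of total mass enclosed so far
    targets = range(1 / nlevels, stop = 1 - 1 / nlevels, length = nlevels - 1)
    # For each target fraction, take the density of the cell at which
    # the enclosed mass first reaches that fraction.
    return [d[searchsortedfirst(cum, t)] for t in targets]
end

# Toy grid standing in for the `density` field of a bivariate KDE.
density = [4.0 3.0; 2.0 1.0]
levels = mass_levels(density, 2)
```

The thresholds come out in decreasing order, so they would likely need sorting before being handed to a contour call. The cost depends only on the grid size rather than on the number of data points.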