BioTurboNick opened this issue 2 years ago (Open)
```julia
using StatsPlots, BenchmarkTools  # scatter is re-exported by StatsPlots

x = rand(1234)
y = x + rand(1234)
@btime scatter(x, y)
# 245.500 μs (1237 allocations: 90.60 KiB)
```
Looking at npoints:
- 32x32: 226.594 ms (201146 allocations: 57.29 MiB)
- 64x64: 374.709 ms (204848 allocations: 158.35 MiB)
- 128x128: 828.861 ms (204848 allocations: 534.07 MiB)
- 256x256 (the current default): 2.513 s (204848 allocations: 1.93 GiB)
64x64 seems to strike a good balance between speed and quality.
This could be exposed as a parameter giving the exponent of the power of two to use: the default could be 6, and 7 or 8 (the current default) would be reasonable for higher quality.
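To make the proposal concrete, such a knob might look like the following (the keyword name and signature are purely hypothetical; nothing like it exists today):

```julia
using StatsPlots

x = randn(10_000); y = x .+ randn(10_000)

# Hypothetical API sketch: `gridexp` would set npoints = 2^gridexp per axis.
marginalkde(x, y; gridexp = 6)   # 64x64 grid - faster, slightly coarser
marginalkde(x, y; gridexp = 8)   # 256x256 grid - the current default resolution
```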
Then again, even 128x128 has some issues on real-world data - but it occurs to me this might be due to far outliers in the data leaving less resolution for the denser parts.
Trimming the upper and lower 1% of points helped. Maybe that can also be a parameter?
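Both ideas can be tried today directly against KernelDensity.jl, which I believe computes the grid underneath. A minimal sketch, assuming its `kde((x, y); npoints = ...)` keyword and a hard 1%/99% quantile cut (the 1% cutoff is just what helped in my tests, not a general rule):

```julia
using KernelDensity, Statistics, BenchmarkTools

x = randn(100_000)
y = x .+ randn(100_000)

# Lower-resolution grid: 64x64 instead of the current 256x256 default.
@btime kde(($x, $y); npoints = (64, 64))

# Trim far outliers so the fixed grid is spent on the dense region.
lo_x, hi_x = quantile(x, 0.01), quantile(x, 0.99)
lo_y, hi_y = quantile(y, 0.01), quantile(y, 0.99)
keep = (x .>= lo_x) .& (x .<= hi_x) .& (y .>= lo_y) .& (y .<= hi_y)
k = kde((x[keep], y[keep]); npoints = (64, 64))
```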
Waaait a second - the `pdf` function is only being used to select the levels. But the `contour` function can already do that internally. Is there any benefit to this? It's quite expensive for just that task.
Contour alone:
Current implementation:
"levels are evenly-spaced in the cumulative probability mass" is what the documentation says. Maybe that's importantly different from what GR does internally. Not sure what the pros and cons would be.
I was hoping to use something like `marginalkde` to better display dense scatterplot data, but it takes 100x as long as a scatter plot to generate. The slow part is entirely the `pdf` call for each x/y.

Barring performance improvements to `pdf`, is there a way to reduce the resolution so it can be calculated faster?
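For concreteness, this is the kind of comparison behind the 100x figure, plus an isolated timing of the suspected hot spot. It is a sketch only; the per-point `pdf` loop is my guess at what the recipe does internally, and timings will differ by machine:

```julia
using StatsPlots, BenchmarkTools, KernelDensity

x = rand(1234)
y = x + rand(1234)

@btime scatter($x, $y)      # fast baseline (see timing above)
@btime marginalkde($x, $y)  # ~100x slower here

# Isolate the suspected hot spot: evaluating the 2-D KDE at every data point.
k = kde((x, y))
@btime [pdf($k, xi, yi) for (xi, yi) in zip($x, $y)]
```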