ecomodeller closed this 10 months ago.
Same artifact with the plotly backend.
And here is what it looks like if your variables are negatively correlated (hopefully not the case for most models 😉).
@daniel-caichac-DHI, can you verify this?
Yes, I have seen this artifact. It depends on the number of bins the data are grouped into for the density plot: the smaller the bins, the smaller the artifact. So if you use 1000 bins in your example, you should no longer see it. This is the trade-off of binning the data (a 2D histogram) for fast plotting. Alternatively, we could estimate the density with a KDE and use that for the color scale, but with 1e6 points or more it can take an eternity.
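A minimal numpy sketch of the binning trade-off, assuming the point colors are looked up from a 2D histogram: every point gets the count of the cell it falls in, so all points in one cell share a color and the cell edges can show up as bands. `hist_density` is a hypothetical helper for illustration, not ModelSkill's actual implementation.

```python
import numpy as np


def hist_density(x, y, bins=20):
    """Hypothetical helper: per-point color value = count of the 2D-histogram cell it falls in."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    counts, xedges, yedges = np.histogram2d(x, y, bins=bins)
    # Cell index of each point; clip so points on the rightmost edge map to the last cell,
    # matching how np.histogram2d counts them
    ix = np.clip(np.searchsorted(xedges, x, side="right") - 1, 0, counts.shape[0] - 1)
    iy = np.clip(np.searchsorted(yedges, y, side="right") - 1, 0, counts.shape[1] - 1)
    return counts[ix, iy]


rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = x + 0.2 * rng.normal(size=10_000)
colors_coarse = hist_density(x, y, bins=20)   # few distinct color levels, visible steps
colors_fine = hist_density(x, y, bins=1000)   # many more levels, steps much smaller
```

The KDE alternative mentioned above could be sketched with `scipy.stats.gaussian_kde(np.vstack([x, y]))` evaluated at the same points, but evaluating a KDE built from n points at all n points costs O(n²), which is why it becomes impractical around 1e6 points.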
Could we use overlapping bins to overcome this artifact (a cheap alternative to rolling bins)?
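One cheap way to get overlapping bins is an average shifted histogram (ASH): build several histograms whose grids are offset by a fraction of one bin width and average each point's cell count over them. A minimal numpy sketch, assuming per-point counts are used as the color value; `ash_density` and `_lookup` are hypothetical names, not part of ModelSkill.

```python
import numpy as np


def _lookup(counts, xedges, yedges, x, y):
    # Cell index of each point; clipped so the rightmost edge maps to the last cell
    ix = np.clip(np.searchsorted(xedges, x, side="right") - 1, 0, counts.shape[0] - 1)
    iy = np.clip(np.searchsorted(yedges, y, side="right") - 1, 0, counts.shape[1] - 1)
    return counts[ix, iy]


def ash_density(x, y, bins=20, n_shifts=4):
    """Hypothetical helper: mean cell count over n_shifts grids offset by k/n_shifts of a bin."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Bin widths of the unshifted grid (fall back to 1.0 for constant data)
    dx = (x.max() - x.min()) / bins or 1.0
    dy = (y.max() - y.min()) / bins or 1.0
    total = np.zeros(x.shape)
    for k in range(n_shifts):
        # Move the grid origin back by a fraction k/n_shifts of one bin width
        x0 = x.min() - dx * k / n_shifts
        y0 = y.min() - dy * k / n_shifts
        # bins + 1 cells so the shifted grid still covers the maximum value
        xedges = x0 + dx * np.arange(bins + 2)
        yedges = y0 + dy * np.arange(bins + 2)
        counts, _, _ = np.histogram2d(x, y, bins=[xedges, yedges])
        total += _lookup(counts, xedges, yedges, x, y)
    return total / n_shifts
```

Averaging over shifted grids smooths the step between neighbouring cells, at a cost of roughly `n_shifts` histogram passes, which is still far cheaper than a KDE on large datasets.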
OK, I just had some time to look at this. I replicated @ecomodeller's code, but I think the solution is far simpler.
Top figures: 100 bins, both points and histogram. Bottom figures: default (20 bins), both points and histogram.
As I see it, the solution is as simple as increasing the extremely low default, currently bins=20, to something like bins=100 or bins=200.
With 100 bins, the scatter plot now follows the histogram. With the current default of bins=20, we are grouping water level data into cells of roughly 0.5 m by 0.5 m, so of course it looks horrid.
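The cell size is simply the data range divided by the number of bins. A small arithmetic sketch, assuming a hypothetical 10 m water level range (the range implied by ~0.5 m cells at bins=20, not a number reported in the thread):

```python
def bin_width(data_range_m, bins):
    # Width of one histogram cell along one axis, in the data's units
    return data_range_m / bins


data_range_m = 10.0  # hypothetical range, chosen so bins=20 gives ~0.5 m cells
for bins in (20, 100, 200, 1000):
    print(f"bins={bins}: cell size ~{bin_width(data_range_m, bins):.2f} m")
```

Under that assumption this prints cell sizes of 0.50, 0.10, 0.05 and 0.01 m, so going from 20 to 100 bins shrinks each cell fivefold along each axis.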
The earlier comparison by JAN set a 100-bin histogram against a scatter plot whose point colors came from a histogram of just 20 bins, so it was not a fair comparison.
Sent this PR.
Closed by #282
There seems to be an issue with the 2D density plot used in the scatter plot.
These bands look like an artifact.