GiovineItalia / Gadfly.jl

Crafty statistical graphics for Julia.
http://gadflyjl.org/stable/
Other
1.9k stars 250 forks source link

hexbin misalignment #1497

Open evanfields opened 3 years ago

evanfields commented 3 years ago

Spoilers: I'm opening this issue even though I think it's partly a bug upstream in https://github.com/GiovineItalia/Hexagons.jl In fact, I'm hoping to open a PR there, though it may take me a while as I have no idea what I'm doing! If someone wants to beat me to it, please do. I'm also not sure about the etiquette of the upstream change, though luckily Gadfly is Hexagon's only direct dependent according to JuliaHub.

Apparent bug: rendered hexbins are misaligned with input data. Here's a small reproducing example: plot(x = [-10,0,10], y = [-10,0,10], Geom.hexbin(xbincount=4, ybincount=4)) image

Notice how no hexagon covers (-10,-10) or (10,10), and the bottom left hexagon covers no data points at all! It appears the hexagons have been shifted down and to the left from their proper positions.

I believe the issue begins here: https://github.com/GiovineItalia/Gadfly.jl/blob/master/src/statistics.jl#L1249-L1251

        h = convert(HexagonOffsetOddR, cube_round(x - xmin + xspan/2,
                                                  y - ymin + yspan/2,
                                                  xsize, ysize))

An input (x,y) is mapped to cubic hexagon coordinates (of the containing hexagon) using cube_round from Hexagons.jl. Unfortunately, cube_round returns the cubic coordinates of the hexagon that contains an input (x,y) assuming the zero-cubic-coordinate hexagon has xy-center (1,1). This is nasty, and exactly what my future upstream PR will hopefully fix. See this old issue: https://github.com/GiovineItalia/Hexagons.jl/issues/13

julia> Hexagons.cube_round(0,0)
HexagonCubic(0, 1, -1)

julia> Hexagons.cube_round(1,1)
HexagonCubic(0, 0, 0)

But when the hexagon-space coordinates are translated back to (x,y) coordinates a few lines later:

        x, y = center(HexagonOffsetOddR(idx[1], idx[2]), xsize, ysize,
                      xmin - xspan/2, ymin - yspan/2)

The (x,y) offset corresponding to the center of the coordinate system does not correct for the implicit (1,1) xy-center of the hexagon coordinate system used in cube_round.

So the upstream fix is to make cube_round (and center) from Hexagons ditch the (1,1) alignment, i.e. the origin of the cubic hexagonal grid system should, by default, coincide with the origin of the xy Cartesian plane. That upstream fix will, I weakly believe, fix this misalignment.

As a final note, I think the Gadfly statistics.jl code could be made a bit more readable as follows, though (if I'm understanding correctly) the final results won't change at all:

This suggested code and the current code perform equivalent coordinate conversions xy -> hex -> xy; the current code just has an unintuitive-to-me translation by xspan and yspan and then undoes that translation.

evanfields commented 3 years ago

Also worth noting that I tried locally changing the Hexagons.jl cube_round behavior to be (0,0) aligned. Hexagons.jl tests break, but the Gadfly result seems much better:

plot(x = [-10,0,10], y = [-10,0,10], Geom.hexbin(xbincount=4, ybincount=4)) image

But maybe not all the way better, seems like the xy point (0,0), but virtue of being at the mid point of the data, should ideally also correspond to the center of HexagonCubic(0,0,0). If the recommended changes to statistics.jl in Gadfly are made, flipping the signs of the xspan/2 and yspan/2 terms: image

bjarthur commented 3 years ago

@Mattriks @tlnagy thoughts?

evanfields commented 3 years ago

Ah, I forgot to comment in this issue after making the upstream PR. The linked Hexagons.jl PR (still awaiting review) seems to fix this issue.