MakieOrg / AlgebraOfGraphics.jl

An algebraic spin on grammar-of-graphics data visualization in Julia. Powered by the Makie.jl plotting ecosystem.
https://aog.makie.org
MIT License

Heatmap coloring boxes that should be empty #407

Open. jariji opened this issue 2 years ago.

jariji commented 2 years ago

I expected row 'm' of facet '!' to be empty, but it contains colored boxes with no text. The tick alignment around that row also looks wrong.

using DataFrames, AlgebraOfGraphics, CairoMakie

let
  n = 1000
  df = DataFrame(x=rand('A':'Z',n), y=rand('a':'z',n), z=rand(1:1000,n), c=rand('!':'%',n))
  d = combine(groupby(df, [:x,:y,:c]), :z => sum)
  # Remove every row of facet '!' with y == 'm', so that row should render empty.
  subset!(d, [:y,:c] => ByRow((y,c) -> !(c == '!' && y == 'm')))
  ll = (
      data(d)
      * mapping(:x,:y; layout=:c)
      * (
          visual(Heatmap) * mapping(:z_sum)
          + visual(Makie.Text; align=(:center,:center), color=:red) * mapping(text=:z_sum=>verbatim∘string)
        )
  )
  draw(ll; figure=(;resolution=(3000,2000)))
end

[image: the resulting faceted heatmap]

[cbdf2221] AlgebraOfGraphics v0.6.8
[13f3f980] CairoMakie v0.8.6
[ee78f7c6] Makie v0.17.6
julia 1.7.1
piever commented 2 years ago

What I think is happening: AoG has automatically translated the y categories to integers, and in the '!' facet there is no 'm', so the y centers jump from 'l' to 'n'. Makie sees that gap in the axis and expands the neighboring boxes to fill it. A simpler MWE would be

using CairoMakie
# The x centers 1, 2, 4, 5 leave a gap at 3; the boxes next to it are widened to cover it.
heatmap([1, 2, 4, 5], [1, 2, 3, 4], rand(4))

[image: heatmap output]

The conversion is happening here. This should probably be fixed on the Makie side first. Maybe one could allow an optional width argument to heatmap to help compute correct edges?
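To make the mechanism concrete, here is a small pure-Julia sketch. It assumes, as described above, that AoG maps the categories to consecutive integers on the linked axis, and that heatmap edges are placed halfway between neighboring centers, with the outer edges extrapolated by half a step. `centers_to_edges` and `edges_with_width` are hypothetical helpers written for illustration, not Makie's or AoG's actual code; the second one shows what an explicit width argument, as suggested above, could do instead.

```julia
# Hypothetical helper (not Makie's code): edges halfway between neighboring
# centers, with the outer edges extrapolated by half of the adjacent step.
function centers_to_edges(c::AbstractVector{<:Real})
    length(c) == 1 && return [c[1] - 0.5, c[1] + 0.5]
    mids = (c[1:end-1] .+ c[2:end]) ./ 2
    first_edge = c[1] - (c[2] - c[1]) / 2
    last_edge = c[end] + (c[end] - c[end-1]) / 2
    return vcat(first_edge, mids, last_edge)
end

# Hypothetical helper: with an explicit cell width, each cell spans
# [center - w/2, center + w/2], so a missing center stays a visible gap.
edges_with_width(c::AbstractVector{<:Real}, w::Real) = [(x - w / 2, x + w / 2) for x in c]

# Assumed mapping of the y categories to consecutive integers on the linked axis.
all_y = collect('a':'z')
index_of = Dict(ch => i for (i, ch) in enumerate(all_y))

# In the '!' facet, 'm' (position 13) has no data, so its center is absent.
centers = [index_of[ch] for ch in all_y if ch != 'm']

edges = centers_to_edges(centers)
widths = diff(edges)
# The cells for 'l' (center 12) and 'n' (center 14) both get width 1.5,
# together covering the slot that should have stayed empty.
println(widths[12], " ", widths[13])

# With w = 1, the same two cells are (11.5, 12.5) and (13.5, 14.5).
println(edges_with_width(centers, 1)[12:13])
```

With midpoint edges the gap is absorbed by the neighbors; with a known width it stays empty, which is the trade-off discussed in the rest of the thread.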

jariji commented 2 months ago

@piever says

This should probably be fixed on the Makie side first. Maybe one could allow an optional width argument to heatmap to help compute correct edges?

In the linked Makie issue @jkrumbiegel says

yes I agree this is problematic, but to me it's an AlgebraOfGraphics problem. It should probably add the missing categories in each heatmap when it links the facets.


Either way, this issue makes me nervous and I hope it can be fixed one way or another.
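A minimal sketch of the fix suggested in the Makie issue, assuming the plotting side knows the full list of categories for the linked axes: build one dense matrix per facet over all categories and mark absent combinations with NaN, so every category keeps its own unit cell. `dense_grid` is a hypothetical helper; whether NaN cells are drawn as empty depends on the heatmap's NaN color setting, which is also an assumption here.

```julia
# Hypothetical helper: place (x, y, z) triples into a matrix indexed by the
# full category lists, leaving absent combinations as NaN.
# Later duplicates of the same (x, y) silently overwrite earlier ones.
function dense_grid(xs, ys, zs, xcats, ycats)
    xi = Dict(c => i for (i, c) in enumerate(xcats))
    yi = Dict(c => j for (j, c) in enumerate(ycats))
    grid = fill(NaN, length(xcats), length(ycats))
    for (x, y, z) in zip(xs, ys, zs)
        grid[xi[x], yi[y]] = z
    end
    return grid
end

# One facet in which the y category 'b' never appears.
xs = ['A', 'A', 'B']
ys = ['a', 'c', 'c']
zs = [10, 20, 30]

grid = dense_grid(xs, ys, zs, 'A':'B', 'a':'c')
# Column 2 ('b') is all NaN, while the y centers 1:3 stay evenly spaced.
```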

jkrumbiegel commented 2 months ago

I checked what ggplot's geom_tile does here, since it seemed the most similar in use to Heatmap. It doesn't require the tiles to be adjacent; it simply determines the minimum distance between values on each scale and uses that as the default width/height:

library(ggplot2)

df <- data.frame(
  x = c(1, 2, 4),
  y = c(4, 5, 6),
  fill = c(1, 2, 3)
)

ggplot(df, aes(x, y, fill = fill)) + geom_tile()

[image: geom_tile output]

This looks like the behavior you want at first, but with unevenly spaced values it differs from Heatmap:

library(ggplot2)

df <- data.frame(
  x = c(0, 1, 2.2, 3.9),
  y = c(4, 5, 6, 7),
  fill = c(1, 2, 3, 4)
)

ggplot(df, aes(x, y, fill = fill)) + geom_tile()

[image: geom_tile output]
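The rule described above can be sketched in a few lines of Julia. `min_spacing` and `tile_extents` are hypothetical helpers that follow the description (the smallest gap between distinct values becomes the tile size); ggplot2's own implementation handles more special cases, so treat this as an approximation. For the second example the default width is 1, so the tiles centered at 2.2 and 3.9 do not touch their neighbors, whereas midpoint-based Heatmap edges would stretch the cells to close those gaps.

```julia
# Hypothetical helper: smallest distance between distinct sorted values,
# falling back to 1 when there is only one distinct value.
function min_spacing(v)
    u = sort(unique(v))
    length(u) < 2 && return 1.0
    return float(minimum(diff(u)))
end

# Each tile is centered on its value and spans the common default width.
function tile_extents(v)
    w = min_spacing(v)
    return [(x - w / 2, x + w / 2) for x in v]
end

x1 = [1, 2, 4]
x2 = [0, 1, 2.2, 3.9]

min_spacing(x1)      # 1.0, so the tile at 4 leaves an empty slot at 3
tile_extents(x2)     # width-1 tiles: gaps after the tiles at 1 and 2.2
```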

I do agree that the behavior is confusing when Makie converts x, y, z vectors into x centers, y centers and a z matrix: if some x or y values happen to be missing entirely, you can easily end up with an irregular grid. This can never happen with the vector, vector, matrix signature, because the shape of the matrix fixes the grid, but that signature isn't one we can use from AoG, whose input data is tabular.

So I'm inclined to tighten that interface, but I'm not completely sure how. Maybe x and y vectors should only be allowed if their centers are equidistant, or spaced at multiples of a common step (which is when you get gaps). The check would probably need some tolerance for floating-point differences, but then we'd say in Makie that for unequal bin sizes you have to use the edge version, where you specify n+1 x values.
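One way the proposed check could look, as a sketch: take the smallest step between the sorted distinct centers and test whether every step is, within a tolerance, an integer multiple of it. `classify_centers` and its `rtol` keyword are hypothetical, not an existing Makie API. A `:gaps` result is the "multiples" case, and `:irregular` is the case that would be rejected in favor of the edge-based signature with n+1 values.

```julia
# Hypothetical check: :regular if all steps are (approximately) equal,
# :gaps if every step is an integer multiple of the smallest one,
# :irregular otherwise.
function classify_centers(c::AbstractVector{<:Real}; rtol = 1e-6)
    u = sort(unique(c))
    length(u) < 2 && return :regular
    steps = diff(u)
    base = minimum(steps)
    ratios = steps ./ base
    multiples = all(r -> isapprox(r, round(r); rtol = rtol), ratios)
    multiples || return :irregular
    return all(r -> isapprox(r, 1; rtol = rtol), ratios) ? :regular : :gaps
end

classify_centers([1, 2, 3, 4])          # :regular
classify_centers([1, 2, 4, 5])          # :gaps
classify_centers([0, 1, 2.2, 3.9])      # :irregular
classify_centers([0.1, 0.2, 0.3, 0.4])  # :regular despite floating-point error
```

The tolerance is what makes the last case pass: its steps differ from each other in the last few bits, which an exact comparison would flag as irregular.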

If we want to match geom_tile's behavior, we would just need a different recipe (basically scatter with an automatically chosen data-space marker size). One drawback of ggplot's approach is that you don't notice overlapping data points at all, which surprised me a bit for something that should usually visualize matrix-like data:

library(ggplot2)

df <- data.frame(
  x = c(1, 2, 1),
  y = c(1, 1, 1),
  fill = c(1, 2, 3)
)

ggplot(df, aes(x, y, fill = fill)) + geom_tile()

[image: geom_tile output]
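Whatever recipe is used, overlapping points like the ones above could at least be detected before plotting. A minimal sketch using only Base Julia; `duplicate_cells` is a hypothetical helper, not part of Makie or AoG.

```julia
# Hypothetical helper: return the (x, y) positions that occur more than once.
function duplicate_cells(xs, ys)
    counts = Dict{Tuple{eltype(xs), eltype(ys)}, Int}()
    for key in zip(xs, ys)
        counts[key] = get(counts, key, 0) + 1
    end
    return sort([k for (k, n) in counts if n > 1])
end

# Same data as the ggplot example above: (1, 1) appears twice.
dups = duplicate_cells([1, 2, 1], [1, 1, 1])
```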

SimonDanisch commented 2 months ago

Maybe we can use spy more for this? From the examples it looks like we're talking about sparse data anyway. It's pretty much heatmap but with concrete points. It currently doesn't have a convert_arguments method for ::Vector, ::Vector, ::Vector, but one should be easy to add, with the correct behaviour.
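For comparison, a sketch of how vector, vector, vector input could be turned into the sparse matrix that spy displays, using the SparseArrays standard library. The category-to-index mapping and the name `to_sparse` are assumptions for illustration, not the convert_arguments method being proposed. Note that `sparse` sums entries with the same (i, j) by default; the hypothetical default `combine` here throws instead, so overlapping points cannot go unnoticed.

```julia
using SparseArrays

# Hypothetical conversion: map each category to an index, then build an
# m×n sparse matrix; cells without data are structurally empty, not filled.
function to_sparse(xs, ys, zs; xcats = sort(unique(xs)), ycats = sort(unique(ys)),
                   combine = (a, b) -> error("duplicate (x, y) cell"))
    xi = Dict(c => i for (i, c) in enumerate(xcats))
    yi = Dict(c => j for (j, c) in enumerate(ycats))
    I = [xi[x] for x in xs]
    J = [yi[y] for y in ys]
    # `combine` is only called for repeated (i, j) pairs.
    return sparse(I, J, zs, length(xcats), length(ycats), combine)
end

S = to_sparse(['A', 'A', 'B'], ['a', 'c', 'c'], [10, 20, 30]; ycats = 'a':'c')
nnz(S)    # 3 stored entries in a 2×3 matrix; column 2 ('b') has none
```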