MakieOrg / Makie.jl

Interactive data visualizations and plotting in Julia
https://docs.makie.org/stable
MIT License

Larger than memory heatmaps #973

Open rafaqz opened 3 years ago

rafaqz commented 3 years ago

With spatial raster data it would be good to be able to plot larger-than-memory heatmaps and images, and zoom in to get detail.

I'm not sure how/where this could be implemented, just flagging this as something that would be very useful in future, and as a key requirement for some spatial plotting workflows.

See: https://github.com/yeesian/ArchGDAL.jl/issues/195 https://github.com/rafaqz/GeoData.jl/issues/172

SimonDanisch commented 3 years ago

From https://github.com/JuliaPlots/Makie.jl/issues/443 we got an example that shows how one can quite easily implement some calculating/fetching of data when zooming in/out:

using GLMakie
fig = Figure()
ax = Axis(fig[1, 1])
function datashader(limits, pixelarea)
    # return your heatmap data
    # here, I just calculate a sine field as a demo
    xpixels, ypixels = widths(pixelarea)
    xmin, ymin = minimum(limits)
    xmax, ymax = maximum(limits)
    [cos(x) * sin(y) for x in LinRange(xmin, xmax, xpixels),
                         y in LinRange(ymin, ymax, ypixels)]
end
xrange = lift(x -> minimum(x)[1] .. maximum(x)[1], ax.finallimits)
yrange = lift(x -> minimum(x)[2] .. maximum(x)[2], ax.finallimits)
pixels = lift(datashader, ax.finallimits, ax.scene.px_area)
heatmap!(ax, xrange, yrange, pixels,
    xautolimits = false, yautolimits = false)
display(fig)
rafaqz commented 3 years ago

Thanks! Amazing that can be done already. Is it possible to put this in a recipe? It would be good if GeoData.jl could do this by default above a certain size.

SimonDanisch commented 3 years ago

Yeah I'd be happy to... I can help with putting it in a recipe, if someone extends the above script with the required interpolation math etc ;)

rafaqz commented 3 years ago

Oh that's ok, I can do it; I just wondered if it was possible at all, but didn't make that so clear. I guess lift does some callback magic to allow updating in the interface, which is very cool.

I'll let you know if I hit any snags.

lazarusA commented 3 years ago

@rafaqz hey, so... how are you actually using datashader, let's say for a matrix 40,000x40,000 as the input.

rafaqz commented 3 years ago

Yep, it works fine in basic scripts. I'm too busy this month and I'll wait for the recipes interface to settle here, but the plan is that GeoData.jl will have lazy plot zooming leveraging DiskArrays.jl/Makie.jl, using something like this.

Edit: With the small text on my phone I missed the word "how" and thought you were just asking if I actually used this!

SimonDanisch commented 3 years ago

Yep, it works fine in basic scripts

Would you mind already posting that basic script so others can play around with it? Or is it still just the demo that doesn't actually load anything from disk/ram?

rafaqz commented 3 years ago

Ok, trying to clean up my example to post here, but it seems like the example above doesn't work anymore either?

julia> fig = GLMakie.Figure()
ERROR: type GridLayout has no field needs_update
Stacktrace:
  [1] getproperty(x::GridLayout, f::Symbol)
    @ Base ./Base.jl:33
  [2] (::AbstractPlotting.var"#831#832"{GridLayout})(al::Outside)
    @ AbstractPlotting ~/.julia/packages/AbstractPlotting/M8Nlv/src/figures.jl:80

And to clarify, in my now-broken code, the disk loading is abstracted behind DiskArrays.jl/GeoData.jl internals, so it's not apparent that it's happening; it's just used like an array. But I'll post that if we can get the above working and I can fix my code too.

Here's a simple way to replace the sin/cos array used above (I think, can't test):

# clamp the limits to the array's index range
dxmin, dymin = map(max, map(x -> trunc(Int, x), (xmin, ymin)), map(first, axes(A)))
dxmax, dymax = map(min, map(x -> trunc(Int, x), (xmax, ymax)), map(last, axes(A)))
# scale the screen pixel counts to the clamped data window
dxpixels = round(Int, (dxmax - dxmin) / (xmax - xmin) * xpixels)
dypixels = round(Int, (dymax - dymin) / (ymax - ymin) * ypixels)
# nearest-pixel index vectors
xs = [trunc(Int, x) for x in LinRange(dxmin, dxmax, dxpixels)]
ys = [trunc(Int, y) for y in LinRange(dymin, dymax, dypixels)]
A[xs, ys]

You could replace trunc with interpolation if you need that.
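To make this concrete, here's a self-contained sketch of the nearest-pixel subsampling idea in plain Julia, with no Makie types involved (the `subsample` name and signature are hypothetical, not anything from Makie or GeoData.jl):

```julia
# Hypothetical sketch: nearest-pixel subsampling of a matrix `A` over the
# current axis limits, returning at most (xpixels, ypixels) samples.
function subsample(A::AbstractMatrix, xmin, xmax, ymin, ymax, xpixels, ypixels)
    # clamp the view limits to the array's index range
    dxmin = max(trunc(Int, xmin), firstindex(A, 1))
    dxmax = min(trunc(Int, xmax), lastindex(A, 1))
    dymin = max(trunc(Int, ymin), firstindex(A, 2))
    dymax = min(trunc(Int, ymax), lastindex(A, 2))
    # never request more samples than there are data points in the window
    nx = min(xpixels, dxmax - dxmin + 1)
    ny = min(ypixels, dymax - dymin + 1)
    # nearest-pixel index vectors (swap `round` for interpolation if needed)
    xs = round.(Int, LinRange(dxmin, dxmax, nx))
    ys = round.(Int, LinRange(dymin, dymax, ny))
    return A[xs, ys]
end

A = [cos(x) * sin(y) for x in 1:2000, y in 1:2000]
small = subsample(A, 0.5, 2500.0, 1.0, 2000.0, 300, 200)
size(small)  # (300, 200)
```

Because only `xs`/`ys` index vectors hit `A`, the same code should work unchanged on a DiskArray-backed matrix, reading only the selected cells.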

The more complicated script uses DimensionalData.jl selectors to work in e.g. lat/lon instead of pixels, but I don't think that was finished and I also can't test it.

jkrumbiegel commented 3 years ago

This is a regression in relation to GridLayoutBase and is already fixed, try updating Makie. You seem to be on an older version if AbstractPlotting is loaded.

rafaqz commented 3 years ago

Right. WebIO.jl needs to register a patch version, it's holding back Observables to 0.3.3. That's why no new Makie...

garibarba commented 2 years ago

from #443 we got an example that shows how one can quite easily implement some calculating/fetching of data when zooming in/out:

[example code quoted above]

This works quite well for me, but I'm getting poor performance (unusable) when using WGLMakie instead of GLMakie (on macOS FYI). I've found this discussion (https://discourse.julialang.org/t/lag-with-wglmakie/55399/8) pointing out that the data transfer (Julia-WebGL) should be expected to be slow. Am I missing something here or is this the full story? Thanks!

SimonDanisch commented 2 years ago

It should be much slower than GLMakie... although it's not very optimized at the moment, so if you want to do some profiling, you may find some low-hanging fruit to improve performance. Which reminds me, I just recently figured out a better way to serialize arrays more efficiently, which could be used in JSServe as well...

rafaqz commented 1 year ago

With https://github.com/MakieOrg/Makie.jl/issues/2779 I'm wondering if Makie should do this by default for all heatmaps? Just plotting regular in-memory arrays as heatmaps is slow if they get too big.

SimonDanisch commented 1 year ago

Hm we don't have axis recipes yet, and this would need access to the finallimits... But maybe we can prototype this in some package, and then figure out how to integrate it into Makie going forward?

asinghvi17 commented 1 year ago

Isn't this what Tyler.jl does (more or less)? I think there's a PR there which would allow arbitrary tiles, which one could extend to tiles mapped or resampled from a backing DiskArray...

SimonDanisch commented 1 year ago

Yeaah, I was thinking about that.... But didn't have a quick idea how to make it work nicely (I mean, something I could implement in ~30 min or so)

jkrumbiegel commented 1 year ago

Hm we don't have axis recipes yet, and this would need access to the finallimits

Isn't that similar to the stuff vlines etc. do? Those work with the camera observables we get via the parent scene already. In the future, that could just be part of the scene/plot interface so we don't have to hook the observables directly.

rafaqz commented 1 year ago

Tyler.jl has 2x resolution changes though, so there will be noticeable jumps when you zoom if heatmaps work like Tyler. I'm just doing a rounding interpolation currently.

function datashader(A, limits, pixelarea; maxres = 500)
    xmin, ymin = minimum(limits)
    xmax, ymax = maximum(limits)
    xpixels, ypixels = widths(pixelarea)
    xres = round(Int, max(1, min(maxres, xpixels, xmax - xmin)))
    yres = round(Int, max(1, min(maxres, ypixels, ymax - ymin)))
    dxmin = max(xmin, 1)
    dxmax = min(xmax, size(A, 1))
    dymin = max(ymin, 1)
    dymax = min(ymax, size(A, 2))
    xrange = LinRange(dxmin, dxmax, xres)
    yrange = LinRange(dymin, dymax, yres)
    # Interpolate to vectors of indices
    xs = trunc.(Int, xrange)
    ys = trunc.(Int, yrange)
    return A[xs, ys]
end

xrange = lift(ax.finallimits) do fl
    max(minimum(fl)[1], 1) .. min(maximum(fl)[1], size(first(maps)[], 1))
end
yrange = lift(ax.finallimits) do fl
    max(minimum(fl)[2], 1) .. min(maximum(fl)[2], size(first(maps)[], 2))
end

pixels = lift(datashader, m, ax.finallimits, ax.scene.px_area)
p = heatmap!(ax, xrange, yrange, pixels)

(Although this is slightly buggy. Edit: should be mostly fixed.)

jkrumbiegel commented 1 year ago

It could be that this implementation, where x, y, and the data are all lifted from finallimits, updates inefficiently because the heatmap args conversion is triggered three times per limit change. It might be better implemented with on, then mutating x and y before triggering the data.

felixcremer commented 1 year ago

@rafaqz is this something we would like to look at next week during MakieCon?

rafaqz commented 1 year ago

Yeah we could totally hack on this.

I was thinking it would be good to have zoom level caching like Tyler.jl, which would need defining zoom levels... so @asinghvi17 is right, it's basically just generalising Tyler.jl.

But we could build the tile pyramid efficiently somehow, just doing one pass to make e.g. the first 4 layers of tiles, then do the rest lazily as they need fewer disk reads anyway.

We could also try using layers of COG tiffs and similar formats where the tile pyramid is already built. Rasters.jl doesn't access those currently, so that's a little more work.

asinghvi17 commented 1 year ago

If we can get a good recipe for layers, we could make it easy to create a layers object from file, then just plot that. Perhaps an alternate dispatch for heatmap would also serve this purpose.

rafaqz commented 1 year ago

Interesting, yeah if there was some kind of plottable Layers object we could return it from recipes and provide callbacks for getting the data.

rafaqz commented 1 year ago

Regular heatmaps could also use it with a function like the one above for subsampling.

timholy commented 1 year ago

FYI this is something that ImageView does (via ImageBase.restrict), although there seem to be some needed optimizations to make it faster. You're welcome to steal anything that seems useful.

rafaqz commented 1 year ago

Ah interesting, ImageBase.jl's restrict does proper anti-aliased pyramid building.

I was imagining that for quick plotting of large disk-based data we would just use a nearest-pixel algorithm so it's fast. Then with some chunking patterns we can probably skip reading some of the data entirely for the top-level images.

But we could use restrict for better quality outputs.
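For illustration, a crude stand-in for that kind of pyramid building in plain Julia (a simple 2x2 box-filter mean rather than ImageBase's weighted restrict kernel; the `halve_mean`/`build_pyramid` names and the `minsize` cutoff are hypothetical):

```julia
# Hypothetical sketch: each pyramid level halves both dimensions by
# averaging 2x2 blocks (a box filter, simpler than ImageBase.restrict).
function halve_mean(A::AbstractMatrix)
    nx, ny = size(A) .÷ 2
    [(A[2i-1, 2j-1] + A[2i, 2j-1] + A[2i-1, 2j] + A[2i, 2j]) / 4
     for i in 1:nx, j in 1:ny]
end

# Build coarser levels until the next halving would drop below `minsize`.
function build_pyramid(A; minsize = 256)
    levels = [A]
    while all(size(levels[end]) .>= 2minsize)
        push!(levels, halve_mean(levels[end]))
    end
    return levels
end

A = rand(Float64, 1024, 1024)
pyr = build_pyramid(A)
length(pyr)     # 3 levels: 1024, 512, 256
size(pyr[end])  # (256, 256)
```

A renderer would then pick the coarsest level whose resolution still exceeds the on-screen pixel count for the current zoom.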

rafaqz commented 1 year ago

Some light reading:

https://www.researchgate.net/publication/311423420_An_Efficient_Tile-Pyramids_Building_Method_for_Fast_Visualization_of_Massive_Geospatial_Raster_Datasets

@felixcremer from your other Rasters.jl plotting issue where you mosaic a lot of plots, I guess you would want that for this too?

We could build this so it's efficient for multiple mosaic tiles.

felixcremer commented 1 year ago

In my case the data of a single tile is 15000 x 15000, which already kills Julia if I try to plot all of it. I currently only plot every 10th pixel. But yes, when we go to more tiles I would like to only plot a meaningful subset.

rafaqz commented 1 year ago

We should be able to mosaic 20 of those and pan and zoom around with no lag

(I mean it should be possible to write it so we can do that :sweat_smile: )

felixcremer commented 1 year ago

That was how I was reading this, yes. So let's work on this on Tuesday.

timholy commented 1 year ago

Yeah, the problem with plotting every nth is that it can be awfully misleading---there can be really interesting stuff happening in a superpixel and you can miss it entirely.

There's a certain sense in which I think the best option is to come up with a way of representing the extrema (min/max) of each superpixel, but I don't know of a good way of doing that. If you had to pick one, the mean seems as good as anything else I can think of.
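One way to make that concrete in plain Julia (the `superpixels` helper and its min/max/mean channels are a hypothetical sketch, not an existing API): aggregate each block into its extrema and mean, so a sparse bright feature still shows up in the max channel even when the mean washes it out:

```julia
# Hypothetical sketch: reduce `A` to superpixels of size b×b, keeping the
# min, max, and mean of each block so sparse extremes aren't lost.
function superpixels(A::AbstractMatrix, b::Int)
    nx, ny = size(A) .÷ b
    blocks = [(@view A[(i-1)*b+1:i*b, (j-1)*b+1:j*b]) for i in 1:nx, j in 1:ny]
    return (mins = minimum.(blocks), maxs = maximum.(blocks),
            means = sum.(blocks) ./ (b * b))
end

A = zeros(100, 100)
A[37, 62] = 1.0      # a single bright "star"
agg = superpixels(A, 10)
maximum(agg.maxs)    # 1.0  -- the star survives in the max channel
maximum(agg.means)   # 0.01 -- nearly invisible in the mean
```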

rafaqz commented 1 year ago

The mean is always nice. But it depends how much patience we have for building the pyramid when it's so easy to just zoom in and see more detail of a particular area.

Reading every Nth pixel might mean reading only 1/N of the data if e.g. the chunks are columns, so N times faster to get the first overview image.
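As a toy illustration of that cost argument (the `read_column` function is a hypothetical stand-in for a DiskArrays chunk read, not a real API): if each column is one chunk on disk, an every-Nth-column overview only ever touches 1/N of the chunks:

```julia
# Toy illustration: simulate column-chunked storage and count chunk reads.
const reads = Set{Int}()
read_column(A, j) = (push!(reads, j); A[:, j])  # stand-in for one chunk read

A = rand(100, 100)
N = 10
overview = reduce(hcat, (read_column(A, j) for j in 1:N:size(A, 2)))

size(overview)  # (100, 10)
length(reads)   # 10 -- only 1/N of the 100 column chunks were touched
```

With row-oriented chunks the same every-Nth-column read would touch every chunk, which is exactly why the chunking pattern decides whether subsampling saves I/O.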

timholy commented 1 year ago

I agree. But the downsides should be acknowledged. Maybe a practical example would make it clearer: imagine a photograph from astronomy and imagine stars are pointlike and very sparse. The every-nth pixel solution might return a completely black image, missing every single star in the image.