JuliaMath / Interpolations.jl

Fast, continuous interpolation of discrete datasets in Julia
http://juliamath.github.io/Interpolations.jl/
Other
531 stars 109 forks source link

Image warping #515

Open RainerHeintzmann opened 2 years ago

RainerHeintzmann commented 2 years ago

I tried to use Interpolations.jl to deform an image. I did not see a simple interface for this task, so here is my code:

using Interpolations, TestImages, BenchmarkTools
img1 = Float32.(testimage("resolution"));
intlin = extrapolate(interpolate(img1, BSpline(Linear())), 0)
mymid = size(img1).÷2 .+1
f(t) = (0.1*t[1]+0.1*t[2], 0.0) # defines the warp as a local shift
@time coords = [Float32.(Tuple(c) .+ f(Tuple(c).-mymid)) for c in CartesianIndices(img1)]
do_int(t) = intlin(t...)
res = zeros(size(img1))
@btime $res .= do_int.($coords);

I expected this linear interpolation to be fairly fast, but it takes about 0.8 second for this 1920 x 1920 image and consumes 168 Mb. Is there a way to avoid the allocations?

mkitti commented 2 years ago

You could probably get some speed by dropping the extrapolation for parts that you know are inside the domain. The extrapolation introduces a branch that really slows down the computation.

RainerHeintzmann commented 2 years ago

The function to use in this case seem to be warp of the ImageTransformations.jl package. This takes less than 20 msec for this size. Here is code with the extrapolation dropped:

using Interpolations, TestImages, BenchmarkTools
img1 = Float32.(testimage("resolution"));
# intlin = extrapolate(interpolate(img1, BSpline(Linear())), 0)
intlin = interpolate(img1, BSpline(Linear()))
mymid = size(img1).÷2 .+1
f(t) = (-0.1*t[1]+0.1*t[2], 0.0) # defines the warp as a local shift
@time coords = [Float32.(Tuple(c) .+ f(Tuple(c).-mymid)) for c in CartesianIndices(img1)]
do_int(t) = intlin(t...)
res = zeros(Float32, size(img1))
@btime $res .= do_int.($coords);

It makes no difference. My suspicion is that the line do_int(t) = intlin(t...) somehow consumes memory and this leads to 20x slowdown.

mkitti commented 2 years ago
res .= map(do_int, coords);

Doing a map rather than the broadcast seems a bit faster for me.

mkitti commented 2 years ago

@RainerHeintzmann What is the ImageTransformations.jl code you are using?

@johnnychen94 do you have any thoughts here?

johnnychen94 commented 2 years ago

In ImageTransfomations.jl, if @RainerHeintzmann is referencing https://juliaimages.org/ImageTransformations.jl/dev/examples/operations/swirl/#Swirl-effect-using-warp-operation, it might be:

using ImageTransformations, StaticArrays
using TestImages, BenchmarkTools

mymid = SVector{2,Float64}(center(img1))
function f(c, mid=mymid)
    t = c - mid
    o = @SVector [-0.1*t[1]+0.1*t[2], 0.0]
    return c + o
end
@time warp(img1, f, axes(img1))
#   0.687884 seconds (14.82 M allocations: 369.821 MiB, 6.30% gc time, 5.35% compilation time: 70% of which was recompilation)

it's unclear to me how to reach 20ms runtime 😢

mkitti commented 2 years ago

@johnnychen94 Does that use Interpolations.jl at all under the hood or something else happening?

johnnychen94 commented 2 years ago

Except for a few tweaks (e.g., skip prefiltering) and wrappers, it's using Interpolations https://github.com/JuliaImages/ImageTransformations.jl/blob/7557c6c0fc700828a8ff3d420f45914a036c87e9/src/warp.jl#L165-L177

mkitti commented 2 years ago

Yea, that's why I'm slightly confused here. It should be quite similar.

Perhaps using the for loop rather than map or broadcasting in addition to SVector / Tuple helps to avoid some small amount of allocation somewhere?

I've been trying to make sense of the new allocation profiler to no avail yet.

RainerHeintzmann commented 2 years ago

It is all about avoiding allocations in the internal calculation. This code below is blindingly fast (43 ms, 16 Mb alloc) in comparison (700ms, 370 Mb). But it is quite challenging to avoid allocations. Does anyone know a trick how to force the compiler to avoid allocations? E.g. as soon as you introduce accesses to mymid instead of the (bad) hancoded 961, performance breaks down, since there are inner loop allocations. I guess doing this via a matrix multiplication in homogeneous coordinates may work, but is there no easier way to reduce allocations?

using ImageTransformations, StaticArrays
using TestImages, BenchmarkTools

img1 = Float32.(testimage("resolution"));
mymid = SVector{2,Int64}(size(img1).÷2 .+1)

function g(t::SVector{2, Int64}) # tmp=SizedArray{Tuple{2,}}([0,0]) 
    return  @SVector [961f0 - 1.1f0*(t[1]-961)+0.1f0*(t[2]-961), t[2]] # defines the warp as a local shift
end
@time res = warp(img1, g, axes(img1));