CarloLucibello opened 3 years ago
Would compatibility with CuArrays create a dependency on CUDA?
Typically no, `CuArray`s just follow the abstract array interface, although scalar indexing leads to horrible performance. To address such cases a CUDA kernel is needed (and with it the CUDA.jl dependency).
Having just completed the upsampling code: I would now rather use KernelAbstractions.jl for such work, since it saves you from writing almost the same code twice. Plus, I think it's a much lighter dependency than CUDA. In the end I imagine having super-portable (CPU, GPU (Nvidia & AMD!)) code with gradients, which could be recycled in JuliaMath, JuliaImages, and NNlib. It could even be used to write specialized methods for different dimension orderings and dispatch via NamedDims.
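To make the "write once, run anywhere" idea concrete, here is a minimal CPU-only sketch of nearest-neighbour upsampling whose loop body is the per-element work a KernelAbstractions `@kernel` would perform. The function name and integer-scale signature are illustrative only, not NNlib's actual API:

```julia
# Illustrative sketch: nearest-neighbour upsampling by an integer scale per
# dimension. The loop body is exactly what a per-element GPU kernel would do,
# so one definition could in principle serve CPU and GPU alike.
function upsample_nearest_sketch(x::AbstractArray{T,N}, scale::NTuple{N,Int}) where {T,N}
    y = similar(x, size(x) .* scale)
    for I in CartesianIndices(y)
        # each output index maps back to source index cld(i, s) in every dimension
        J = CartesianIndex(cld.(Tuple(I), scale))
        @inbounds y[I] = x[J]
    end
    return y
end
```

With KernelAbstractions, that body would sit inside a `@kernel` with `I = @index(Global, Cartesian)`, and the same kernel could be launched on a `CuArray`.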
Related to the gradient computation, it would be helpful to have `rrule` and `frule` functions defined (via the ChainRulesCore package) to be able to use Interpolations with AD, e.g. Zygote. A `gradient` function is already defined. Thus, if I understand it correctly, the only missing part that needs to be implemented is a gradient with respect to the fields of Interpolations objects, see here
Could you point to the gradient function?
I don't know too much about the code, but simple debugging tells me there are 6 `gradient` functions:
- These have special definitions
- These call the general gradient functions

I suspect the one in indexing.jl is the one that needs to be used here.
Now that #414 is finished, how do we rewrite the upsampling functions? Going through the NNlib code, it seems that now that we have ChainRules integration we can just rewrite stuff like `upsample_nearest` and its gradient. If it's just that, I will open a PR there.
Have you taken a look at http://juliamath.github.io/Interpolations.jl/latest/devdocs/ yet?
I have read that page and I think I get it, a bit. But I was wondering if we should rather write them as simple functions using interpolation objects. For example, `upsample_nearest` would be something like

```julia
itp = interpolate(x, (NoInterp(), NoInterp()))
```

We already have parent axes, which can be used to determine size.
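Assuming Interpolations.jl is available, the bilinear case written against an interpolation object could look roughly like this. The function name and integer-scale signature are made up for illustration; NNlib's `upsample_bilinear` works on 4-D arrays and handles corner alignment differently, so this only sketches the idea:

```julia
using Interpolations

# Sketch: 2-D bilinear upsampling expressed through an interpolation object
# instead of a handwritten kernel.
function upsample_bilinear_itp(x::AbstractMatrix, scale::Int)
    itp = interpolate(x, BSpline(Linear()))
    h, w = size(x)
    rows = range(1, h, length = h * scale)  # sample positions along dim 1
    cols = range(1, w, length = w * scale)  # sample positions along dim 2
    return [itp(i, j) for i in rows, j in cols]
end
```

Whether the comprehension here can match a fused kernel, and whether Zygote differentiates it efficiently, is exactly the comparison discussed below.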
Would it work similarly to the kernels we have now in NNlib? Do we get the correct gradients out etc?
Worth doing a comparison imo
I don't know what the kernel system is in NNlib (I'm not familiar with that codebase at all). But I was thinking I could replace the code for `upsample_nearest` and its gradient function with the Zygote call. They have tests for these functions, so rewriting should be easier.
Hi! @DhairyaLGandhi this serves as a reply to https://github.com/FluxML/NNlibCUDA.jl/pull/5#issuecomment-828444750, but I think it might be better to keep the discussion in one place :)
Yes, I would find it cool if it works and we can get rid of duplicate code (of which I really wrote a lot, sorry for that). But I don't know if the performance can match the handwritten kernels I ported from PyTorch (at least for now). For example, I benchmarked the CPU implementation of bilinear upsampling against `imresize!` in ImageTransformations.jl (which uses Interpolations.jl under the hood) and NNlib was 2.2x faster, even single-threaded and with sub-optimal memory layout. Furthermore, the ImageTransformations.jl code doesn't work on GPUs when scalar `getindex` is disallowed. So I don't really know what the best way forward is. Maybe enhance Interpolations.jl and then use it? But how should all the specialized GPU code be handled, like the setup for the kernel calls (setting thread & block size)? Create a CUDAinterpolations.jl so that Interpolations.jl doesn't have to depend on CUDA? Or simply proceed with collecting specialized code for neural networks / DL in NNlib and just pick the stuff which really makes things faster and simpler from other packages? Note that at least for upsampling / interpolations, things can be simplified a lot by smarter dispatch, which I can maybe hand in two PRs down the line or so.
@rick2047

> I don't know what the kernel system is in NNlib
I don't know either :smile: I just ported PyTorch code brute force and it turned out to be quite fast, but also a bit unwieldy, which can probably be improved by properly leveraging dispatch. The nearest-neighbour part was written by @mcabbott.
Lastly, maybe someone else can also benchmark the interpolations to check my results?
Pretty confused as to the current state of this. I'm trying to run the same example from #414 but get the following error:
```julia
using Interpolations
using Zygote

y = sin.(1:10)
itp = interpolate(y, BSpline(Cubic(Reflect(OnCell()))))
Zygote.gradient(itp, 2)
```
```
ERROR: ArgumentError: unable to check bounds for indices of type Interpolations.WeightedAdjIndex{4, Ratios.SimpleRatio{Int64}}
```
Did something change in this package or Zygote?
Ah I see, that merge hasn't made it to a release yet. Disregard me.
Sorry to revive this, but the above example does not work when I run it despite the previous merge. Further, when I run the package tests in a fresh Julia installation

```julia
(@v1.7) pkg> add Interpolations
(@v1.7) pkg> test Interpolations
```

on the latest version (0.13.4) I find the error:
```
ChainRulesCore: Error During Test at /Users/jsull/.julia/packages/Interpolations/3gTQB/test/chainrules.jl:12
  Test threw exception
  Expression: (Zygote.gradient(itp, 1))[1] == Interpolations.gradient(itp, 1)
  MethodError: no method matching (::ChainRulesCore.ProjectTo{Float64, NamedTuple{(), Tuple{}}})(::SVector{1, Float64})
  Closest candidates are:
    (::ChainRulesCore.ProjectTo{T})(::ChainRulesCore.AbstractZero) where T at ~/.julia/packages/ChainRulesCore/sHMAp/src/projection.jl:120
    (::ChainRulesCore.ProjectTo{<:Number})(::ChainRulesCore.Tangent{<:Number}) at ~/.julia/packages/ChainRulesCore/sHMAp/src/projection.jl:192
    (::ChainRulesCore.ProjectTo{T})(::ChainRulesCore.Tangent{<:T}) where T at ~/.julia/packages/ChainRulesCore/sHMAp/src/projection.jl:142
    ...
```
for the test related to the previously discussed Zygote gradients issue. It seems like the problem is ChainRulesCore?
(I also find a many-times repeated error when running the tests in test/gradient.jl:216, but the place for discussing this is probably another issue.)
Have you tried this on 1.6? There are known inference issues on 1.7
The example from above reproduces this error.
```julia
julia> using Interpolations, Zygote

julia> y = sin.(1:10);

julia> itp = interpolate(y, BSpline(Cubic(Reflect(OnCell()))));

julia> z, back = Zygote.pullback(itp, 2.0);

julia> z
0.769963450028415

julia> back(1.0)
([-0.35017548837401463],)

julia> Zygote.gradient(itp, 2.0);
ERROR: MethodError: no method matching (::ChainRulesCore.ProjectTo{Float64, NamedTuple{(), Tuple{}}})(::StaticArrays.SVector{1, Float64})

(jl_hY83L3) pkg> st
Status `/private/var/folders/yq/4p2zwd614y59gszh7y9ypyhh0000gn/T/jl_hY83L3/Project.toml`
  [a98d9a8b] Interpolations v0.13.4
  [e88e6eb3] Zygote v0.6.32
```
The error is coming from ChainRulesCore, but only because it's performing a sanity check on the gradient returned by the rule. Above, `back(1.0)` avoids this check, and it produces a 1-element vector as the gradient of a number, which doesn't make sense. Where the bug is, or whether this used to work, I don't know.
Oh, the problem is just that there has not been a release since #465. On master this works.
Just switching to master worked for me - thanks for pointing that out and for the explanation!
I just released a new version 0.13.5
I'd love to have the gradient with respect to the original array as well, and at that point I believe we could close this. Is there some interest in putting together a PR to that end?
@DhairyaLGandhi, me too, but implementing that seems quite a bit harder, since it has to be done separately for each type of interpolation. One simple case that might be a good starting point would be to support linear gridded interpolation.
@DhairyaLGandhi @stevengj I would also love to have the gradient with respect to the data being interpolated. Especially for cubic splines, because I need smoothness... The problem is that I have no clue where to start implementing it. Does the interpolation involve solving equations, or are there some constant weights pre-computed? Also there is very little online about how cubic spline interpolation works in 2D or 3D. Any pointer to resources would be helpful. Is there any other package that could do this?
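To partly answer the "solving equations vs. pre-computed weights" question: for cubic B-splines, Interpolations.jl first prefilters the data (a tridiagonal solve, done once per dataset, so that the resulting spline passes through the samples), and each evaluation then combines only 4 coefficients per dimension with local weights; in 2-D/3-D the weights are simply tensor products of the 1-D ones. Here is a transcription of the standard uniform cubic B-spline weights (textbook formulas, not Interpolations.jl internals):

```julia
# Uniform cubic B-spline basis weights for a fractional offset t in [0, 1).
# An evaluation at grid position i + t combines the 4 (prefiltered)
# coefficients c[i-1], c[i], c[i+1], c[i+2] with these weights.
function cubic_bspline_weights(t)
    return ((1 - t)^3 / 6,
            (3t^3 - 6t^2 + 4) / 6,
            (-3t^3 + 3t^2 + 3t + 1) / 6,
            t^3 / 6)
end
```

The weights always sum to 1, and at `t = 0` they reduce to `(1/6, 2/3, 1/6, 0)`, which is why the data must be prefiltered for the curve to interpolate the samples exactly.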
@DhairyaLGandhi @stevengj : Gradient with respect to the image is after all not that complicated. The interpolation linearly depends on the image, so this gradient is in fact independent of the image.
Internally, the gradient is calculated by the `weightedindexes` function and then multiplied with the image data. This has a sparse format, because the interpolation result depends only locally on the image. This is the same internal API throughout the various interpolations. The only task is to convert the tuple of `WeightedAdjIndex` to a digestible format, which I can do.

I would be happy if `weightedindexes` were exported as an API; then I could use it as the gradient. Could someone make that happen?
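Since the evaluation is linear in the data, the gradient with respect to the data is exactly the (sparse) weight pattern. For 1-D linear interpolation the whole idea fits in a few lines (a hand-rolled illustration of the principle, not the `weightedindexes` internals):

```julia
# Gradient of 1-D linear interpolation at point x with respect to the n data
# values: a sparse pattern holding just the two interpolation weights,
# independent of the data itself.
function linear_itp_grad_wrt_data(n::Int, x::Real)
    g = zeros(n)
    i = clamp(floor(Int, x), 1, n - 1)
    t = x - i
    g[i]     = 1 - t   # weight of the left neighbour
    g[i + 1] = t       # weight of the right neighbour
    return g
end
```

For cubic splines the same linearity holds, but the weights act on the prefiltered coefficients, so the chain rule must additionally pass through the (linear) prefiltering step.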
Hi, this is to enquire about the possibility of using Interpolations.jl to build image upsampling or downsampling layers in Flux.jl. We recently added a bilinear upsampling function (https://github.com/FluxML/NNlib.jl/pull/262), but I was wondering if we could instead leverage some of the code here. The requisites would be
I'm not familiar with the codebase here, so maybe this is a long shot, but it's worth making an attempt and creating awareness about the possibility of this kind of interaction. Moreover, GPU- and automatic-differentiation-friendly interpolations would generally benefit the ML ecosystem.
Best, CL
cc @johnnychen94