denizyuret / AutoGrad.jl

Julia port of the Python autograd package.

Inside `@diff`, shouldn't `similar(Result{Array})` return `Result{Array}`? #108

Open · NHDaly opened 5 years ago

NHDaly commented 5 years ago

Hi! :) I think I've stumbled across a correctness issue with the `similar` function that leads to performance problems:

As far as I can tell, even when actually running within a `@diff`, `similar` on an `AutoGrad.Result{Array}` returns a plain `Array`. I would think, instead, that it should return a `Result{Array}`.
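Here is a minimal sketch of what I mean (a hypothetical toy example, not our actual code):

```julia
using AutoGrad

x = Param(randn(3))

function f(x)
    y = 2 .* x        # while recording under @diff, y is an AutoGrad.Result
    s = similar(y)    # I expected a Result{Array{Float64,1}} here...
    @show typeof(y) typeof(s)   # ...but s comes back as a plain Array{Float64,1}
    return sum(y)
end

J = @diff f(x)
```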


I came across this while trying to help a coworker diagnose why their AD code is slow. I saw from `@btime` that their code makes a lot of allocations, which grow with the size of the data (so it's not a one-time overhead related to compiling the extra methods).

I used Cthulhu.jl to step through the `@code_warntype` of the functions being differentiated, and found this:

```
%108 = invoke similar(::AutoGrad.Result{Array{Float64,1}})::Array{Float64,1}
```

That seems surprising to me! They were trying to use `similar` to prevent type instability in this for loop:

```julia
# coef is passed in as an AutoGrad.Param{Array{Float64,1}}
term = fill!(similar(coef[:, ndim]), 0)  # Originally this was `zeros(ndim)`, but I'm trying to prevent type instability
for (idx, feat) in enumerate(feature)
    term += (feat .* coef[:, idx]) .^ iorder
end
```

As the comment explains, we first had `term = zeros(ndim)`, but were concerned about the type instability inside the for loop (the type of `term` would change once a `Result{Array}` was added to it). So we tried `fill!(similar(...), 0)` to create a `Result{Array}` to start with, so that adding `coef` to it wouldn't change its type. But surprisingly, this returned the same thing as `zeros()`.
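To make the instability concrete, here is roughly what we were trying to avoid (again a toy sketch, with a made-up `coef`):

```julia
using AutoGrad

coef = Param(randn(3, 2))      # hypothetical stand-in for the real coefficients
J = @diff begin
    term = zeros(3)            # term::Array{Float64,1}
    term = term + coef[:, 1]   # term is rebound to an AutoGrad.Result{Array{Float64,1}}
    sum(term)
end
```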

Shouldn't `similar(::AutoGrad.Result{Array{Float64,1}})` also return an `AutoGrad.Result{Array{Float64,1}}`, or am I misunderstanding? Thanks! :slightly_smiling_face:

denizyuret commented 5 years ago

The `Result` type represents the result of an operation that is recorded for gradient calculation. If a function has zero gradient (i.e. its output does not depend on the values of its input), it does not need to be recorded, and therefore does not create a `Result`. `similar` is one example; so are `size`, `length`, `eltype`, etc.
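Roughly, the pattern is to forward such queries to the wrapped value. Here is a sketch of the idea (not AutoGrad's actual source; `Result` is internal and named here only for illustration):

```julia
using AutoGrad: Result, value

# Zero-gradient queries unwrap the tracked value and return plain,
# untracked results, so no node is added to the tape.
Base.similar(x::Result)    = similar(value(x))
Base.size(x::Result, d...) = size(value(x), d...)
Base.length(x::Result)     = length(value(x))
```

This is why the `@code_warntype` output above shows `similar(::AutoGrad.Result{...})` returning a plain `Array`.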

Ideally this should all be transparent to the programmer. The only functions in the API open to the programmer are `Param`, `@diff`, `value`, and `grad` (i.e. I should be able to change the name or fields of `Result` without breaking any of your code).

Type stability is a difficult topic: while running outside of a `@diff` context, all `Param`s are ignored and no `Result`s are generated, which eliminates most of the overhead of AutoGrad. This means the same variable could be an `Array` during some calls and a `Result{Array}` during others.
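For example, using only the public API (a sketch with made-up values):

```julia
using AutoGrad

w = Param(randn(3))
f(w) = sum(w .* w)

f(w)            # outside @diff: w behaves like its underlying Array, no Results created
J = @diff f(w)  # inside @diff: intermediate values become Result wrappers on the tape
grad(J, w)      # the recorded tape yields the gradient, here 2 .* value(w)
```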

In your case, if you want to prevent the type change inside the for loop, it may be possible to initialize with `term = 0 * coef[:, ndim]` at the beginning.
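That is, something like the following (a sketch reusing the names from your loop above). Since `0 * coef[:, ndim]` is itself a recorded operation, under `@diff` the variable `term` starts out as a `Result` and `+=` never changes its type:

```julia
term = 0 * coef[:, ndim]   # already a Result under @diff, zeros of the right shape
for (idx, feat) in enumerate(feature)
    term += (feat .* coef[:, idx]) .^ iorder
end
```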