FMIFlux ScaleSum is not able to handle more than one scaled sum

juguma commented 1 year ago

The ScaleSum layer is a helpful way to add, e.g. NN-output with FMU-output, however it only works for a Vector of the same length as the scale which has been used in the definition of the ScaleSum layer (in the provided examples, it is 2).

ScaleSum should be able to handle this for more than a Vector. My proposal would be to prepare it for an n-Vector of m-Vectors (where m is the length of the scale vector when defining the layer, typically m=2) and n is the number of inputs that are to be "scale-summed".

Here's a minimal example (including a proposed solution):

using FMIFlux
scale = [0.7 ,0.3]
gates = ScaleSum(scale)

#input as n-Vector of 2-vectors
b1=[1.0,2.0] #<-this is the variant for which ScaleSum works already
b3=[[1.0,2.0],[1.0,2.0],[1.0,2.0]] #<-this is for what it should work
b1Vec=[[1.0,2.0]] #<-this Vector of Vector variant for 1 value pair

#proposed Variant for the ScaleSum-function:
function scsum(x)
    if length(x[1])==1#<-input is not Vector of Vector,but only Vector (with the lenght equal to scale)
        x_proc = [sum(x.*scale)]
    else
        x_proc = similar(x[1],length(x))
        for i in eachindex(x)
            x_proc[i] = sum(x[i].* scale)
        end
    end
    return x_proc
end

#test
using BenchmarkTools
@btime scsum(b1)
#vs.
@btime gates(b1)
#<-same time, same resources, same result
@btime scsum(b1Vec)
#<-same resources, same result as b1, even though it goes through the else branch
@btime scsum(b3)

juguma commented 1 year ago

I just realized you already tackled that in v0.10.2 with the new "groups" parameter in ScaleSum. Now what I proposed here is obsolete or should only be considered if it turns out it is more performant (together with the PairWithCacheRetrieve #101).

juguma commented 1 year ago

Okay, before I throw it away I made some comparison and test. I extended my draft of the ScaleSum to a variant with multiple scales (one for each group as in v0.10.2). This goes along with a proposal for a matching PairWithCacheRetrieve-Layer (cf. #101). Here's my draft for an improved ScaleSum and new PairWithCacheRetrieve. It is more performant, need less memory, but is unfortunately not backward compatible to v0.10.2's ScaleSum.

#compare extended ScaleSum+PairWithCacheRetrieve with existing ScaleSum (with scale-groups)
using FMIFlux
using Flux
using BenchmarkTools

### ScaleSum ###
##NEW: an alternative ScaleSum, which is able to handle a different scale-Vector 
# for each Vector (x[i]) in the Vector of Vectors(x)
# since the scale (vec of vec) is not considered correctly in the Layer (wrong type) 
# Vector{Vector{Float64}}<:AbstractArray{Float64,1}  is false
struct ScaleSumN{T}
    scale::AbstractArray{T}

    function ScaleSumN{T}(scale::AbstractArray{T}) where {T}
        inst = new(scale)
        return inst
    end

    function ScaleSumN(scale::AbstractArray{T}) where {T} #AbstractArray{<:AbstractArray{T}}
        return ScaleSumN{T}(scale)
    end
end
export ScaleSumN

function (l::ScaleSumN)(x)
    if length(x[1])==1#<-input is not Vector of Vector,but only Vector (expected with the length equal to scale)
        length(l.scale)==length(x) && typeof(l.scale[1]) <: Number || error("This Layer was defined with a scale which is a Vector of Vectors. Please call it with an x of the same shape, e.g. try [x].")
        x_proc = [sum(x.*l.scale)]
    else
        x_proc = similar(x[1],length(x))
        if length(l.scale) == length(x) #->a scale-vector for each vector in x
            for i in eachindex(x)
                x_proc[i] = sum(x[i].* l.scale[i])
            end
        else#->the same scale vector for all vectors in x
            for i in eachindex(x)
                x_proc[i] = sum(x[i].* l.scale)
            end
        end
    end
    return x_proc
end

Flux.@functor ScaleSumN (scale, )

#########
#Pair with CacheRetrieve
struct PairWithCacheRetrieve
    cacheLayer::CacheLayer

    function PairWithCacheRetrieve(cacheLayer::CacheLayer)
        inst = new(cacheLayer)
        return inst
    end
end
export PairWithCacheRetrieve
function (l::PairWithCacheRetrieve)(idx,x)
    #for all idx writes cache[idx],x[idx] <-i.e. length of x must be equal to lenght of idx 
    tid = Threads.threadid()
    retVal = typeof(x)[]
    sizehint!(retVal, length(idx))
    for i in 1:lastindex(idx)
        push!(retVal, [l.cacheLayer.cache[tid][idx[i]], x[i]])
    end
   return retVal
end

#########################################
#the comparison....->
cache = CacheLayer()
v7 = [1.0,2.0,3.0,4.0,5.0,6.0,7.0]
cv7 = cache(v7)
idxs = vcat(1,3,5:6)#
scale = [0.7 ,0.3]

#OLD
cRL = CacheRetrieveLayer(cache)
scaleO1 = [scale[1],2.0*scale[1],3.0*scale[1],4.0*scale[1],scale[2],2.0*scale[2],3.0*scale[2],4.0*scale[2]]
gatesO1 = ScaleSum(scaleO1,[[1,5],[2,6],[3,7],[4,8]])
gatesO1(cRL(idxs,v7[idxs]))
#<-scaleO1, gatesO1 with a different scale-pair for each "group"
scaleO2 = [scale[1],scale[1],scale[1],scale[1],scale[2],scale[2],scale[2],scale[2]]
gatesO2 = ScaleSum(scaleO2,[[1,5],[2,6],[3,7],[4,8]])
gatesO2(cRL(idxs,v7[idxs]))
#<-scaleO2, gatesO2 with identical scale-pairs for each "group"
scaleO3 = scale
gatesO3 = ScaleSum(scale,[[1,5],[2,6],[3,7],[4,8]])
#gatesO3(cRL(idxs,v7[idxs]))#<-error
#<-scaleO3 = scale, gatesO3 with one scale-pair for all "group"s. Doesn't work

#NEW: uses as scale input a vector of vectors (but also accepts a single vector for backward compatibility), 
# scales each x-vector with the scale, then sums
cRPair = PairWithCacheRetrieve(cache)
scaleN1 =[scale,2.0*scale,3.0*scale,4.0.*scale] 
gatesN1 = ScaleSumN(scaleN1)
gatesN1(cRPair(idxs,v7[idxs]))
#<-scaleN1, gatesN1 with a different scale-pair for each 2-Vec in Vec{Vec}
scaleN2 =[scale,2.0*scale,3.0*scale,4.0.*scale] 
gatesN2 = ScaleSumN(scaleN2)
gatesN2(cRPair(idxs,v7[idxs]))
#<-scaleN2, gatesN2 with identical scale-pairs for each 2-Vec in Vec{Vec}
scaleN3 = scale
gatesN3 = ScaleSumN(scaleN3)
gatesN3(cRPair(idxs,v7[idxs]))
#<-scaleN3 = scale, gatesN3 with one scale-pair used for each 2-Vec in Vec{Vec}

#compare
@btime gatesO1(cRL(idxs,v7[idxs]))
@btime gatesN1(cRPair(idxs,v7[idxs]))
@btime gatesO2(cRL(idxs,v7[idxs]))
@btime gatesN2(cRPair(idxs,v7[idxs]))
#gatesN3,gatesO3 can't be compared.
#<-you should observe that the new scale sum (together with a machting cacheretrieve as Vec{Vec}) is faster and needs less memory

#old vs new gates for only one pair
#ScaleSum and ScaleSumN have to be newly defined, since a different scale is used
gatesAO1 = ScaleSum(scale,[[1,2]])#<- as possible since v0.10.2
gatesAO2 = ScaleSum(scale) #<-as possible since v0.9.0
gatesAN1 = ScaleSumN(scale) #<- for backward compatibility to the original v0.9.0 version for Vector input when called
gatesAN2 = ScaleSumN([scale]) #<- to make sure it also works for Vec{Vec}-scale (and 1-Vec{Vec} when called)

@btime gatesAO1(cRL(1,v7[1]))
@btime gatesAO2(cRL(1,v7[1]))

@btime gatesAN1(cRPair([1],[v7[1]]))#<- Vec-scale, Vec{Vec}-Argument, works
@btime gatesAN1([v7[1],v7[1]])#<-using a 2-Vector{Float64} works, too
@btime gatesAN2(cRPair([1],[v7[1]]))#<-Vec{Vec}-scale, Vec{Vec}-Argument, works
gatesAN2([v7[1],v7[1]])#<- Vec{Vec}-scale, Vec-Argument, throws an error, since this is an unallowed compbination (a Vec{Vec} scale works ONLY with a Vec{Vec}-x)

#test in a Chain, whether the functors of ScaleSumN are correctly included
net = Chain(cache,
            dx->cRPairL(idxs, dx[idxs]),
            gatesN1)
Flux.params(net)
#yippieh!

#open: 
#1 unclear whether the type of "scale" in the ScaleSumN layer is correct
#2 way to insert error
#3 conditions to test for erroneous state

@ThummeTo, since the new ScaleSum layer is not an extension to existing functionality(, the PairWithCacheRetrieve only makes sense with the new ScaleSum,) and these modifications would involve some modifications in other scripts (the Julia-Con-Tutorial), presumably some tests, I refrain from starting a Pull Request before I hear back from you on this matter.

ThummeTo commented 1 year ago

Dear @juguma, thanks! Can we wait another week or so, because there will be MAJOR changes to FMIFlux.jl (only good ones of course) and many improvements. However I think the proposed layer will work in the next release too without changes.

juguma commented 1 year ago

Dear @ThummeTo, sure we can wait (since the modifications are rather feature than bug). And MAJOR (good) changes are a good reason to wait.

ThummeTo commented 1 year ago

Little teaser: Differentiable event indicators ;-)

juguma commented 1 year ago

Dear @ThummeTo, the findings of today's discussion and yet another try-and-error-session on my side:

what I propose looks nice (that's me saying it and you have to admit ;-)), but has some issues. Most relevant is that the currently defined layers cannot be differentiated with Zygote. To work around this, we decided to use collect in the layer definitions, i.e. x_proc = collect(sum(x[i].* l.scale[i]) for i in eachindex(x)) in the ScaleSum and collect([l.cacheLayer.cache[tid][idx[i]], x[i]] for i in 1:lastindex(idx)) in the PairWithCacheRetrieve-Layer. This modification helps in the ScaleSum layer, now Zygote.gradient is happy, but it doesn't in the CacheRetrieve-Layer. I don't know why.
Julia is faster with its multiple dispatch than with if-then-else-conditions. I tried to split the ScaleSumN into two functions - for vectors, and for vectors of vectors, however I failed to define the types properly. Closest to successful was: function (l::ScaleSumN)(x::AbstractVector{<:Union{Float32,Float64}}) and function (l::ScaleSumN)(x::Union{AbstractVector{Vector{Float64}},AbstractVector{Vector{Float32}}}), but it returned the wrong output for gatesAN1(cRPair([1],[v7[1]])) ([1.4] instead of [1.0], so here I gave up. I'd prefer to take the if-then-else variant, at least this one was running as expected.
as a final thought: If at some point you decide to no longer support Zygote (or keep it separate), I guess it makes more sense to put the PairWithCacheRetrieve together with the ScaleSum into one layer (and with it avoid the intermediate step which creates a vector of vectors from a vector). I can't imagine this to work without extra memory allocation.

Having all that noted down, I am fairly happy to work with my local variant of these layers for now and to let you focus more on the finalization of differentiable event indicators (since that's the clear blocking point for us now, and not the layer cosmetics). Cheers.

juguma commented 8 months ago

This is old and outdated. In principle, this functionality is covered (and was already while filing the ticket, stupid me). There might be potential to make it more effective, but if so, then the title is misleading, and if I come to that conclusion, I'll open a new issue.

ThummeTo / FMIFlux.jl

FMIFlux ScaleSum is not able to handle more than one scaled sum #100