Closed juguma closed 8 months ago
I just realized you already tackled that in v0.10.2 with the new "groups" parameter in ScaleSum. Now what I proposed here is obsolete or should only be considered if it turns out it is more performant (together with the PairWithCacheRetrieve #101).
Okay, before I throw it away I made some comparison and test. I extended my draft of the ScaleSum to a variant with multiple scales (one for each group as in v0.10.2). This goes along with a proposal for a matching PairWithCacheRetrieve-Layer (cf. #101). Here's my draft for an improved ScaleSum and new PairWithCacheRetrieve. It is more performant, need less memory, but is unfortunately not backward compatible to v0.10.2's ScaleSum.
#compare extended ScaleSum+PairWithCacheRetrieve with existing ScaleSum (with scale-groups)
using FMIFlux
using Flux
using BenchmarkTools
### ScaleSum ###
##NEW: an alternative ScaleSum, which is able to handle a different scale-Vector
# for each Vector (x[i]) in the Vector of Vectors(x)
# since the scale (vec of vec) is not considered correctly in the Layer (wrong type)
# Vector{Vector{Float64}}<:AbstractArray{Float64,1} is false
struct ScaleSumN{T}
scale::AbstractArray{T}
function ScaleSumN{T}(scale::AbstractArray{T}) where {T}
inst = new(scale)
return inst
end
function ScaleSumN(scale::AbstractArray{T}) where {T} #AbstractArray{<:AbstractArray{T}}
return ScaleSumN{T}(scale)
end
end
export ScaleSumN
function (l::ScaleSumN)(x)
if length(x[1])==1#<-input is not Vector of Vector,but only Vector (expected with the length equal to scale)
length(l.scale)==length(x) && typeof(l.scale[1]) <: Number || error("This Layer was defined with a scale which is a Vector of Vectors. Please call it with an x of the same shape, e.g. try [x].")
x_proc = [sum(x.*l.scale)]
else
x_proc = similar(x[1],length(x))
if length(l.scale) == length(x) #->a scale-vector for each vector in x
for i in eachindex(x)
x_proc[i] = sum(x[i].* l.scale[i])
end
else#->the same scale vector for all vectors in x
for i in eachindex(x)
x_proc[i] = sum(x[i].* l.scale)
end
end
end
return x_proc
end
Flux.@functor ScaleSumN (scale, )
#########
#Pair with CacheRetrieve
struct PairWithCacheRetrieve
cacheLayer::CacheLayer
function PairWithCacheRetrieve(cacheLayer::CacheLayer)
inst = new(cacheLayer)
return inst
end
end
export PairWithCacheRetrieve
function (l::PairWithCacheRetrieve)(idx,x)
#for all idx writes cache[idx],x[idx] <-i.e. length of x must be equal to lenght of idx
tid = Threads.threadid()
retVal = typeof(x)[]
sizehint!(retVal, length(idx))
for i in 1:lastindex(idx)
push!(retVal, [l.cacheLayer.cache[tid][idx[i]], x[i]])
end
return retVal
end
#########################################
#the comparison....->
cache = CacheLayer()
v7 = [1.0,2.0,3.0,4.0,5.0,6.0,7.0]
cv7 = cache(v7)
idxs = vcat(1,3,5:6)#
scale = [0.7 ,0.3]
#OLD
cRL = CacheRetrieveLayer(cache)
scaleO1 = [scale[1],2.0*scale[1],3.0*scale[1],4.0*scale[1],scale[2],2.0*scale[2],3.0*scale[2],4.0*scale[2]]
gatesO1 = ScaleSum(scaleO1,[[1,5],[2,6],[3,7],[4,8]])
gatesO1(cRL(idxs,v7[idxs]))
#<-scaleO1, gatesO1 with a different scale-pair for each "group"
scaleO2 = [scale[1],scale[1],scale[1],scale[1],scale[2],scale[2],scale[2],scale[2]]
gatesO2 = ScaleSum(scaleO2,[[1,5],[2,6],[3,7],[4,8]])
gatesO2(cRL(idxs,v7[idxs]))
#<-scaleO2, gatesO2 with identical scale-pairs for each "group"
scaleO3 = scale
gatesO3 = ScaleSum(scale,[[1,5],[2,6],[3,7],[4,8]])
#gatesO3(cRL(idxs,v7[idxs]))#<-error
#<-scaleO3 = scale, gatesO3 with one scale-pair for all "group"s. Doesn't work
#NEW: uses as scale input a vector of vectors (but also accepts a single vector for backward compatibility),
# scales each x-vector with the scale, then sums
cRPair = PairWithCacheRetrieve(cache)
scaleN1 =[scale,2.0*scale,3.0*scale,4.0.*scale]
gatesN1 = ScaleSumN(scaleN1)
gatesN1(cRPair(idxs,v7[idxs]))
#<-scaleN1, gatesN1 with a different scale-pair for each 2-Vec in Vec{Vec}
scaleN2 =[scale,2.0*scale,3.0*scale,4.0.*scale]
gatesN2 = ScaleSumN(scaleN2)
gatesN2(cRPair(idxs,v7[idxs]))
#<-scaleN2, gatesN2 with identical scale-pairs for each 2-Vec in Vec{Vec}
scaleN3 = scale
gatesN3 = ScaleSumN(scaleN3)
gatesN3(cRPair(idxs,v7[idxs]))
#<-scaleN3 = scale, gatesN3 with one scale-pair used for each 2-Vec in Vec{Vec}
#compare
@btime gatesO1(cRL(idxs,v7[idxs]))
@btime gatesN1(cRPair(idxs,v7[idxs]))
@btime gatesO2(cRL(idxs,v7[idxs]))
@btime gatesN2(cRPair(idxs,v7[idxs]))
#gatesN3,gatesO3 can't be compared.
#<-you should observe that the new scale sum (together with a machting cacheretrieve as Vec{Vec}) is faster and needs less memory
#old vs new gates for only one pair
#ScaleSum and ScaleSumN have to be newly defined, since a different scale is used
gatesAO1 = ScaleSum(scale,[[1,2]])#<- as possible since v0.10.2
gatesAO2 = ScaleSum(scale) #<-as possible since v0.9.0
gatesAN1 = ScaleSumN(scale) #<- for backward compatibility to the original v0.9.0 version for Vector input when called
gatesAN2 = ScaleSumN([scale]) #<- to make sure it also works for Vec{Vec}-scale (and 1-Vec{Vec} when called)
@btime gatesAO1(cRL(1,v7[1]))
@btime gatesAO2(cRL(1,v7[1]))
@btime gatesAN1(cRPair([1],[v7[1]]))#<- Vec-scale, Vec{Vec}-Argument, works
@btime gatesAN1([v7[1],v7[1]])#<-using a 2-Vector{Float64} works, too
@btime gatesAN2(cRPair([1],[v7[1]]))#<-Vec{Vec}-scale, Vec{Vec}-Argument, works
gatesAN2([v7[1],v7[1]])#<- Vec{Vec}-scale, Vec-Argument, throws an error, since this is an unallowed compbination (a Vec{Vec} scale works ONLY with a Vec{Vec}-x)
#test in a Chain, whether the functors of ScaleSumN are correctly included
net = Chain(cache,
dx->cRPairL(idxs, dx[idxs]),
gatesN1)
Flux.params(net)
#yippieh!
#open:
#1 unclear whether the type of "scale" in the ScaleSumN layer is correct
#2 way to insert error
#3 conditions to test for erroneous state
@ThummeTo, since the new ScaleSum layer is not an extension to existing functionality(, the PairWithCacheRetrieve only makes sense with the new ScaleSum,) and these modifications would involve some modifications in other scripts (the Julia-Con-Tutorial), presumably some tests, I refrain from starting a Pull Request before I hear back from you on this matter.
Dear @juguma, thanks! Can we wait another week or so, because there will be MAJOR changes to FMIFlux.jl (only good ones of course) and many improvements. However I think the proposed layer will work in the next release too without changes.
Dear @ThummeTo, sure we can wait (since the modifications are rather feature than bug). And MAJOR (good) changes are a good reason to wait.
Little teaser: Differentiable event indicators ;-)
Dear @ThummeTo, the findings of today's discussion and yet another try-and-error-session on my side:
collect
in the layer definitions, i.e.
x_proc = collect(sum(x[i].* l.scale[i]) for i in eachindex(x))
in the ScaleSum
and
collect([l.cacheLayer.cache[tid][idx[i]], x[i]] for i in 1:lastindex(idx))
in the PairWithCacheRetrieve
-Layer.
This modification helps in the ScaleSum layer, now Zygote.gradient is happy, but it doesn't in the CacheRetrieve-Layer. I don't know why.function (l::ScaleSumN)(x::AbstractVector{<:Union{Float32,Float64}})
and
function (l::ScaleSumN)(x::Union{AbstractVector{Vector{Float64}},AbstractVector{Vector{Float32}}})
,
but it returned the wrong output for gatesAN1(cRPair([1],[v7[1]]))
([1.4] instead of [1.0], so here I gave up.
I'd prefer to take the if-then-else variant, at least this one was running as expected.Having all that noted down, I am fairly happy to work with my local variant of these layers for now and to let you focus more on the finalization of differentiable event indicators (since that's the clear blocking point for us now, and not the layer cosmetics). Cheers.
This is old and outdated. In principle, this functionality is covered (and was already while filing the ticket, stupid me). There might be potential to make it more effective, but if so, then the title is misleading, and if I come to that conclusion, I'll open a new issue.
The ScaleSum layer is a helpful way to add, e.g. NN-output with FMU-output, however it only works for a Vector of the same length as the scale which has been used in the definition of the ScaleSum layer (in the provided examples, it is 2).
ScaleSum should be able to handle this for more than a Vector. My proposal would be to prepare it for an n-Vector of m-Vectors (where m is the length of the scale vector when defining the layer, typically m=2) and n is the number of inputs that are to be "scale-summed".
Here's a minimal example (including a proposed solution):