Hello! I can definitely look into this for my first issue, if nobody else wants to do this.
Well, it sure seems like a rational idea to me. But maybe there's some reason why it's done the "wrong" way that I'm not aware of.
Can't this be done simply by using a function at the last layer of Flux.Chain?
julia> mov(x) = cat(collect(eachslice(x; dims=3))...; dims=4)
mov (generic function with 1 method)
julia> r = rand(2, 3, 4, 5);
julia> r[:, :, 1, :] == mov(r)[:, :, :, 1]
true
Yes.
Issue: A NeuralODE driven by a batch input will output an array in which the last index is the time index, not the batch index. This differs from the standard convention in all other ML libraries, including Flux, TensorFlow, PyTorch, and Chainer. In Flux's conv layers the batch output dimensions are (data dimensions, channel index, batch index).
The time-series output from a NeuralODE is analogous to the channel dimension. Thus it would be very logical for the output of a NeuralODE to place the time dimension second to last, so that the batch index can be the last dimension.
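To make the shape conventions concrete, here is a small sketch using stand-in arrays (the sizes are made up for illustration and no actual solve is performed):

julia> conv_like = rand(Float32, 28, 28, 16, 8);   # Flux conv output: (width, height, channels, batch), batch last

julia> node_now = rand(Float32, 4, 8, 10);         # current batched NeuralODE output as described above: (state, batch, time)

julia> node_proposed = rand(Float32, 4, 10, 8);    # proposed ordering: (state, time, batch), matching the ML convention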
Use case: In addition to simply being logical and following the ML convention, there are other benefits.
When generating a batch of labels for training with a typical loss function, one will often generate that batch from the output of some other neural net fed the batch input, which puts the batch index last. Alternatively, one might generate the batch labels by hcat-ing a set of individual results together. In either case the batch index is the last dimension.
If the NeuralODE had the same dimensional order, then a loss function written for the single-label case would also work for the batch-label case, because the last dimension will broadcast when subtracting the NeuralODE batch output from the batch labels, as in the sketch below.
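A minimal sketch of that argument with stand-in data (the shapes and the simple sum-of-squares loss are illustrative assumptions, not actual training code):

julia> loss(yhat, y) = sum(abs2, yhat .- y);   # loss written for a single (state, time) trajectory

julia> loss(rand(Float32, 4, 10), rand(Float32, 4, 10));   # single case works

julia> y_batch = cat((rand(Float32, 4, 10) for _ in 1:8)...; dims=3);   # labels concatenated along a trailing batch dim: (state, time, batch)

julia> yhat_batch = rand(Float32, 4, 10, 8);   # NeuralODE batch output, if it used the same (state, time, batch) order

julia> loss(yhat_batch, y_batch);   # the exact same loss works unchanged, since the shapes line up elementwise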
While you could permute the last two indices of the NeuralODE batch output, this is unappealing for two reasons. First, it means the loss function has to detect whether it is operating on a batch or a single case and apply the permute only in the batch case. Second, with a lazy permute the memory order is not contiguous, so for large tensor sizes it is inefficient in memory page retrieval and caching.
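For reference, the permute workaround described above looks roughly like this in base Julia, assuming the (state, batch, time) shape from the description above:

julia> out = rand(Float32, 4, 8, 10);              # (state, batch, time)

julia> out_ml = permutedims(out, (1, 3, 2));       # eager permute, copies to (state, time, batch)

julia> out_lazy = PermutedDimsArray(out, (1, 3, 2));   # lazy permute: no copy, but memory is no longer contiguous

julia> size(out_ml)
(4, 10, 8)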
Suggested path: add a keyword to NeuralODE, e.g. "UseMLBatchOrder = true", to select the preferred output order. Initially, hack this in with just a dimension permute. Later on, if this turns out to be the ordering everyone prefers, make the memory storage order concrete in this order and remove the need for the permute.
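As a rough illustration of the initial hack, here is a hypothetical wrapper layer that applies the permute after an existing NeuralODE-style layer. The MLBatchOrder name and the assumption that the wrapped layer's output converts to a plain array with the time index last are made up for this sketch; this is not actual DiffEqFlux API:

# Hypothetical wrapper: call the existing layer, then move the time dimension
# second to last so the batch index ends up last.
struct MLBatchOrder{L}
    layer::L
end

function (m::MLBatchOrder)(x)
    out = Array(m.layer(x))      # assumed to give (..., batch, time), time index last
    nd = ndims(out)
    nd < 3 && return out         # unbatched output: nothing to reorder
    # swap the last two dims: (..., batch, time) -> (..., time, batch)
    permutedims(out, (ntuple(identity, nd - 2)..., nd, nd - 1))
end

With the keyword approach, the same permute would simply be applied inside NeuralODE whenever UseMLBatchOrder is set.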