Open willtebbutt opened 5 months ago
Note: at the time of writing, there are roughly 80 elements in shared data for the `_kron!` function, and I believe that there should only really be 3.
Let’s consider introducing a benchmark/test for each optimisation candidate above. It is much easier to work with the codebase if the performance implications of new PRs can be clearly understood and pinned down to a specific use case.
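As a rough illustration of what such a benchmark/test might look like, here is a minimal sketch of an allocation check. The kernel `scale_rows!` and the helper `count_allocated_bytes` are hypothetical stand-ins invented for this example, not functions from this codebase; only `@allocated` and `Diagonal` are standard Julia.

```julia
using LinearAlgebra

# Hypothetical in-place kernel standing in for an optimisation candidate:
# writes Diagonal(d) * x into `out` without creating temporaries.
function scale_rows!(out::Matrix{Float64}, d::Diagonal{Float64}, x::Matrix{Float64})
    for j in axes(x, 2), i in axes(x, 1)
        out[i, j] = d.diag[i] * x[i, j]
    end
    return out
end

# Hypothetical helper: bytes allocated by `f(args...)`.
# The first call is a warm-up, so compilation is not part of the measurement.
function count_allocated_bytes(f::F, args::Vararg{Any,N}) where {F,N}
    f(args...)
    return @allocated f(args...)
end

d = Diagonal(randn(20))
x = randn(20, 20)
out = similar(x)

# A regression test for this use case would assert that `bytes` is zero.
bytes = count_allocated_bytes(scale_rows!, out, d, x)
```

Wrapping the measurement in a function that specialises on `f` keeps the call itself from being dynamically dispatched, which could otherwise show up as spurious allocations.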
This issue will remain open on an ongoing basis. Items should be added as they are discovered.
Examples which should not allocate, but currently do:
- `(false, :allocs, nothing, TestResources.kron!, randn(400, 400), Diagonal(randn(20)), randn(20, 20))` (to do with `increment!!` allocating for non-isbits types contained in `PossiblyUninitTangent`).
- `(false, :stability, nothing, lsetfield!, TestResources.FullyInitMutableStruct(5.0, [1.0, 2.0]), Val(:y), [1.0, 3.0, 4.0])` (to do with the increment implementation for `PossiblyUninitTangent`).

- `TestResources._sum` is okay, but not fantastic. Ought to be improvable.
- […] `IDGoToNode` at the end of the way back.
- […] `ReturnNode`. Also don't push / pop the block stack for the entry block in this case -- this will completely remove usage of the block stack in single-block code.
- […] `call`s / `:invoke`s whose pullback is provably `NoPullback`.
- `QuoteNode`s in `PhiNode`s should have their `AugmentedRegister`s appear inline as `QuoteNode`s, rather than living in shared storage. Currently they tend to take up quite a bit of storage.
- […] `NoTangent`, but aren't.
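
For the `:stability` example above, a standalone check of the same flavour can be sketched with `Test.@inferred`, assuming the flag refers to type stability. The struct `ExampleMutable` and the function `set_field!` are hypothetical stand-ins for `TestResources.FullyInitMutableStruct` and `lsetfield!`; they are not taken from the codebase.

```julia
using Test

# Hypothetical stand-in for a fully-initialised mutable struct.
mutable struct ExampleMutable
    x::Float64
    y::Vector{Float64}
end

# Hypothetical stand-in for a field setter that takes the field name as a `Val`,
# so that the name is a compile-time constant inside the method body.
set_field!(s::ExampleMutable, ::Val{name}, v) where {name} = setfield!(s, name, v)

s = ExampleMutable(5.0, [1.0, 2.0])

# `@inferred` throws if the inferred return type differs from the actual one.
result = @inferred set_field!(s, Val(:y), [1.0, 3.0, 4.0])
```

This only checks the return type of the outermost call, so it is a weaker check than a full analysis of every call inside the function.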