Closed crcrpar closed 3 months ago
I think this reordering happens because fusion passes generally sort operations topologically and group them together. In practice, I don't think this buffer tensor lives long in the program, but we need to check what actually happens.
`functionalize_inplace_ops` puts all the required `prims.copy_`s right before the return statement. Luca and I were thinking about moving each copy right after the last consumption of its destination.
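To make the proposal concrete, here is a rough sketch of the reordering on a toy trace. It does not use Thunder's trace or bound-symbol classes: `Op`, `move_copies_after_last_use`, and the tensor names are all hypothetical, and the list is assumed to hold only the operations before the return statement. Each `copy_` is placed right after the later of (a) the last operation that reads its destination and (b) the operation that produces its source. The sketch assumes each destination is copied to at most once and ignores views and aliasing, which a real pass would have to handle.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Op:
    """Toy stand-in for one operation in a trace (hypothetical, not Thunder's API)."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...] = ()


def is_copy(op: Op) -> bool:
    # In this toy model a copy is Op("copy_", (src, dst)).
    return op.name == "copy_"


def move_copies_after_last_use(ops: List[Op]) -> List[Op]:
    """Move each copy_ right after the last read of its destination.

    The copy also has to stay after the op that produces its source,
    so the anchor is the later of those two positions.
    """
    body = [op for op in ops if not is_copy(op)]
    copies = [op for op in ops if is_copy(op)]

    # after[i] collects the copies to emit right after body[i];
    # index -1 means "before every other op".
    after: Dict[int, List[Op]] = {i: [] for i in range(-1, len(body))}
    for cp in copies:
        src, dst = cp.inputs
        anchor = -1
        for i, op in enumerate(body):
            if dst in op.inputs or src in op.outputs:
                anchor = i
        after[anchor].append(cp)

    result = list(after[-1])
    for i, op in enumerate(body):
        result.append(op)
        result.extend(after[i])
    return result


# The copy into `x` starts at the end, as functionalization would leave it.
example = [
    Op("add", ("x", "one"), ("t0",)),
    Op("mul", ("t0", "x"), ("t1",)),  # last read of `x`
    Op("sin", ("t1",), ("t2",)),
    Op("copy_", ("t0", "x")),
]
moved = move_copies_after_last_use(example)
print([op.name for op in moved])
```

In this example the copy ends up directly after `mul`, the last reader of `x`, rather than after `sin`. Whether that placement survives the fusion passes' own reordering is the part that still needs to be checked.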
Maybe we can open an issue for this. Also let's add an issue with the proposal for batchnorm.
Originally posted by @lantiga in https://github.com/Lightning-AI/lightning-thunder/pull/675#pullrequestreview-2147432669