I've updated the code so it is compatible with `StableVideoDiffusionPipeline`. It handles the 5-dimensional input and applies token merging to the temporal attention.
Temporal attention is not a major performance bottleneck, but there is a lot of redundancy in it and I wanted to see if token merging would work there.
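
To make the idea concrete, here is a minimal, dependency-free sketch of the two steps described above. It is not the code from the repo. It assumes the 5-dimensional input is laid out as (batch, frames, channels, height, width), uses plain Python lists instead of tensors, and uses a simplified version of bipartite matching (even frames are kept, odd frames may be merged into their most similar even frame). The function names `to_temporal_sequences`, `merge_frames`, and `unmerge_frames` are hypothetical.

```python
import math


def to_temporal_sequences(video):
    """Rearrange video[b][f][c][h][w] into one frame sequence per pixel.

    Assumed layout: (batch, frames, channels, height, width).
    Result: a list of B*H*W sequences, each holding F vectors of length C.
    """
    B, F = len(video), len(video[0])
    C, H, W = len(video[0][0]), len(video[0][0][0]), len(video[0][0][0][0])
    return [
        [[video[b][f][c][h][w] for c in range(C)] for f in range(F)]
        for b in range(B) for h in range(H) for w in range(W)
    ]


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def merge_frames(tokens, r):
    """Merge the r odd-indexed frames that best match an even-indexed frame.

    tokens: per-frame feature vectors at one spatial location.
    Returns (merged, assignment); assignment[i] is the index in `merged`
    that original frame i was mapped to.
    """
    dst_idx = list(range(0, len(tokens), 2))  # always kept
    src_idx = list(range(1, len(tokens), 2))  # candidates for merging

    # For every source frame, find its most similar destination frame.
    best = {}
    for s in src_idx:
        d = max(dst_idx, key=lambda j: cosine(tokens[s], tokens[j]))
        best[s] = (cosine(tokens[s], tokens[d]), d)

    # Merge only the r sources with the highest similarity.
    to_merge = sorted(src_idx, key=lambda s: best[s][0], reverse=True)[:r]
    merged_away = set(to_merge)
    kept = [i for i in range(len(tokens)) if i not in merged_away]
    new_pos = {orig: j for j, orig in enumerate(kept)}

    sums = [list(tokens[i]) for i in kept]
    counts = [1] * len(kept)
    assignment = [new_pos.get(i) for i in range(len(tokens))]
    for s in to_merge:
        j = new_pos[best[s][1]]
        sums[j] = [a + b for a, b in zip(sums[j], tokens[s])]
        counts[j] += 1
        assignment[s] = j

    # Each merged token is the mean of the frames assigned to it.
    merged = [[v / c for v in vec] for vec, c in zip(sums, counts)]
    return merged, assignment


def unmerge_frames(merged, assignment):
    """Restore the original frame count by copying merged tokens back."""
    return [list(merged[j]) for j in assignment]
```

In this sketch, temporal attention would run on the shorter `merged` sequence for each spatial location, and `unmerge_frames` would restore one token per frame afterwards so the output shape matches the input.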
I have a repo for testing it here: https://github.com/jfischoff/svd-tomesd