Thanks! Out of curiosity, what GPU did you end up using? How fast does the process run there compared to your CPU?
Thanks for merging! I used the environment I set up in #46. Compared to my CPU, it's around 45x faster, ignoring the time of the first compilation. I repeated the experiment 10 times using the DDIM scheduler with 50 inference steps:
| Hardware | Avg. inference time |
|---|---|
| Intel Xeon CPU @ 2.20 GHz | 24 min |
| NVIDIA T4 Tensor Core GPU | 32.4 s |
As for the Python version, it seems comparable. A couple of days ago I was wondering whether working with `Vec<_>`s instead of `tch::Tensor`s slows down inference a little. I suspect it doesn't: we mostly only read from them, we don't perform tensor operations on the timesteps or sigmas directly, and when we do need them, we just look up a value and combine it with a `Tensor`.
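For what it's worth, here's a minimal sketch of the access pattern I mean (field and function names are assumed for illustration, not the crate's actual API): the timesteps and sigmas live in plain `Vec`s, indexing them is a host-side lookup, and only the resulting scalar ever meets a `Tensor`:

```rust
use tch::Tensor;

/// Hypothetical scheduler state (names assumed for illustration):
/// timesteps and sigmas kept as plain Rust Vecs rather than Tensors.
struct SchedulerState {
    timesteps: Vec<usize>,
    sigmas: Vec<f64>,
}

impl SchedulerState {
    /// Euler-style input scaling as an example access pattern: indexing
    /// the Vecs is a cheap host-side lookup; only the final scalar
    /// multiply touches the Tensor, so storing these values as Vecs
    /// shouldn't add measurable overhead to inference.
    fn scale_model_input(&self, sample: &Tensor, step: usize) -> Tensor {
        let sigma = self.sigmas[self.timesteps[step]];
        sample * (1.0 / (sigma * sigma + 1.0).sqrt())
    }
}
```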
Updated the above reply with some numbers and some more details on the experimental setup.
I also noticed that all the checks failed when you merged this pull request. I'm a little surprised, because they all succeeded when I opened it. I mirrored the repo and re-ran all the jobs: after the second attempt, 6 more passed, and on the third re-run they all passed. Perhaps a recent release of the actions broke something?
Now that I have the chance to use a GPU for diffusion experiments, I noticed that one of the schedulers I implemented (`DDPMScheduler`) performs operations on tensors that live on different devices. This PR fixes it. I double-checked that all the other schedulers are sound in this respect.
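For reference, a minimal sketch of the class of bug being fixed (a hypothetical function; the actual change in this PR may look different): any tensor created inside a scheduler step has to be allocated on the sample's device rather than the default CPU device:

```rust
use tch::{Device, Kind, Tensor};

/// Hypothetical excerpt of a DDPM-style sampling step (not the actual
/// diff): noise created mid-step must live on the same device as the
/// model output, otherwise tch fails with a cross-device error on GPU.
fn add_variance(pred_prev_sample: &Tensor, variance: f64) -> Tensor {
    let noise = Tensor::randn(
        &pred_prev_sample.size(),
        // Match the sample's device instead of defaulting to Device::Cpu.
        (Kind::Float, pred_prev_sample.device()),
    );
    pred_prev_sample + noise * variance.sqrt()
}
```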