Closed: ImahnShekhzadeh closed this issue 10 months ago.
Hello,
You are right. It would indeed be more efficient to create tensors this way. I do not mind the current implementation, as the purpose of the tutorials is to show how we can train flow matching methods. I will check the CIFAR-10 training, but that will have to wait a bit.
train_cifar10.py is indirectly affected by this, since utils_cifar.py has some CPU-GPU overhead, e.g. at L37 in the generate_samples() function. Some other Python scripts are affected by this as well.
I am currently working on a fix; I can open a draft PR once I am finished.
That's great, yep, a simple improvement. I was never too worried about this, as I was pretty sure it didn't matter much, since most of the time is spent in the Unet. I would be very curious if you notice a speedup! Regardless, it's slightly better, and the PR is welcome.
I performed some tests by running train_cifar10.py on a 4090; I did not notice a speedup (only some minor fluctuations). I will open a PR shortly.
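(For reference, a minimal sketch of how such a comparison can be timed; the shapes and the helper below are illustrative, not the repository's actual code. CUDA work is asynchronous, so the GPU has to be synchronized before reading the clock.)

```python
import time
import torch

device = torch.device("cuda:0")

def time_step(step_fn, iters=100):
    # Warm up so one-off CUDA initialization does not skew the result.
    for _ in range(10):
        step_fn()
    torch.cuda.synchronize()  # make sure all queued GPU work is done
    start = time.perf_counter()
    for _ in range(iters):
        step_fn()
    torch.cuda.synchronize()  # wait for the GPU before stopping the clock
    return (time.perf_counter() - start) / iters

# Variant A: tensor created on the CPU, then copied to the GPU.
slow = time_step(lambda: torch.randn(128, 3, 32, 32).to(device))
# Variant B: tensor created directly on the GPU.
fast = time_step(lambda: torch.randn(128, 3, 32, 32, device=device))
print(f"CPU->GPU copy: {slow * 1e3:.3f} ms, direct on GPU: {fast * 1e3:.3f} ms")
```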
Hi!
I noticed that in several places in the code, e.g. in the conditional_mnist.ipynb tutorial, there is a slight CPU-GPU overhead. For example, the line in question first creates the tensor on the CPU and then copies it onto the GPU (in case device is cuda:0). It is possible to create the tensor directly on the specified device via the device argument, which is more memory-efficient, since there is no CPU-GPU overhead.
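A minimal sketch of the two variants (the tensor shape and variable name below are illustrative, not the actual notebook code):

```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Creates the tensor on the CPU first and then copies it onto the GPU.
labels = torch.randint(0, 10, (64,)).to(device)

# Creates the tensor directly on the target device, avoiding the CPU-GPU copy.
labels = torch.randint(0, 10, (64,), device=device)
```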