I've tried to use compile=True in the load function, but it is extremely slow to generate results, and I don't see any GPU utilization in nvidia-smi, as shown below (only a little memory is occupied, about 1 GB):
After I commented out the code block below, GPU utilization returned to normal and results were generated much faster:
I can't understand why torch.compile didn't work well here. Does anyone know why?
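torch.compile only pays off when you run generation repeatedly with the same input shape. The first call, and any call with a new input shape, triggers compilation, which is slow. If you only run generation once, compiling makes it slower overall rather than faster. The compilation step itself runs mostly on the CPU, which would also explain why nvidia-smi shows the GPU nearly idle during that phase.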
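A minimal sketch to see this warm-up effect yourself, assuming a toy model rather than the library behind load(compile=True): the model, its layer sizes, and the helper names time_calls and run are hypothetical. It times each call of a torch.compile'd module; with a fixed input shape only the first call should be slow, while a new batch size may trigger a recompilation on its first occurrence. Actual timings depend on your hardware, so no numbers are given here.

```python
import time
from typing import Any, Callable, List, Sequence


def time_calls(fn: Callable[..., Any], inputs: Sequence[Any]) -> List[float]:
    """Call fn once per input and return each call's wall-clock duration in seconds."""
    durations = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Hypothetical toy model standing in for the real one.
    model = torch.nn.Sequential(
        torch.nn.Linear(512, 2048),
        torch.nn.GELU(),
        torch.nn.Linear(2048, 512),
    ).to(device)
    compiled = torch.compile(model)

    def run(x):
        with torch.no_grad():
            out = compiled(x)
        if device == "cuda":
            torch.cuda.synchronize()  # wait for GPU kernels so the timing is real
        return out

    # Same shape every time: only the first call should pay the compilation cost.
    same_shape = [torch.randn(8, 512, device=device) for _ in range(5)]
    print("same shape:", [f"{t:.3f}s" for t in time_calls(run, same_shape)])

    # A new batch size may trigger a recompilation the first time it is seen.
    new_shapes = [torch.randn(n, 512, device=device) for n in (16, 32, 64)]
    print("new shapes:", [f"{t:.3f}s" for t in time_calls(run, new_shapes)])
```

If the first entry of the "same shape" list is much larger than the rest, the slowdown you see in a one-off run is compilation overhead rather than slow GPU kernels.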
PyTorch version: 2.3.1+cu121