Open 631068264 opened 3 weeks ago
Both runs use almost the same parameters.
Even with int8, TensorRT does not save more memory than diffusers with DeepCache, and it is slower. Is this expected? How can I save more memory?
TensorRT supports dynamic shapes, so why is max_batch_size 4?
# --int8 is optional
python3 demo_txt2img_xl.py "An astronaut riding a green horse" \
    --version=xl-1.0 \
    --framework-model-dir /xxx/stable-diffusion-xl-base-1.0 \
    --build-dynamic-shape \
    --timing-cache /xxx/stable-diffusion-xl-base-1.0/timing-cache \
    --engine-dir /xxx/trt_engine \
    --onnx-dir /xxx/onnx \
    --num-warmup-runs 1 \
    --int8 \
    -v \
    --onnx-opset 17 \
    --height 1024 \
    --width 1024 \
    --batch-size 4 \
    --denoising-steps 50
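For context on the batch-size question: a dynamic-shape engine is built against an optimization profile with minimum, optimum and maximum input shapes, so some upper bound has to be fixed at build time. The sketch below is only back-of-the-envelope arithmetic, not code from the demo. The function names are hypothetical, and it assumes 4 latent channels, an 8x VAE downscale, fp16 tensors, and classifier-free guidance doubling the batch.

```python
def unet_latent_shape(batch_size, height, width, guidance=True,
                      latent_channels=4, vae_scale=8):
    """Shape of the latent tensor fed to the UNet for one denoising step.

    All defaults are assumptions for SDXL-style pipelines, not values
    read from the TensorRT demo.
    """
    # Classifier-free guidance batches the conditional and unconditional
    # passes together, so the UNet sees twice the requested batch.
    effective_batch = 2 * batch_size if guidance else batch_size
    return (effective_batch, latent_channels,
            height // vae_scale, width // vae_scale)


def tensor_bytes(shape, bytes_per_element=2):
    """Size in bytes of a dense tensor (2 bytes per element for fp16)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Largest latent input a profile must cover for batch 4 at 1024x1024.
max_shape = unet_latent_shape(batch_size=4, height=1024, width=1024)
print(max_shape, tensor_bytes(max_shape))  # (8, 4, 128, 128) 1048576
```

The latent itself is small (1 MiB here). Most GPU memory goes to weights and intermediate activations, which this sketch does not model, so it only illustrates which shape the maximum batch size pins down.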
Using diffusers only:
import torch
from diffusers import StableDiffusionXLPipeline

# download_config, save_image and local_dir are defined elsewhere in my script.


def deep_cache(pipe):
    # https://arxiv.org/abs/2312.00858
    from DeepCache import DeepCacheSDHelper

    helper = DeepCacheSDHelper(pipe=pipe)
    helper.set_params(
        cache_interval=3,
        cache_branch_id=0,
    )
    helper.enable()
    normal_optimization(pipe)


def normal_optimization(pipe):
    pipe.enable_xformers_memory_efficient_attention()
    # pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
    pipe.enable_vae_slicing()
    pipe.enable_vae_tiling()
    pipe.enable_model_cpu_offload()


def load_from_single(local_dir):
    pipe = StableDiffusionXLPipeline.from_single_file(
        f'{local_dir}/sd_xl_base_1.0.safetensors',
        config=download_config(local_dir),
        local_files_only=True,
        torch_dtype=torch.float16,
    ).to("cuda")
    prompt = ["An astronaut riding a green horse"] * 5
    # images = tgate_with_dc(pipe, prompt)
    deep_cache(pipe)
    images = pipe(prompt=prompt).images
    save_image(images)


load_from_single(local_dir)
TensorRT Version: 10.1
NVIDIA GPU: A100 40G
NVIDIA Driver Version: 555.42.02
CUDA Version: 12.5
CUDNN Version:
Operating System:
Python Version (if applicable): 3.11
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 2.3
Baremetal or Container (if so, version):
Hello! Can someone answer my question?