chengzeyi / stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
MIT License
1.15k stars, 70 forks

RuntimeError: _Map_base::at #49

Closed imD-5 closed 9 months ago

imD-5 commented 10 months ago

Hi, I was trying this out for maximum optimization on an AWS G5 instance running Ubuntu (it's just an NVIDIA A10G). I was using ComfyUI by calling the nodes directly in Python code, and I kept getting this error message that I couldn't solve. How can I resolve this? All the dependency requirements ('diffusers>=0.19.3', 'xformers>=0.0.20', 'triton>=2.1.0', 'torch>=1.12.0') were met. It worked on my desktop but does not on Ubuntu.

/opt/conda/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:157: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  obj_type = tensors[start].item()
/opt/conda/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:216: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tensors[start].item()
/opt/conda/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:226: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tensors[start].item()
/opt/conda/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:212: TracerWarning: Converting a tensor to a Python list might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return bytes(tensors[start].tolist()), start + 1
/opt/conda/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:203: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return int(tensors[start].item()), start + 1
  0%| | 0/12 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/home/ubuntu/test/workflow_clip_sdxl2.py", line 320, in <module>
    main()
  File "/home/ubuntu/test/workflow_clip_sdxl2.py", line 229, in main
    ksampler_3 = ksampler.sample(
  File "/home/ubuntu/ComfyUI/nodes.py", line 1286, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
  File "/home/ubuntu/ComfyUI/nodes.py", line 1256, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "/home/ubuntu/ComfyUI/custom_nodes/ComfyUI-Impact-Pack/modules/impact/sample_error_enhancer.py", line 22, in informative_sample
    raise e
  File "/home/ubuntu/ComfyUI/custom_nodes/ComfyUI-Impact-Pack/modules/impact/sample_error_enhancer.py", line 9, in informative_sample
    return original_sample(*args, **kwargs)
  File "/home/ubuntu/ComfyUI/comfy/sample.py", line 100, in sample
    samples = sampler.sample(noise, positive_copy, negative_copy, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "/home/ubuntu/ComfyUI/comfy/samplers.py", line 711, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "/home/ubuntu/ComfyUI/comfy/samplers.py", line 617, in sample
    samples = sampler.sample(model_wrap, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
  File "/home/ubuntu/ComfyUI/comfy/samplers.py", line 556, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/ComfyUI/comfy/k_diffusion/sampling.py", line 137, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/ComfyUI/comfy/samplers.py", line 277, in forward
    out = self.inner_model(x, sigma, cond=cond, uncond=uncond, cond_scale=cond_scale, model_options=model_options, seed=seed)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/ComfyUI/comfy/samplers.py", line 267, in forward
    return self.apply_model(*args, **kwargs)
  File "/home/ubuntu/ComfyUI/comfy/samplers.py", line 264, in apply_model
    out = sampling_function(self.inner_model, x, timestep, uncond, cond, cond_scale, model_options=model_options, seed=seed)
  File "/home/ubuntu/ComfyUI/comfy/samplers.py", line 252, in sampling_function
    cond, uncond = calc_cond_uncond_batch(model, cond, uncond, x, timestep, model_options)
  File "/home/ubuntu/ComfyUI/comfy/samplers.py", line 228, in calc_cond_uncond_batch
    output = model_options['model_function_wrapper'](model.apply_model, {"input": inputx, "timestep": timestep, "c": c, "cond_or_uncond": cond_or_uncond}).chunk(batch_chunks)
  File "/home/ubuntu/ComfyUI/custom_nodes/ComfyUI_stable_fast/node.py", line 69, in __call__
    return self.stable_fast_model.get_traced_module(inputx, timestep, c)[0](
  File "/home/ubuntu/ComfyUI/custom_nodes/ComfyUI_stable_fast/module/stable_diffusion_pipeline_compiler.py", line 62, in get_traced_module
    traced_m, call_helper = trace_with_kwargs(
  File "/opt/conda/lib/python3.10/site-packages/sfast/jit/trace_helper.py", line 23, in trace_with_kwargs
    traced_module = better_trace(TraceablePosArgOnlyModuleWrapper(func),
  File "/opt/conda/lib/python3.10/site-packages/sfast/jit/utils.py", line 29, in better_trace
    script_module = torch.jit.trace(func, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/jit/_trace.py", line 798, in trace
    return trace_module(
  File "/opt/conda/lib/python3.10/site-packages/torch/jit/_trace.py", line 1065, in trace_module
    module._c._create_method_from_trace(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/sfast/jit/trace_helper.py", line 127, in forward
    outputs = self.module(*orig_args, **orig_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/sfast/jit/trace_helper.py", line 77, in forward
    return self.func(*args, **kwargs)
  File "/home/ubuntu/ComfyUI/comfy/model_base.py", line 68, in apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/ComfyUI/comfy/ldm/modules/diffusionmodules/openaimodel.py", line 619, in forward
    h = forward_timestep_embed(module, h, emb, context, transformer_options)
  File "/home/ubuntu/ComfyUI/comfy/ldm/modules/diffusionmodules/openaimodel.py", line 35, in forward_timestep_embed
    x = layer(x, emb)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/ComfyUI/comfy/ldm/modules/diffusionmodules/openaimodel.py", line 210, in forward
    return checkpoint(
  File "/home/ubuntu/ComfyUI/comfy/ldm/modules/diffusionmodules/util.py", line 121, in checkpoint
    return CheckpointFunction.apply(func, len(inputs), *args)
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
RuntimeError: _Map_base::at

chengzeyi commented 10 months ago

Can you run `python3 -m torch.utils.collect_env`?

I can't reproduce it.

imD-5 commented 9 months ago

The environment settings I got are as follows:

Collecting environment information...
PyTorch version: 2.1.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0] (64-bit runtime)
Python platform: Linux-5.15.0-1049-aws-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A10G
Nvidia driver version: 535.104.12
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7R32
Stepping: 0
CPU MHz: 2799.998
BogoMIPS: 5599.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 128 KiB
L1i cache: 128 KiB
L2 cache: 2 MiB
L3 cache: 16 MiB
NUMA node0 CPU(s): 0-7
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec rstack overflow: Mitigation; safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save rdpid

Versions of relevant libraries:
[pip3] numpy==1.26.2
[pip3] stable-fast==0.0.11+torch210cu121
[pip3] torch==2.1.0+cu121
[pip3] torchaudio==2.1.0+cu121
[pip3] torchsde==0.2.6
[pip3] torchvision==0.16.0+cu121
[pip3] triton==2.1.0
[conda] numpy 1.26.2 pypi_0 pypi
[conda] stable-fast 0.0.11+torch210cu121 pypi_0 pypi
[conda] torch 2.1.0+cu121 pypi_0 pypi
[conda] torchaudio 2.1.0+cu121 pypi_0 pypi
[conda] torchsde 0.2.6 pypi_0 pypi
[conda] torchvision 0.16.0+cu121 pypi_0 pypi
[conda] triton 2.1.0 pypi_0 pypi

chengzeyi commented 9 months ago

Looks like your environment is OK. Can you install the latest stable-fast and retry? Also can you share the model you use?

imD-5 commented 9 months ago

Thanks for taking the time on this! I have downloaded and installed the latest version, "stable_fast-0.0.12.post3+torch210cu121-cp310-cp310-manylinux2014_x86_64.whl", and I still get the same error. The models I have tried are the two below. Do fp16 models not work? Given the nature of your optimization I don't think the data type makes a difference, though.
https://huggingface.co/gsdf/Counterfeit-V3.0/blob/main/Counterfeit-V3.0_fix_fp16.safetensors
https://huggingface.co/Lykon/AnyLoRA/blob/main/AnyLoRA_noVae_fp16.safetensors

imD-5 commented 9 months ago

To give more context, the problem might originate from using the node classes directly in Python code for execution. For this, I used the extension "https://github.com/pydn/ComfyUI-to-Python-Extension.git" as a baseline for building my code. From an optimization point of view, it reduces the additional overhead caused by the internal server setup, but still preserves the customizability and ease of use that ComfyUI provides. For example, the usage of stable-fast in my implementation looks like this. Are there any other imports or arguments I have to add to make this work?

applystablefastunet = NODE_CLASS_MAPPINGS["ApplyStableFastUnet"]()
applystablefastunet_80 = applystablefastunet.apply_stable_fast(
    enable_cuda_graph=True,
    model=get_value_at_index(loraloader_58, 0)
)

chengzeyi commented 9 months ago

I can't figure it out😥

Help is needed.

imD-5 commented 9 months ago

OK, after a lot of trying I solved the problem. It was the checkpoint loader I was using. stable-fast does not work with the "CheckpointLoader" node; it only works with the "CheckpointLoaderSimple" node.

chengzeyi commented 9 months ago

> OK, after a lot of trying I solved the problem. It was the checkpoint loader I was using. stable-fast does not work with the "CheckpointLoader" node; it only works with the "CheckpointLoaderSimple" node.

ComfyUI is really complex. The best experience is with pure HuggingFace Diffusers.

gameltb commented 9 months ago

You can use the `CheckpointLoader` node with a configuration file whose `model.params.unet_config.params.use_checkpoint` property is set to `False`.
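
As a rough illustration of that setting, here is a minimal sketch that patches an already-parsed configuration so the nested `use_checkpoint` key is `False`. The helper name `with_checkpoint_disabled` and the sample values are hypothetical; reading and writing the actual YAML file (for example with PyYAML) is left out, and the only thing taken from the thread is the key path itself.

```python
import copy

# Key path named in the comment above: model.params.unet_config.params
UNET_PARAMS_PATH = ("model", "params", "unet_config", "params")


def with_checkpoint_disabled(config: dict) -> dict:
    """Return a deep copy of `config` with use_checkpoint forced to False.

    Hypothetical helper: `config` is assumed to be the dict obtained by
    parsing the model's YAML configuration file.
    """
    patched = copy.deepcopy(config)
    node = patched
    for key in UNET_PARAMS_PATH:
        # Create missing intermediate sections instead of failing.
        node = node.setdefault(key, {})
    node["use_checkpoint"] = False
    return patched


# Made-up example config, only to show the shape of the nesting.
example = {
    "model": {
        "params": {
            "unet_config": {"params": {"use_checkpoint": True, "in_channels": 4}}
        }
    }
}
patched = with_checkpoint_disabled(example)
```

The input dict is left unmodified, so the original configuration can still be used where checkpointing is wanted (for example, during training).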

chengzeyi commented 9 months ago

@imD-5 @gameltb The checkpoint feature in PyTorch can be incompatible with many optimization solutions.
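
For anyone who would rather switch checkpointing off on an already-loaded model before tracing, a minimal sketch might look like the following. It assumes the model offers a `torch.nn.Module`-style `modules()` iterator and that the relevant blocks store a boolean flag named `use_checkpoint` or `checkpoint`; those attribute names are assumptions, not something confirmed in this thread, and `disable_checkpoint_flags` is a hypothetical helper.

```python
from typing import Iterable


def disable_checkpoint_flags(
    root,
    flag_names: Iterable[str] = ("use_checkpoint", "checkpoint"),
) -> int:
    """Set boolean checkpoint flags to False on `root` and its submodules.

    `root` only needs a `modules()` method that yields itself and all
    descendants, as torch.nn.Module does. The flag names are assumed,
    so pass the ones your model really uses.
    Returns the number of flags that were switched off.
    """
    names = tuple(flag_names)
    flipped = 0
    for module in root.modules():
        for name in names:
            # Only touch attributes that are literally True, so methods or
            # other objects that happen to share the name are left alone.
            if getattr(module, name, None) is True:
                setattr(module, name, False)
                flipped += 1
    return flipped
```

A return value of 0 would suggest the model uses different flag names or has checkpointing disabled already.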

imD-5 commented 9 months ago

Yeah, I thought that as well, so I opted to use the diffusers format for my project. I eventually got it down to 3 sec/gen for 1024x1024 images and less than 1 sec for 514! I appreciate your work very much, thanks!