ToTheBeginning / PuLID

[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment

AttributeError: module 'torch.library' has no attribute 'register_fake' #119

Open tigran123 opened 5 days ago

tigran123 commented 5 days ago

Trying to run app_flux.py results in the following error:

$ python3.12 app_flux.py --offload --fp8
INFO:albumentations.check_version:A new version of Albumentations is available: 1.4.18 (you have 1.4.11). Upgrade using: pip install --upgrade albumentations
Traceback (most recent call last):
  File "/home/tigran/python/ml/PuLID/app_flux.py", line 17, in <module>
    from pulid.pipeline_flux import PuLIDPipeline
  File "/home/tigran/python/ml/PuLID/pulid/pipeline_flux.py", line 7, in <module>
    from facexlib.parsing import init_parsing_model
  File "/home/tigran/.local/lib/python3.12/site-packages/facexlib/__init__.py", line 3, in <module>
    from .detection import *
  File "/home/tigran/.local/lib/python3.12/site-packages/facexlib/detection/__init__.py", line 5, in <module>
    from .retinaface import RetinaFace
  File "/home/tigran/.local/lib/python3.12/site-packages/facexlib/detection/retinaface.py", line 7, in <module>
    from torchvision.models._utils import IntermediateLayerGetter as IntermediateLayerGetter
  File "/home/tigran/.local/lib/python3.12/site-packages/torchvision/__init__.py", line 10, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils  # usort:skip
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/torchvision/_meta_registrations.py", line 163, in <module>
    @torch.library.register_fake("torchvision::nms")
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'torch.library' has no attribute 'register_fake'

$ nvidia-smi 
Sat Oct 12 22:01:18 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2080 Ti     Off |   00000000:01:00.0  On |                  N/A |
| 27%   55C    P0             64W /  250W |     380MiB /  11264MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off |   00000000:02:00.0 Off |                  N/A |
|  0%   37C    P8             19W /  350W |      15MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1083      G   /usr/lib/xorg/Xorg                            271MiB |
|    0   N/A  N/A      1229      G   /usr/bin/gnome-shell                           40MiB |
|    1   N/A  N/A      1083      G   /usr/lib/xorg/Xorg                              4MiB |
+-----------------------------------------------------------------------------------------+

$ pip3.12 list | grep torch
torch                      2.3.1
torchaudio                 2.3.1
torchvision                0.19.1
tigran123 commented 5 days ago

Oh, I see, maybe it is because my torch is version 2.3.1, while torchvision 0.19.1 expects torch 2.4...
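
(A quick way to confirm: torchvision 0.19 registers its meta kernels through `torch.library.register_fake`, which only exists from torch 2.4 onwards. A minimal check along these lines should show the mismatch; the printed values are what I would expect here, not verified output:)

```python
import torch

print(torch.__version__)  # 2.3.1 in this environment
# torch.library.register_fake was added in torch 2.4 (renamed from impl_abstract),
# and torchvision 0.19 calls it at import time, hence the AttributeError above.
print(hasattr(torch.library, "register_fake"))  # False on 2.3.x, True on 2.4+
```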

tigran123 commented 5 days ago

Upgraded torch to 2.4.1 and now it starts up correctly, BUT then fails with this error:

loading from pulid_ca
loading from pulid_encoder
Running on local URL:  http://0.0.0.0:8080

To create a public link, set `share=True` in `launch()`.
IMPORTANT: You are using gradio version 4.19.1, however version 5.0.1 is available, please upgrade.
--------
Generating 'A photo of beautiful white young woman dancing on the beach with blue aliens.' with seed 15491887865678354950
/home/tigran/.local/lib/python3.12/site-packages/insightface/utils/transform.py:68: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
  P = np.linalg.lstsq(X_homo, Y)[0].T # Affine matrix. 3 x 4
Traceback (most recent call last):
  File "/home/tigran/.local/lib/python3.12/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/gradio/route_utils.py", line 233, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/gradio/blocks.py", line 1608, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/gradio/blocks.py", line 1176, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/gradio/utils.py", line 689, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/python/ml/PuLID/app_flux.py", line 121, in generate_image
    id_embeddings, uncond_id_embeddings = self.pulid_model.get_id_embedding(id_image, cal_uncond=use_true_cfg)
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/python/ml/PuLID/pulid/pipeline_flux.py", line 175, in get_id_embedding
    id_cond_vit, id_vit_hidden = self.clip_vision_model(
                                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/python/ml/PuLID/eva_clip/eva_vit_model.py", line 544, in forward
    x, hidden_states = self.forward_features(x, return_all_features, return_hidden, shuffle)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/python/ml/PuLID/eva_clip/eva_vit_model.py", line 531, in forward_features
    x = blk(x, rel_pos_bias=rel_pos_bias)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/python/ml/PuLID/eva_clip/eva_vit_model.py", line 293, in forward
    x = x + self.drop_path(self.attn(self.norm1(x), rel_pos_bias=rel_pos_bias, attn_mask=attn_mask))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/python/ml/PuLID/eva_clip/eva_vit_model.py", line 208, in forward
    x = xops.memory_efficient_attention(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/xformers/ops/fmha/__init__.py", line 276, in memory_efficient_attention
    return _memory_efficient_attention(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/xformers/ops/fmha/__init__.py", line 395, in _memory_efficient_attention
    return _memory_efficient_attention_forward(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/xformers/ops/fmha/__init__.py", line 414, in _memory_efficient_attention_forward
    op = _dispatch_fw(inp, False)
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/xformers/ops/fmha/dispatch.py", line 119, in _dispatch_fw
    return _run_priority_list(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/xformers/ops/fmha/dispatch.py", line 55, in _run_priority_list
    raise NotImplementedError(msg)
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 577, 16, 64) (torch.bfloat16)
     key         : shape=(1, 577, 16, 64) (torch.bfloat16)
     value       : shape=(1, 577, 16, 64) (torch.bfloat16)
     attn_bias   : <class 'NoneType'>
     p           : 0.0
`decoderF` is not supported because:
    xFormers wasn't build with CUDA support
    attn_bias type is <class 'NoneType'>
    operator wasn't built - see `python -m xformers.info` for more info
`flshattF@0.0.0` is not supported because:
    xFormers wasn't build with CUDA support
`cutlassF` is not supported because:
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    xFormers wasn't build with CUDA support
    dtype=torch.bfloat16 (supported: {torch.float32})
    has custom scale
    operator wasn't built - see `python -m xformers.info` for more info
    unsupported embed per head: 64

So, let's look at xformers.info:

$ python3.12 -m xformers.info
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.3.1+cu121 with CUDA 1201 (you have 2.4.1+cu121)
    Python  3.12.4 (you have 3.12.3)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
/home/tigran/.local/lib/python3.12/site-packages/xformers/ops/fmha/flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_fwd")
/home/tigran/.local/lib/python3.12/site-packages/xformers/ops/fmha/flash.py:344: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_bwd")
/home/tigran/.local/lib/python3.12/site-packages/xformers/ops/swiglu_op.py:127: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @torch.cuda.amp.custom_fwd
/home/tigran/.local/lib/python3.12/site-packages/xformers/ops/swiglu_op.py:148: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  @torch.cuda.amp.custom_bwd
xFormers 0.0.27
memory_efficient_attention.ckF:                    unavailable
memory_efficient_attention.ckB:                    unavailable
memory_efficient_attention.ck_decoderF:            unavailable
memory_efficient_attention.ck_splitKF:             unavailable
memory_efficient_attention.cutlassF:               unavailable
memory_efficient_attention.cutlassB:               unavailable
memory_efficient_attention.decoderF:               unavailable
memory_efficient_attention.flshattF@0.0.0:         available
memory_efficient_attention.flshattB@0.0.0:         available
memory_efficient_attention.smallkF:                unavailable
memory_efficient_attention.smallkB:                unavailable
memory_efficient_attention.triton_splitKF:         available
indexing.scaled_index_addF:                        available
indexing.scaled_index_addB:                        available
indexing.index_select:                             available
sequence_parallel_fused.write_values:              unavailable
sequence_parallel_fused.wait_values:               unavailable
sequence_parallel_fused.cuda_memset_32b_async:     unavailable
sp24.sparse24_sparsify_both_ways:                  unavailable
sp24.sparse24_apply:                               unavailable
sp24.sparse24_apply_dense_output:                  unavailable
sp24._sparse24_gemm:                               unavailable
sp24._cslt_sparse_mm@0.5.2:                        available
swiglu.dual_gemm_silu:                             unavailable
swiglu.gemm_fused_operand_sum:                     unavailable
swiglu.fused.p.cpp:                                not built
is_triton_available:                               True
pytorch.version:                                   2.4.1+cu121
pytorch.cuda:                                      available
gpu.compute_capability:                            8.6
gpu.name:                                          NVIDIA GeForce RTX 3090
dcgm_profiler:                                     unavailable
build.info:                                        available
build.cuda_version:                                1201
build.hip_version:                                 None
build.python_version:                              3.12.4
build.torch_version:                               2.3.1+cu121
build.env.TORCH_CUDA_ARCH_LIST:                    6.0+PTX 7.0 7.5 8.0+PTX
build.env.PYTORCH_ROCM_ARCH:                       None
build.env.XFORMERS_BUILD_TYPE:                     Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              None
build.env.XFORMERS_PACKAGE_FROM:                   wheel-v0.0.27
source.privacy:                                    open source

Is there some special way to install the xformers module?
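
(The xformers wheel here was built against torch 2.3.1+cu121, so its C++/CUDA ops refuse to load under torch 2.4.1. After reinstalling a matching wheel, a small smoke test along these lines, with the shapes taken from the error above, should confirm whether the CUDA kernels actually work; the tensor sizes are only illustrative:)

```python
import torch
import xformers.ops as xops

# Same shapes as the failing call in eva_vit_model.py: (batch, seq_len, heads, head_dim).
q = torch.randn(1, 577, 16, 64, dtype=torch.bfloat16, device="cuda")
k = torch.randn(1, 577, 16, 64, dtype=torch.bfloat16, device="cuda")
v = torch.randn(1, 577, 16, 64, dtype=torch.bfloat16, device="cuda")

# Raises NotImplementedError again if no CUDA operator was built for this torch version.
out = xops.memory_efficient_attention(q, k, v)
print(out.shape)  # torch.Size([1, 577, 16, 64])
```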

tigran123 commented 5 days ago

Ok, installed xformers with `pip3.12 install -U xformers --index-url https://download.pytorch.org/whl/cu124`, and now it fails with CUDA out of memory:

Generating 'A photo of beautiful white young woman sitting on the bed in the bedroom and smiling.' with seed 12115162621655045783
/home/tigran/.local/lib/python3.12/site-packages/insightface/utils/transform.py:68: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
  P = np.linalg.lstsq(X_homo, Y)[0].T # Affine matrix. 3 x 4
Traceback (most recent call last):
  File "/home/tigran/.local/lib/python3.12/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/gradio/route_utils.py", line 233, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/gradio/blocks.py", line 1608, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/gradio/blocks.py", line 1176, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/gradio/utils.py", line 689, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/python/ml/PuLID/app_flux.py", line 133, in generate_image
    self.model = self.model.to(self.device)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1174, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 780, in _apply
    module._apply(fn)
  File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 805, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1160, in convert
    return t.to(
           ^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 126.00 MiB. GPU 0 has a total capacity of 23.60 GiB of which 98.12 MiB is free. Including non-PyTorch memory, this process has 23.49 GiB memory in use. Of the allocated memory 21.62 GiB is allocated by PyTorch, and 6.06 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

But I have 35GB of VRAM in total: 11GB on the RTX 2080 Ti and 24GB on the RTX 3090. And I started it as recommended in the docs with `python3.12 app_flux.py --offload`, so it should peak at about 17GB and fit into the 24GB of the RTX 3090. Very strange.
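
(One thing worth checking: the OOM message reports GPU 0 with a capacity of 23.60 GiB, which is the 3090, while nvidia-smi lists the 2080 Ti as GPU 0. CUDA's default device ordering is "fastest first" rather than PCI-bus order, so the indices need not match. A quick check like this shows which physical card each CUDA index actually maps to:)

```python
import torch

# CUDA enumerates devices "fastest first" by default (unless CUDA_DEVICE_ORDER=PCI_BUS_ID),
# so cuda:0 here can be a different card than GPU 0 in nvidia-smi.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(i, props.name, f"{props.total_memory / 1024**3:.1f} GiB")
```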

UPDATE: Ah, I forgot `--fp8`. Let me try with that and see...

tigran123 commented 4 days ago

Ok, with the `--fp8` switch it works, but the quality is a bit below what I expected (compared with the presumably bf16 version running on Hugging Face). Ok, I'll try the `--aggressive-offload` option.

UPDATE: Yes, `--aggressive-offload` works, but it is about 6 times slower.

Anyway, everything seems to work as documented. My only question is: is there no way to use multiple GPUs to take advantage of the extra 11GB VRAM I have on the RTX 2080 Ti?
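
(To illustrate what a manual split might look like, which is not something PuLID supports out of the box as far as I can tell: the smaller ID/face-encoder stack could in principle live on the 2080 Ti while the FLUX transformer stays on the 3090, with the embeddings copied across before denoising. A rough, untested sketch with placeholder modules; the real objects in app_flux.py are `self.model` and `self.pulid_model`, and whether they can actually be split like this is an assumption:)

```python
import torch
import torch.nn as nn

# Placeholder stand-ins for the real FLUX transformer and the EVA-CLIP/face-ID encoder.
flux_model = nn.Linear(64, 64).to("cuda:0")   # big model -> 24 GB card
id_encoder = nn.Linear(64, 64).to("cuda:1")   # smaller encoder -> 11 GB card

# Run the ID encoder on the second GPU, then move its output to the first GPU
# before it is consumed by the transformer.
face_feat = torch.randn(1, 64, device="cuda:1")
id_embeddings = id_encoder(face_feat).to("cuda:0")
out = flux_model(id_embeddings)
print(out.device)  # cuda:0
```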