Open tigran123 opened 5 days ago
Oh, I see, maybe it is because my torch is version 2.3.1, but it should be 2.4...
Upgraded torch to 2.4.1 and now it starts up correctly, BUT then fails with this error:
loading from pulid_ca
loading from pulid_encoder
Running on local URL: http://0.0.0.0:8080
To create a public link, set `share=True` in `launch()`.
IMPORTANT: You are using gradio version 4.19.1, however version 5.0.1 is available, please upgrade.
--------
Generating 'A photo of beautiful white young woman dancing on the beach with blue aliens.' with seed 15491887865678354950
/home/tigran/.local/lib/python3.12/site-packages/insightface/utils/transform.py:68: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
P = np.linalg.lstsq(X_homo, Y)[0].T # Affine matrix. 3 x 4
Traceback (most recent call last):
File "/home/tigran/.local/lib/python3.12/site-packages/gradio/queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/gradio/route_utils.py", line 233, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/gradio/blocks.py", line 1608, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/gradio/blocks.py", line 1176, in call_function
prediction = await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 851, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/gradio/utils.py", line 689, in wrapper
response = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/python/ml/PuLID/app_flux.py", line 121, in generate_image
id_embeddings, uncond_id_embeddings = self.pulid_model.get_id_embedding(id_image, cal_uncond=use_true_cfg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/python/ml/PuLID/pulid/pipeline_flux.py", line 175, in get_id_embedding
id_cond_vit, id_vit_hidden = self.clip_vision_model(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/python/ml/PuLID/eva_clip/eva_vit_model.py", line 544, in forward
x, hidden_states = self.forward_features(x, return_all_features, return_hidden, shuffle)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/python/ml/PuLID/eva_clip/eva_vit_model.py", line 531, in forward_features
x = blk(x, rel_pos_bias=rel_pos_bias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/python/ml/PuLID/eva_clip/eva_vit_model.py", line 293, in forward
x = x + self.drop_path(self.attn(self.norm1(x), rel_pos_bias=rel_pos_bias, attn_mask=attn_mask))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/python/ml/PuLID/eva_clip/eva_vit_model.py", line 208, in forward
x = xops.memory_efficient_attention(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/xformers/ops/fmha/__init__.py", line 276, in memory_efficient_attention
return _memory_efficient_attention(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/xformers/ops/fmha/__init__.py", line 395, in _memory_efficient_attention
return _memory_efficient_attention_forward(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/xformers/ops/fmha/__init__.py", line 414, in _memory_efficient_attention_forward
op = _dispatch_fw(inp, False)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/xformers/ops/fmha/dispatch.py", line 119, in _dispatch_fw
return _run_priority_list(
^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/xformers/ops/fmha/dispatch.py", line 55, in _run_priority_list
raise NotImplementedError(msg)
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
query : shape=(1, 577, 16, 64) (torch.bfloat16)
key : shape=(1, 577, 16, 64) (torch.bfloat16)
value : shape=(1, 577, 16, 64) (torch.bfloat16)
attn_bias : <class 'NoneType'>
p : 0.0
`decoderF` is not supported because:
xFormers wasn't build with CUDA support
attn_bias type is <class 'NoneType'>
operator wasn't built - see `python -m xformers.info` for more info
`flshattF@0.0.0` is not supported because:
xFormers wasn't build with CUDA support
`cutlassF` is not supported because:
xFormers wasn't build with CUDA support
operator wasn't built - see `python -m xformers.info` for more info
`smallkF` is not supported because:
max(query.shape[-1] != value.shape[-1]) > 32
xFormers wasn't build with CUDA support
dtype=torch.bfloat16 (supported: {torch.float32})
has custom scale
operator wasn't built - see `python -m xformers.info` for more info
unsupported embed per head: 64
So, let's look at xformers.info:
$ python3.12 -m xformers.info
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.3.1+cu121 with CUDA 1201 (you have 2.4.1+cu121)
Python 3.12.4 (you have 3.12.3)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
/home/tigran/.local/lib/python3.12/site-packages/xformers/ops/fmha/flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_fwd")
/home/tigran/.local/lib/python3.12/site-packages/xformers/ops/fmha/flash.py:344: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_bwd")
/home/tigran/.local/lib/python3.12/site-packages/xformers/ops/swiglu_op.py:127: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
@torch.cuda.amp.custom_fwd
/home/tigran/.local/lib/python3.12/site-packages/xformers/ops/swiglu_op.py:148: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
@torch.cuda.amp.custom_bwd
xFormers 0.0.27
memory_efficient_attention.ckF: unavailable
memory_efficient_attention.ckB: unavailable
memory_efficient_attention.ck_decoderF: unavailable
memory_efficient_attention.ck_splitKF: unavailable
memory_efficient_attention.cutlassF: unavailable
memory_efficient_attention.cutlassB: unavailable
memory_efficient_attention.decoderF: unavailable
memory_efficient_attention.flshattF@0.0.0: available
memory_efficient_attention.flshattB@0.0.0: available
memory_efficient_attention.smallkF: unavailable
memory_efficient_attention.smallkB: unavailable
memory_efficient_attention.triton_splitKF: available
indexing.scaled_index_addF: available
indexing.scaled_index_addB: available
indexing.index_select: available
sequence_parallel_fused.write_values: unavailable
sequence_parallel_fused.wait_values: unavailable
sequence_parallel_fused.cuda_memset_32b_async: unavailable
sp24.sparse24_sparsify_both_ways: unavailable
sp24.sparse24_apply: unavailable
sp24.sparse24_apply_dense_output: unavailable
sp24._sparse24_gemm: unavailable
sp24._cslt_sparse_mm@0.5.2: available
swiglu.dual_gemm_silu: unavailable
swiglu.gemm_fused_operand_sum: unavailable
swiglu.fused.p.cpp: not built
is_triton_available: True
pytorch.version: 2.4.1+cu121
pytorch.cuda: available
gpu.compute_capability: 8.6
gpu.name: NVIDIA GeForce RTX 3090
dcgm_profiler: unavailable
build.info: available
build.cuda_version: 1201
build.hip_version: None
build.python_version: 3.12.4
build.torch_version: 2.3.1+cu121
build.env.TORCH_CUDA_ARCH_LIST: 6.0+PTX 7.0 7.5 8.0+PTX
build.env.PYTORCH_ROCM_ARCH: None
build.env.XFORMERS_BUILD_TYPE: Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: wheel-v0.0.27
source.privacy: open source
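The relevant part of that output is the pair `build.torch_version` (2.3.1+cu121) and `pytorch.version` (2.4.1+cu121): the wheel was compiled against a different torch than the one now installed, which is why the CUDA operators are reported as unavailable. Below is a minimal sketch of how that comparison could be automated. The helper names `parse_xformers_info` and `build_mismatches` are made up for illustration, and it assumes the `key: value` line format shown above, which may differ in other xformers versions.

```python
import re


def parse_xformers_info(text):
    """Collect `key: value` lines from `python -m xformers.info` output."""
    fields = {}
    for line in text.splitlines():
        # Keys look like `pytorch.version` or `build.torch_version`;
        # warning lines and file paths do not match this pattern.
        match = re.match(r"^\s*([\w.@]+):\s+(.*\S)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def build_mismatches(fields):
    """Return (what, built_for, installed) tuples where wheel and runtime disagree."""
    pairs = [("torch", "build.torch_version", "pytorch.version")]
    found = []
    for what, build_key, runtime_key in pairs:
        built, running = fields.get(build_key), fields.get(runtime_key)
        if built and running and built != running:
            found.append((what, built, running))
    return found


# A few lines copied from the output above.
sample = """\
pytorch.version: 2.4.1+cu121
pytorch.cuda: available
build.cuda_version: 1201
build.python_version: 3.12.4
build.torch_version: 2.3.1+cu121
"""

mismatches = build_mismatches(parse_xformers_info(sample))
for what, built, running in mismatches:
    print(f"xformers was built for {what} {built}, but {running} is installed")
```

In practice the text would come from capturing the output of `python3.12 -m xformers.info` rather than from a pasted string.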
Is there some special way to install the xformers module?
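One approach that should work is to install xformers from the PyTorch wheel index whose CUDA tag matches the installed torch build. The sketch below derives that index URL from a torch version string. The function names are hypothetical, and it assumes the `https://download.pytorch.org/whl/<tag>` layout holds for the tag in question. Note that `-U` may also pull in a different torch if the newest xformers wheel pins one.

```python
import re


def wheel_index_for(torch_version):
    """Map a torch version such as '2.4.1+cu121' to the matching wheel index URL."""
    match = re.search(r"\+(cu\d+)$", torch_version)
    if match is None:
        raise ValueError(f"no CUDA tag in {torch_version!r}")
    return f"https://download.pytorch.org/whl/{match.group(1)}"


def install_command(torch_version, pip="pip3.12"):
    """Build the pip command line; it is only printed here, not executed."""
    return f"{pip} install -U xformers --index-url {wheel_index_for(torch_version)}"


# In a real session the argument would be torch.__version__.
command = install_command("2.4.1+cu121")
print(command)
```

Afterwards, `python3.12 -m xformers.info` should list `memory_efficient_attention.cutlassF` as available if the build and runtime versions line up.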
Ok, installed xformers with `pip3.12 install -U xformers --index-url https://download.pytorch.org/whl/cu124`, and now it fails with CUDA out of memory:
Generating 'A photo of beautiful white young woman sitting on the bed in the bedroom and smiling.' with seed 12115162621655045783
/home/tigran/.local/lib/python3.12/site-packages/insightface/utils/transform.py:68: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
P = np.linalg.lstsq(X_homo, Y)[0].T # Affine matrix. 3 x 4
Traceback (most recent call last):
File "/home/tigran/.local/lib/python3.12/site-packages/gradio/queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/gradio/route_utils.py", line 233, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/gradio/blocks.py", line 1608, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/gradio/blocks.py", line 1176, in call_function
prediction = await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 851, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/gradio/utils.py", line 689, in wrapper
response = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/python/ml/PuLID/app_flux.py", line 133, in generate_image
self.model = self.model.to(self.device)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1174, in to
return self._apply(convert)
^^^^^^^^^^^^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 780, in _apply
module._apply(fn)
File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 780, in _apply
module._apply(fn)
File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 780, in _apply
module._apply(fn)
File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 805, in _apply
param_applied = fn(param)
^^^^^^^^^
File "/home/tigran/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1160, in convert
return t.to(
^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 126.00 MiB. GPU 0 has a total capacity of 23.60 GiB of which 98.12 MiB is free. Including non-PyTorch memory, this process has 23.49 GiB memory in use. Of the allocated memory 21.62 GiB is allocated by PyTorch, and 6.06 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
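The message hints at `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`, but that only helps when a lot of memory is reserved and unallocated. A rough sketch that pulls the numbers out of the message shows that is not the case here: only 6.06 MiB is reserved but unallocated against a 126 MiB request, so the card is simply full. The function `parse_oom` is a hypothetical helper, and the regular expressions assume the wording of this particular torch version.

```python
import re

UNIT_IN_MIB = {"MiB": 1, "GiB": 1024}


def parse_oom(message):
    """Extract the sizes quoted in a CUDA OOM message, converted to MiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"total capacity of ([\d.]+) (MiB|GiB)",
        "free": r"of which ([\d.]+) (MiB|GiB) is free",
        "allocated": r"([\d.]+) (MiB|GiB) is allocated by PyTorch",
        "reserved_unallocated": r"([\d.]+) (MiB|GiB) is reserved by PyTorch but unallocated",
    }
    stats = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            value, unit = match.groups()
            stats[name] = float(value) * UNIT_IN_MIB[unit]
    return stats


# The message from the traceback above, shortened to the part with the numbers.
message = (
    "CUDA out of memory. Tried to allocate 126.00 MiB. "
    "GPU 0 has a total capacity of 23.60 GiB of which 98.12 MiB is free. "
    "Including non-PyTorch memory, this process has 23.49 GiB memory in use. "
    "Of the allocated memory 21.62 GiB is allocated by PyTorch, "
    "and 6.06 MiB is reserved by PyTorch but unallocated."
)

stats = parse_oom(message)
# Fragmentation is only a plausible cause if the idle reserved pool could
# have covered the request.
fragmentation_is_culprit = stats["reserved_unallocated"] >= stats["requested"]
print(stats)
print("fragmentation likely:", fragmentation_is_culprit)
```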
But I have 35GB of VRAM split between an 11GB RTX 2080 Ti and a 24GB RTX 3090. And I started it as recommended in the docs with `python3.12 app_flux.py --offload`, so it should peak at 17GB and fit even into the 24GB of the RTX 3090. Very strange.
UPDATE: Ah, I forgot `--fp8`. Let me try with that and see...
Ok, with the `--fp8` switch it works, but the quality is a bit below what I expected (compared with what is presumably the bf16 version running on Hugging Face). Ok, I'll try the `--aggressive-offload` option.
UPDATE: Yes, `--aggressive-offload` works, but is 6 times slower.
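A back-of-the-envelope calculation is consistent with this behaviour. The sketch below assumes the FLUX transformer has roughly 12 billion parameters (a figure not taken from this thread) and counts weights only, ignoring activations and the other models in the pipeline. Under that assumption the bf16 weights alone come to about 22.4 GiB, close to the 21.62 GiB PyTorch had allocated when `self.model.to(self.device)` failed, while fp8 halves that.

```python
def weights_gib(num_params, bytes_per_param):
    """Size of the raw weights in GiB, ignoring activations and buffers."""
    return num_params * bytes_per_param / 2**30


# Assumption: about 12 billion parameters in the FLUX transformer.
FLUX_PARAMS = 12e9
GPU_CAPACITY_GIB = 23.60  # total capacity reported in the OOM message above

bf16 = weights_gib(FLUX_PARAMS, 2)  # 2 bytes per parameter
fp8 = weights_gib(FLUX_PARAMS, 1)   # 1 byte per parameter

for name, size in (("bf16", bf16), ("fp8", fp8)):
    headroom = GPU_CAPACITY_GIB - size
    print(f"{name}: {size:.1f} GiB of weights, {headroom:.1f} GiB headroom")
```

With barely more than 1 GiB of headroom in bf16, moving the whole model onto the RTX 3090 at once leaves almost no room for anything else, which would explain why only `--fp8` or `--aggressive-offload` gets through.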
Anyway, everything seems to work as documented. My only question is: is there no way to use multiple GPUs to take advantage of the extra 11GB VRAM I have on the RTX 2080 Ti?
Trying to run `app_flux.py` results in the following error: