Closed: trolley813 closed this issue 10 months ago
Hello @trolley813 ,
Thanks a lot for your feedback.
I can confirm and reproduce the issue.
The solution (which worked for me) is to open the relevant files in UTF-8 mode, but I'm unsure that there are no other places to fix.
You are right. This error was already patched for the biniou console a few weeks ago, but I completely missed the other impacted functions. As you suspected, reading and writing settings could potentially reproduce this behavior, so they have been modified too.
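For context, the change boils down to passing an explicit encoding when opening those files. A minimal sketch with illustrative helper names (not the actual biniou functions):

# Illustrative helpers only: pass encoding="utf-8" explicitly so non-ASCII
# text (e.g. Cyrillic) survives the round-trip regardless of the system locale.
def write_settings(path: str, content: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)

def read_settings(path: str) -> str:
    with open(path, "r", encoding="utf-8") as f:
        return f.read()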
65dd31b should solve the issue you encountered.
I close the issue, but don't hesitate to re-open if needed.
Thanks again for your support !
Thanks! Yes, 65dd31b indeed fixed the problem, it works great!
P.S. Running on an HDD also works great. The initial loading of a model can take a while (tens of seconds), but further requests to the same model are much faster.
Thanks for the feedback.
I suspect that the combination of HDD + low RAM (which is not your case) will be a mess ("there will be swap").
As I see you have a powerful Nvidia GPU, could you also specify whether you use the CUDA optimization of biniou or only the CPU base version?
And if using CUDA, does it really accelerate inference on heavy modules like the image or video ones?
As I see you have a powerful Nvidia GPU, could you also specify whether you use the CUDA optimization of biniou or only the CPU base version?
Yes, I've tried both versions. However, the Bark module (text-to-speech) does not work with CUDA enabled (it complains that both CPU and CUDA are available and that one should specify which device to use; sorry, I don't remember the exact error message).
And if using CUDA, does it really accelerate inference on heavy modules like the image or video ones?
Yes, it does. At least for images. P.S. For Kandinsky 2, you can split the generation process between CPU and GPU, like I did in ai-forever/Kandinsky-2#79. Sorry, I did not read your code much, so I'm unsure whether that's applicable here (and whether you're not already doing it this way). But it can help achieve better results faster: on my 8 GB GPU it's possible to draw 1024x1024 images, and maybe somewhat larger (I've also tried rectangular 768x1536 ones), at reasonable speed (about 1 minute per image).
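To illustrate the idea (not your code, and not exactly the setup from that issue), here's a rough sketch using the diffusers Kandinsky 2.2 pipelines, with the prior on the CPU and the decoder on the GPU:

import torch
from diffusers import KandinskyV22PriorPipeline, KandinskyV22Pipeline

# Prior (text -> image embeddings) stays on the CPU in float32.
prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float32
)
# Decoder (embeddings -> pixels) goes to the GPU in float16 to save VRAM.
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of a red sports car"
image_emb, negative_emb = prior(prompt).to_tuple()
image = decoder(
    image_embeds=image_emb,
    negative_image_embeds=negative_emb,
    height=1024,
    width=1024,
).images[0]
image.save("out.png")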
Great news. Thanks a lot for your confirmation! I wasn't sure at all that it would be usable with the modifications I made for CUDA compatibility.
Yes, I've tried both versions. However, the Bark module (text-to-speech) does not work with CUDA enabled (it complains that both CPU and CUDA are available and that one should specify which device to use; sorry, I don't remember the exact error message).
No worries. I will try to fix this issue asap. Edit: 2a1efd6 may fix the Bark issue.
Yes, it does. At least for images. P.S. For Kandinsky 2, you can split the generation process between CPU and GPU, like I did in https://github.com/ai-forever/Kandinsky-2/issues/79. Sorry, I did not read your code much, so I'm unsure whether that's applicable here (and whether you're not already doing it this way). But it can help achieve better results faster: on my 8 GB GPU it's possible to draw 1024x1024 images, and maybe somewhat larger (I've also tried rectangular 768x1536 ones), at reasonable speed (about 1 minute per image).
Awesome and very smart! I didn't suspect it could be done this way. That said, I'm not sure it can be ported to the Kandinsky module, but I will give it a try.
FYI (as you are a Kandinsky user), and if you've missed it, Kandinsky 3.0 should be usable by uncommenting line 29 in ressources/txt2img_kd:
# "kandinsky-community/kandinsky-3",
This will make Kandinsky 3.0 the default model for the Kandinsky module, but it doesn't seem to be usable with CPU only.
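Schematically, the model list looks something like this (an illustrative sketch only; the exact variable name and entries differ in the real file):

# Illustrative sketch of the model list in ressources/txt2img_kd:
# uncommenting the Kandinsky 3.0 entry makes it the model used by the module.
model_list_txt2img_kd = [
    "kandinsky-community/kandinsky-2-2-decoder",
#    "kandinsky-community/kandinsky-3",
]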
Thanks again for your comments and feedback, they are very important for the project.
FYI (as you are a Kandinsky user), and if you've missed it, Kandinsky 3.0 should be usable by uncommenting line 29 in ressources/txt2img_kd:
# "kandinsky-community/kandinsky-3",
This will make Kandinsky 3.0 the default model for the Kandinsky module, but it doesn't seem to be usable with CPU only.
Thank you! Yes, I still haven't managed to run Kandinsky 3.0 on my PC, since it (likely) does not fit in 8 GB of VRAM, and attempting to run it on the CPU (not here, just with an example script) also showed an fp16-related error (the CPU does not support float16, and it is probably hardcoded somewhere, since the error persists when overriding fp16=False).
Thanks again for your comments and feedback, they are very important for the project.
I'm very glad to help!
~I still haven't managed to run Kandinsky 3.0 on my PC, since it (likely) does not fit in 8 GB of VRAM, and attempting to run it on the CPU (not here, just with an example script) also showed an fp16-related error (the CPU does not support float16, and it is probably hardcoded somewhere, since the error persists when overriding fp16=False).~
Well, I was all wrong. It works without CUDA by enforcing torch_dtype=torch.float32. However, it takes LOTS of RAM (~60 GB, so some swapping can be expected with 64 GB) and thus works somewhat slowly (~6 minutes per 1024x1024 image on a 16-core/32-thread CPU). Maybe playing with device_map (I suspect it acts similarly to ai-forever/Kandinsky-2#79) and/or using a different (smaller) text encoder would accelerate the generation process.
Well, I was all wrong. It works without CUDA by enforcing torch_dtype=torch.float32.
It is enforced when you use CPU-only inference in biniou. I don't get it either... Maybe it requires a CUDA-compatible PyTorch rather than the CPU-only version, even if it only uses the CPU? I will give it a try over the weekend and also try to adapt the code of the Kandinsky module with your proposal.
It is enforced when you use CPU-only inference in biniou. I don't get it either...
So far I've only tried it outside biniou. Inside the app, it may well not be needed.
Maybe it requires a CUDA-compatible PyTorch rather than the CPU-only version, even if it only uses the CPU?
No, I tried with CPU-only PyTorch, and it worked well. My script is as follows:
from diffusers import AutoPipelineForText2Image, Kandinsky3Pipeline
import torch
from uuid import uuid4

# Load the fp16 weights variant but cast everything to float32,
# since the CPU does not support float16 inference.
pipe: Kandinsky3Pipeline = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=torch.float32,
    resume_download=True,
    cache_dir="./kand3-model",
    local_files_only=True
)
print(pipe._execution_device)  # should report "cpu"
#pipe.enable_model_cpu_offload()
prompt = "<a prompt here>"
generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, num_inference_steps=25, generator=generator).images[0]
image.save(f"image_{uuid4()}.png")
Finally, got it to work inside biniou. I had to do the following modification in ressources/txt2img_kd.py (the line numbers are given as per 847ab36): set torch_dtype=model_arch on lines 86 and 93, to force the use of float32 on the CPU (a sketch of the idea follows the traceback below).
Callback exception:
Traceback (most recent call last):
File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/gradio/queueing.py", line 407, in call_prediction
output = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/gradio/route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/gradio/blocks.py", line 1550, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/gradio/blocks.py", line 1185, in call_function
prediction = await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/gradio/utils.py", line 661, in wrapper
response = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/gradio/utils.py", line 661, in wrapper
response = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/trolley813/development/experimental/biniou/ressources/common.py", line 327, in wrap_func
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/trolley813/development/experimental/biniou/ressources/txt2img_kd.py", line 132, in image_txt2img_kd
image = pipe_txt2img_kd(
^^^^^^^^^^^^^^^^
File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/diffusers/pipelines/kandinsky3/pipeline_kandinsky3.py", line 561, in __call__
if callback is not None and i % callback_steps == 0:
~~^~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for %: 'int' and 'NoneType'
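As mentioned above, the dtype change amounts to something like this (an illustrative sketch with assumed names, not the literal code of txt2img_kd.py):

import torch
from diffusers import AutoPipelineForText2Image

# Pick the dtype from the execution device: float16 only when CUDA is usable,
# float32 otherwise, so CPU-only inference does not crash on half precision.
model_arch = torch.float16 if torch.cuda.is_available() else torch.float32

pipe_txt2img_kd = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3",
    variant="fp16",
    torch_dtype=model_arch,
)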
Thanks a lot !
It was a pretty bad move on my part to use float16 as the torch dtype with CPU-only inference...
For the callback, it may be using the new callback_steps instead of the legacy callback (the error message looks very similar).
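For reference, the two diffusers calling conventions look roughly like this (a sketch only; check_cancel and on_step_end are hypothetical callbacks, not the actual biniou ones):

# Legacy diffusers callback API: callback_steps must be an int, otherwise
# recent releases default it to None and "i % callback_steps" raises a TypeError.
def check_cancel(step, timestep, latents):
    pass  # e.g. test a cancel flag here

image = pipe_txt2img_kd(prompt, callback=check_cancel, callback_steps=1).images[0]

# Newer diffusers callback API (where the pipeline supports it), which
# replaces the two arguments above:
def on_step_end(pipe, step, timestep, callback_kwargs):
    # e.g. test a cancel flag here
    return callback_kwargs

image = pipe_txt2img_kd(prompt, callback_on_step_end=on_step_end).images[0]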
I will do a commit in the evening to correct that and insert your changes!
Thanks again :)
Edit: commit 6434292 introduces support for Kandinsky 3.0. The Cancel button should be functional, but has not been tested.
Edit: commit 6434292 introduces support for Kandinsky 3.0. The Cancel button should be functional, but has not been tested.
Thanks! Finally tested it: it works well and (with CUDA) very fast. With 8 GB of VRAM, at least 1024x1024 is possible (with both 2.2 and 3.0). Probably the sequential CPU offload does the job.
P.S. By "very fast" I mean about 25-30 seconds per 1024x1024 image on 3.0.
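If that's indeed what happens, the relevant diffusers call is a one-liner; a minimal sketch (not the actual biniou code):

import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16
)
# Sequential CPU offload keeps only the submodule currently computing on the GPU
# and leaves the rest in system RAM, which is what lets 1024x1024 fit in 8 GB of VRAM.
pipe.enable_sequential_cpu_offload()

image = pipe("a red cat, 4k photo", num_inference_steps=25).images[0]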
P.S. By "very fast" I mean about 25-30 seconds per 1024x1024 image on 3.0.
Compared to CPU inference, it is indeed very fast !
Thanks again for your contribution, I will try to port your code to the Kandinsky module to further accelerate it.
Thanks again for your contribution, I will try to port your code to the Kandinsky module to further accelerate it.
If you're talking about ai-forever/Kandinsky-2#79, then that's not needed, as far as I can see. The offloading procedure takes care of all of this, so large-image generation becomes possible.
By the way, today is the 157th anniversary of the birth of the Russian painter Wassily Kandinsky, after whom the model is named. P.S. The 79th anniversary of his death was also recent, three days ago.
If you're talking about https://github.com/ai-forever/Kandinsky-2/issues/79, then that's not needed, as far as I can see. The offloading procedure takes care of all of this, so large-image generation becomes possible.
Oh great! I thought it could improve generation speed, but reading your post closely, it's indeed faster with CPU offloading.
By the way, today is the 157th anniversary of the birth of the Russian painter Wassily Kandinsky, after whom the model is named. P.S. The 79th anniversary of his death was also recent, three days ago.
I didn't know that (about his anniversary). Thanks for the information. Kandinsky is a great model and a nice alternative to Stable Diffusion. I'm really happy to support it via its integration into biniou.
Describe the bug
Translation into a language with a non-Latin (i.e. non-ASCII-encodable, e.g. Cyrillic) script fails due to an attempt to write into a file in ascii encoding.

To Reproduce
Steps to reproduce the behavior:

Expected behavior
Translation completes normally.

Console log
Only the relevant part

Screenshots
Probably not needed, since it's described in the text above.

Hardware (please complete the following information):

Desktop (please complete the following information):

Additional informations

Additional context
The solution (which worked for me) is to open the relevant files in UTF-8 mode, but I'm unsure that there are no other places to fix. Here it is in ressources/common.py: