Woolverine94 / biniou

a self-hosted webui for 30+ generative ai
GNU General Public License v3.0

Use UTF-8 for saving files with text (since it can be non-ASCII) #6

Closed trolley813 closed 10 months ago

trolley813 commented 10 months ago

Describe the bug
Translation into a language with a non-Latin (i.e. non-ASCII-encodable, e.g. Cyrillic) script fails due to an attempt to write to a file with ASCII encoding.

To Reproduce
Steps to reproduce the behavior:

  1. Go to "Text/nllb translation"
  2. Choose a non-Latin-based language (e.g. Russian) as the target (output) language
  3. Request translation of any text (e.g. "Hello")
  4. See the error

Expected behavior
The translation completes normally.

Console log
Only the relevant part:

Traceback (most recent call last):
  File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/gradio/queueing.py", line 407, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/gradio/route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/gradio/blocks.py", line 1550, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/gradio/blocks.py", line 1185, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/gradio/utils.py", line 661, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/gradio/utils.py", line 661, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/ressources/common.py", line 327, in wrap_func
    result = func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/ressources/nllb.py", line 296, in text_nllb
    filename_nllb = write_file(output_nllb)
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/ressources/common.py", line 248, in write_file
    savefile.write(content)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
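
For reference, a minimal standalone snippet (not biniou code, just an illustration) that reproduces the same UnicodeEncodeError; here the ASCII encoding is made explicit, whereas in biniou it presumably comes from the locale's default encoding:

import os, tempfile

# Writing Cyrillic text through an ASCII-encoded file object fails the same way.
path = os.path.join(tempfile.gettempdir(), "repro.txt")
with open(path, "w", encoding="ascii") as savefile:
    savefile.write("Привет \n")  # UnicodeEncodeError: 'ascii' codec can't encode characters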

Screenshots
Probably not needed, since the issue is described in the text above.


Additional context
The solution (which worked for me) is to open the relevant files with UTF-8 encoding, but I'm not sure there aren't other places to fix. Here is the change in ressources/common.py:

def write_file(*args) :
    timestamp = time.time()
    savename = f"outputs/{timestamp}.txt"
    content = ""
    for idx, data in enumerate(args):
        content += f"{data} \n"
    # open explicitly with UTF-8 encoding (keyword argument, not positional)
    with open(savename, 'w', encoding="utf-8") as savefile:
        savefile.write(content)
    return savename

def write_seeded_file(seed, *args) :
    timestamp = time.time()
    savename = f"outputs/{seed}_{timestamp}.txt"
    content = ""
    for idx, data in enumerate(args):
        content += f"{data} \n"
    with open(savename, 'w', encoding="utf-8") as savefile:
        savefile.write(content)
    return savename
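
A quick way to sanity-check the patched helper (a hypothetical call, assuming it is run from the biniou root so that the outputs/ directory exists):

# Hypothetical quick check: write Cyrillic text and read it back as UTF-8.
path = write_file("Привет, мир!")
with open(path, "r", encoding="utf-8") as f:
    assert "Привет" in f.read()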
Woolverine94 commented 10 months ago

Hello @trolley813 ,

Thanks a lot for your feedback.

I can confirm and reproduce the issue.

The solution (which worked for me) is to open the relevant files in UTF-8 mode, but I'm unsure that there's no other places

You are right. This error was already patched for the biniou console a few weeks ago, but I completely missed the other impacted functions. As you suspected, reading and writing settings could also trigger this behavior, so those functions have been modified too.

65dd31b should solve the issue you encountered.

I'm closing the issue, but don't hesitate to re-open it if needed.

Thanks again for your support!

trolley813 commented 10 months ago

Thanks! Yes, 65dd31b indeed fixed the problem, it works great!

P.S. Running on an HDD also works great. The initial loading of a model can take a while (tens of seconds), but further requests with the same model are much faster.

Woolverine94 commented 10 months ago

Thanks for the feedback.

I suspect that the combination of HDD + low RAM (which is not your case) would be a mess ("there will be swap").

As I see you have a powerful Nvidia GPU, could you also specify whether you use the CUDA-optimized version of biniou or only the base CPU version?

And if you are using CUDA, does it really accelerate inference on heavy modules like the image or video ones?

trolley813 commented 10 months ago

As I see you have a powerful Nvidia GPU, could you also specify whether you use the CUDA-optimized version of biniou or only the base CPU version?

Yes, I've tried both versions. However, the Bark module (text-to-speech) does not work with CUDA enabled (it complains that both CPU and CUDA are available and that one should specify which device to use; sorry, I don't remember the exact error message).

And if you are using CUDA, does it really accelerate inference on heavy modules like the image or video ones?

Yes, it does, at least for images. P.S. For Kandinsky 2, you can split the generation process between CPU and GPU, as I did in https://github.com/ai-forever/Kandinsky-2/issues/79. Sorry, I did not read your code closely, so I'm not sure whether that's applicable here (or whether you're already doing it this way). But it can help achieve better results faster: on my 8GB GPU it's possible to draw 1024x1024 images, and maybe somewhat larger ones (I've also tried rectangular 768x1536), at a reasonable speed (about one minute per image).
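
For illustration, a rough sketch of that CPU/GPU split using the diffusers 2.2 prior/decoder pipelines (the checkpoints and parameters here are assumptions; this is not the exact code from that issue):

from diffusers import KandinskyV22PriorPipeline, KandinskyV22Pipeline
import torch

# Prior (text -> image embeddings) stays on the CPU in float32.
prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float32
)

# Decoder (embeddings -> pixels) runs on the GPU in float16.
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda")

prior_out = prior("a watercolor landscape", num_inference_steps=25)
image = decoder(
    image_embeds=prior_out.image_embeds.to("cuda", torch.float16),
    negative_image_embeds=prior_out.negative_image_embeds.to("cuda", torch.float16),
    height=1024,
    width=1024,
).images[0]
image.save("kandinsky22_split.png")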

Woolverine94 commented 10 months ago

Great news. Thanks a lot for your confirmation! I wasn't at all sure that it would be usable with the modifications I made for CUDA compatibility.

Yes, I've tried both versions. However, the Bark module (text-to-speech) does not work with CUDA enabled (it complains that both CPU and CUDA are available and that one should specify which device to use; sorry, I don't remember the exact error message).

No worries. I will try to fix this issue ASAP. Edit: 2a1efd6 may fix the Bark issue.

Yes, it does, at least for images. P.S. For Kandinsky 2, you can split the generation process between CPU and GPU, as I did in https://github.com/ai-forever/Kandinsky-2/issues/79. Sorry, I did not read your code closely, so I'm not sure whether that's applicable here (or whether you're already doing it this way). But it can help achieve better results faster: on my 8GB GPU it's possible to draw 1024x1024 images, and maybe somewhat larger ones (I've also tried rectangular 768x1536), at a reasonable speed (about one minute per image).

Awesome and very smart! I didn't suspect it could be done this way. I'm not sure whether it can be ported to the Kandinsky module, but I will give it a try.

FYI (as you are a Kandinsky user), and in case you've missed it, Kandinsky 3.0 should be usable by uncommenting line 29 in ressources/txt2img_kd.py: # "kandinsky-community/kandinsky-3",

This will make Kandinsky 3.0 the default model for the Kandinsky module, but it doesn't seem to be usable with CPU-only inference.
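
Purely for illustration, a hypothetical sketch of what such a model list could look like (the variable name and the second entry are assumptions, not the actual content of ressources/txt2img_kd.py):

model_list_txt2img_kd = [
    "kandinsky-community/kandinsky-3",            # uncommented: now the default model
    "kandinsky-community/kandinsky-2-2-decoder",  # assumed pre-existing entry
]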

Thanks again for your comments and feedback; they are very important for the project.

trolley813 commented 10 months ago

FYI (as you are a Kandinsky user), and in case you've missed it, Kandinsky 3.0 should be usable by uncommenting line 29 in ressources/txt2img_kd.py: # "kandinsky-community/kandinsky-3",

This will make Kandinsky 3.0 the default model for the Kandinsky module, but it doesn't seem to be usable with CPU-only inference.

Thank you! Yes, I still haven't succeeded in running Kandinsky 3.0 on my PC, since it (likely) does not fit in 8GB of VRAM, and attempting to run it on the CPU (not here, just with an example script) also showed an fp16-related error (the CPU does not support float16, which is probably hardcoded somewhere, since the error persists when overriding fp16=False).

Thanks again for your comments and feedback; they are very important for the project.

I'm very glad to help!

trolley813 commented 10 months ago

~I still haven't succeeded in running Kandinsky 3.0 on my PC, since it (likely) does not fit in 8GB of VRAM, and attempting to run it on the CPU (not here, just with an example script) also showed an fp16-related error (the CPU does not support float16, which is probably hardcoded somewhere, since the error persists when overriding fp16=False).~

Well, I was all wrong. It works without CUDA by enforcing torch_dtype=torch.float32. However, it takes LOTS of RAM (~60GB, so some swapping can be expected with 64GB) and is therefore somewhat slow (~6 minutes per 1024x1024 image on a 16-core/32-thread CPU). Maybe playing with device_map (I suspect it acts similarly to ai-forever/Kandinsky-2#79) and/or using a different (smaller) text encoder would accelerate the generation process.

Woolverine94 commented 10 months ago

Well, I was all wrong. It works without CUDA by enforcing torch_dtype=torch.float32.

It is enforced when you use CPU-only inference in biniou. I don't get it either... Maybe it requires a CUDA-compatible PyTorch rather than the CPU-only build, even if it only uses the CPU? I will give it a try over the weekend and also try to adapt the code of the Kandinsky module to your proposal.

trolley813 commented 10 months ago

It is enforced when you use CPU-only inference in biniou. I don't get it either...

So far I've only tried it outside biniou. Within the app, it may well not be needed.

Maybe it requires a CUDA-compatible PyTorch rather than the CPU-only build, even if it only uses the CPU?

No, I tried with CPU-only PyTorch, and it worked well. My script is as follows:

from diffusers import AutoPipelineForText2Image, Kandinsky3Pipeline
import torch
from uuid import uuid4

# Load the fp16 weights from the local cache, but cast them to float32 so the pipeline can run on CPU.
pipe: Kandinsky3Pipeline = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3", 
    variant="fp16", 
    torch_dtype=torch.float32,
    resume_download=True,
    cache_dir="./kand3-model",
    local_files_only=True
    )
print(pipe._execution_device)
#pipe.enable_model_cpu_offload()

prompt = "<a prompt here>"

generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, num_inference_steps=25, generator=generator).images[0]

image.save(f"image_{uuid4()}.png")
trolley813 commented 10 months ago

Finally, I got it to work inside biniou. I had to make the following modifications in ressources/txt2img_kd.py (the line numbers are as per 847ab36):

  1. Uncomment line 29 to allow Kandinsky 3 to work
  2. Set torch_dtype=model_arch on lines 86 and 93 to force the use of float32 on CPU (see the sketch after this list)
  3. Comment out line 141 (unfortunately, the callback does not work with Kandinsky 3 due to the exception provided below). As far as I can tell, this only disables the Cancel button.
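
A hedged sketch of the idea behind item 2 (the variable names mirror the description above but are otherwise assumptions, not the actual biniou code):

import torch

# Use float16 only when a CUDA device is available; fall back to float32 on CPU,
# which does not support float16 inference.
device = "cuda" if torch.cuda.is_available() else "cpu"
model_arch = torch.float16 if device == "cuda" else torch.float32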

Callback exception:

Traceback (most recent call last):
  File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/gradio/queueing.py", line 407, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/gradio/route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/gradio/blocks.py", line 1550, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/gradio/blocks.py", line 1185, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/gradio/utils.py", line 661, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/gradio/utils.py", line 661, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/ressources/common.py", line 327, in wrap_func
    result = func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/ressources/txt2img_kd.py", line 132, in image_txt2img_kd
    image = pipe_txt2img_kd(
            ^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/trolley813/development/experimental/biniou/env/lib/python3.11/site-packages/diffusers/pipelines/kandinsky3/pipeline_kandinsky3.py", line 561, in __call__
    if callback is not None and i % callback_steps == 0:
                                ~~^~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for %: 'int' and 'NoneType'
Woolverine94 commented 10 months ago

Thanks a lot! It was a pretty bad move on my part to use float16 as the torch dtype with CPU-only... For the callback, it may need to use the new callback_steps instead of the legacy callback (the error message looks very similar). I will do a commit this evening to correct that and incorporate your changes!
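
As a hedged illustration (the callback name is hypothetical, not an actual biniou function): explicitly passing callback_steps together with the legacy callback avoids the i % None comparison seen in the traceback above:

# Hypothetical call: supplying callback_steps=1 alongside the legacy callback
# keeps `i % callback_steps` from hitting None inside pipeline_kandinsky3.py.
image = pipe_txt2img_kd(
    prompt=prompt,
    num_inference_steps=25,
    callback=cancel_callback,   # hypothetical cancel-button callback
    callback_steps=1,
).images[0]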

Thanks again :)

Edit: commit 6434292 introduces support for Kandinsky 3.0. The Cancel button should be functional, but it has not been tested.

trolley813 commented 10 months ago

Edit: commit 6434292 introduces support for Kandinsky 3.0. The Cancel button should be functional, but it has not been tested.

Thanks! Finally tested: it works well and (with CUDA) very fast. With 8GB of VRAM, at least 1024x1024 is possible (with both 2.2 and 3.0). Probably the sequential CPU offload does the job.
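
For reference, a minimal sketch of enabling that offload with diffusers (assuming the kandinsky-community/kandinsky-3 checkpoint; this is not the actual biniou code):

from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16
)
# Layers are moved to the GPU one at a time, so 1024x1024 fits within 8GB of VRAM.
pipe.enable_sequential_cpu_offload()
image = pipe("a test prompt", num_inference_steps=25, height=1024, width=1024).images[0]
image.save("kandinsky3_offload.png")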

P.S. By "very fast" I mean about 25-30 seconds per 1024x1024 image on 3.0.

Woolverine94 commented 10 months ago

P.S. By "very fast" I mean about 25-30 seconds per 1024x1024 image on 3.0.

Compared to CPU inference, it is indeed very fast!

Thanks again for your contribution; I will try to port your code to the Kandinsky module to accelerate it further.

trolley813 commented 10 months ago

Thanks again for your contribution; I will try to port your code to the Kandinsky module to accelerate it further.

If you're talking about ai-forever/Kandinsky-2#79, then that's not needed, as far as I can see. The offloading procedure takes care of all of this, so large image generation becomes possible.

trolley813 commented 10 months ago

By the way, today is the 157th anniversary of the birth of the Russian painter Wassily Kandinsky, after whom the eponymous AI model is named. P.S. The 79th anniversary of his death was also recent, 3 days ago.

Woolverine94 commented 10 months ago

If you're talking about https://github.com/ai-forever/Kandinsky-2/issues/79, then that's not needed, as far as I can see. The offloading procedure takes care of all of this, so large image generation becomes possible.

Oh, great! I thought it could improve generation speed, but reading your post closely, it's indeed faster with CPU offloading.

By the way, today is the 157th anniversary of the birth of the Russian painter Wassily Kandinsky, after whom the eponymous AI model is named. P.S. The 79th anniversary of his death was also recent, 3 days ago.

I didn't know that (about his anniversary). Thanks for the information. Kandinsky is a great model and a nice alternative to Stable Diffusion, and I'm really happy to support it through its integration into biniou.