ayttop opened this issue 1 month ago
It does not run with accelerate and bitsandbytes:

```python
from OmniGen import OmniGenPipeline
from accelerate import init_empty_weights
import bitsandbytes as bnb

with init_empty_weights():
    pipe = OmniGenPipeline.from_pretrained(
        "Shitao/OmniGen-v1",
        device_map="auto",        # Automatically maps model layers to available devices
        torch_dtype=bnb.float16,  # Set data type for bitsandbytes
        load_in_4bit=True         # Load model in 4-bit precision using bitsandbytes
    )
```
Current code doesn't support quantization. We will consider this in the future.
Apparently someone did try to implement quantization, but it is still a WIP and might be somewhat fiddly to use. Check out this PR if you are interested in using it: https://github.com/VectorSpaceLab/OmniGen/pull/29. You might need to tweak some files, as discussed in the PR, after downloading it to get it to work. Also, Colab RAM (yes, RAM, not VRAM) caps at 12GB for free-tier users, so the quantization process will be slow at best and will probably OOM outright for now. It pretty much filled up the 16GB of RAM on my Windows 11 system and required extensive offloading to disk while quantizing. However, judging from the VRAM usage on my system, once quantization is done the model might fit in the T4's VRAM. Perhaps you would want to wait for the code to be more optimized.
```
2024-10-28 22:00:46.180633: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-28 22:00:46.455676: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-28 22:00:46.544035: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-28 22:00:47.043118: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-28 22:00:49.018393: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run gradio deploy from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
Fetching 10 files: 100% 10/10 [00:00<00:00, 124460.06it/s]
```
But it does not work on Colab T4.
```
!git clone https://github.com/Manni1000/OmniGen.git
%cd OmniGen
!pip install -e .
!pip install gradio spaces
!apt install net-tools -y
!netstat -an | grep 7860
from google.colab import output
!python /content/OmniGen/app.py
!pip install -r /content/OmniGen/requirements.txt
```
```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gcsfs 2024.6.1 requires fsspec==2024.6.1, but you have fsspec 2024.5.0 which is incompatible.
torchaudio 2.1.1+cu121 requires torch==2.1.1, but you have torch 2.3.1+cu121 which is incompatible.
Successfully installed fsspec-2024.5.0 torch-2.3.1+cu121 torchvision-0.18.1+cu121 triton-2.3.1
```
```
2024-10-28 22:29:35.060112: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-28 22:29:35.093217: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-28 22:29:35.103173: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-28 22:29:35.126043: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-28 22:29:36.350897: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run gradio deploy from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
Fetching 10 files: 100% 10/10 [00:00<00:00, 50472.97it/s]
```
It does not run on the Colab T4 GPU.
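A quick sanity check (an addition, not part of the original notebook) to confirm whether the T4 is actually visible to PyTorch inside the Colab session:

```python
import torch

# If this prints False, the runtime is not attached to a GPU
# (check Runtime > Change runtime type in Colab).
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print("Total VRAM (GB):", torch.cuda.get_device_properties(0).total_memory / 1024**3)
```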
On a Colab TPU:
```
!python /content/OmniGen/app.py
/usr/local/lib/python3.10/dist-packages/gradio/utils.py:980: UserWarning: Expected 11 arguments for function <function generate_image at 0x7d4a6ec5e290>, received 10.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/gradio/utils.py:984: UserWarning: Expected at least 11 arguments for function <function generate_image at 0x7d4a6ec5e290>, received 10.
  warnings.warn(
This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run gradio deploy from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
/usr/local/lib/python3.10/dist-packages/gradio/helpers.py:987: UserWarning: Unexpected argument. Filling with None.
  warnings.warn("Unexpected argument. Filling with None.")
Fetching 10 files: 100% 10/10 [00:00<00:00, 93832.30it/s]
Loading safetensors
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 624, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 2018, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1567, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 846, in wrapper
    response = f(*args, **kwargs)
  File "/content/OmniGen/app.py", line 51, in generate_image
    output = pipe(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/content/OmniGen/OmniGen/pipeline.py", line 189, in __call__
    generator = torch.Generator(device=self.device).manual_seed(seed)
RuntimeError: manual_seed expected a long, but got bool
```
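The earlier Gradio warning ("Expected 11 arguments ... received 10") suggests the UI inputs are shifted by one, so the seed slot ends up receiving a boolean. A hedged local workaround (an assumption, not the upstream fix, which would be aligning app.py's inputs with generate_image's signature) is to coerce the value before it reaches torch.Generator.manual_seed:

```python
# Hypothetical guard (not from the repo): make sure the value handed to the pipeline
# as `seed` is a plain integer before it reaches torch.Generator.manual_seed.
def coerce_seed(seed, default=0):
    # bool is a subclass of int, but manual_seed rejects it (see the traceback above),
    # so treat it as "unset" and fall back to the default.
    if isinstance(seed, bool) or seed is None:
        return default
    return int(seed)
```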
```
Collecting cloud-tpu-client==0.10
  Downloading cloud_tpu_client-0.10-py3-none-any.whl.metadata (1.2 kB)
Collecting torch==1.13.0
  Downloading torch-1.13.0-cp310-cp310-manylinux1_x86_64.whl.metadata (23 kB)
Collecting torchvision==0.14.0
  Downloading torchvision-0.14.0-cp310-cp310-manylinux1_x86_64.whl.metadata (11 kB)
Collecting torchtext==0.14.0
  Downloading torchtext-0.14.0-cp310-cp310-manylinux1_x86_64.whl.metadata (6.9 kB)
ERROR: Could not find a version that satisfies the requirement torch_xla==1.13 (from versions: 2.1.0rc5, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.5.0)
ERROR: No matching distribution found for torch_xla==1.13
```
It does not run on Colab T4:
```python
from OmniGen import OmniGenPipeline
import torch
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
import transformers
transformers.logging.set_verbosity_error()

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1", device_map=device)

# Text to Image
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=768,
    width=512,
    guidance_scale=1,
    seed=0,
    separate_cfg_infer=True,
    num_inference_steps=1,
    num_images_per_prompt=1,
    use_kv_cache=True
)
images[0].save("example_t2i.png")  # save output PIL Image
```
```
TypeError                                 Traceback (most recent call last)
in <cell line: 8>()
      6 transformers.logging.set_verbosity_error()
      7 device = "cuda" if torch.cuda.is_available() else "cpu"
----> 8 pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1", device_map=device)
      9
     10 # Text to Image

TypeError: OmniGenPipeline.from_pretrained() got an unexpected keyword argument 'device_map'
```
Hello, you should remove device_map=device, like this:

```python
# The pipeline will detect a valid GPU device automatically
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")  # so just remove ', device_map=device'
```
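For context, a minimal end-to-end sketch that combines this fix with the call from the earlier comment (parameter values copied from that snippet):

```python
from OmniGen import OmniGenPipeline

# No device_map argument: the pipeline detects a usable GPU on its own.
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")

images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=768,
    width=512,
    guidance_scale=1,
    seed=0,
    separate_cfg_infer=True,   # run the CFG branches separately to lower peak memory
    num_inference_steps=1,
    num_images_per_prompt=1,
    use_kv_cache=True,
)
images[0].save("example_t2i.png")
```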
The problem is that I want to run it on Colab T4, which only has 12 GB of RAM, so I want to either quantize it, save it, and then use it on the T4, or use it with accelerate and device_map=device.
@yuezewang
It does not run on the Colab T4 GPU.
Your session crashed after using all available RAM.
```python
from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("goodasdgood/OmniGen_quantization")

images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    seed=0,
)
images[0].save("example_t2i.png")  # save output PIL Image
```
@yuezewang
Where is the path to the quantized model?
In order to run it and bypass the device error, I just exposed the device in pipeline.py:
```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f'Device = {device}')


class OmniGenPipeline:
    def __init__(
        self,
        vae: AutoencoderKL,
        model: OmniGen,
        processor: OmniGenProcessor,
    ):
        self.vae = vae
        self.model = model
        self.processor = processor

        self.model.to(torch.bfloat16)
        self.model.eval()
        self.vae.eval()

        self.model_cpu_offload = False
```
Then replace every self.device with device, since it is now global, and it will work.
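For example, the line that raised the seeding error earlier would then read as follows (a minimal sketch of the patched pipeline.py, assuming the module-level device from the snippet above):

```python
# pipeline.py, inside __call__ (sketch): use the module-level `device`
# instead of self.device when building the generator.
generator = torch.Generator(device=device).manual_seed(seed)
```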
Question: When compression is applied in the loading function, is the model not stored on the hard disk? And is this method different from converting the model to 4-bit ahead of time, the way Unsloth does?
NameError: name 'is_torch_npu_available' is not defined. Did you mean: 'is_torch_xla_available'?
```python
from OmniGen import OmniGenPipeline
import torch

pipe = OmniGenPipeline.from_pretrained("C:/Users/m/Desktop/4/OmniGen-v1")

images = pipe(
    prompt="car.",
    height=64,
    width=64,
    num_inference_steps=2,
    guidance_scale=2,
    seed=0,
)
images[0].save("example_t2i.png")  # save output PIL Image
```
NameError: name 'is_torch_npu_available' is not defined. Did you mean: 'is_torch_xla_available'?
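A hedged guess (not confirmed in this thread): is_torch_npu_available is provided by newer transformers/accelerate releases, so this NameError usually points at an outdated install. A quick check:

```python
# Diagnostic sketch (assumption): verify whether the installed transformers
# actually exposes the helper this code path expects.
try:
    from transformers.utils import is_torch_npu_available
    print("is_torch_npu_available:", is_torch_npu_available())
except ImportError:
    print("transformers is too old for this code path; upgrading transformers/accelerate may help")
```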
@able2608 @staoxiao
It is working on low VRAM
Wait, how are you getting it 30 times faster than mine? This is for the exact same prompt.
@ronfromhp, do you have a GPU? Running on CPU is very slow. You can try the latest code, and refer to https://github.com/VectorSpaceLab/OmniGen/blob/main/docs/inference.md#requiremented-resources for inference times.
@staoxiao, I have an RTX 4050 laptop GPU with 6 GB of VRAM, so it must be running slowly because of that. But I tried the forked repo of the person I was replying to (https://github.com/VectorSpaceLab/OmniGen/issues/44#issuecomment-2442448445), and it seems he has a quantized model working that is roughly 50-100 times faster on my GPU.
@ronfromhp
Can you confirm that my fork is working fine for you and that the generation is fast? Other viewers of my channel have confirmed that it is working well.
@nitinmukesh, up to a certain point it is fast, but it fails past that point if, for example, I give two input image prompts and ask for a 1080p output. Then it falls back to 280 sec/step. I'd describe it as a sigmoid curve: once you exceed a certain threshold it becomes roughly 50 times slower.
Has anyone gotten OmniGen to run on Colab?