ayttop opened this issue 1 month ago
It does not run with accelerate and bitsandbytes:

```python
from OmniGen import OmniGenPipeline
from accelerate import init_empty_weights
import bitsandbytes as bnb

with init_empty_weights():
    pipe = OmniGenPipeline.from_pretrained(
        "Shitao/OmniGen-v1",
        device_map="auto",        # Automatically maps model layers to available devices
        torch_dtype=bnb.float16,  # Set data type for bitsandbytes
        load_in_4bit=True         # Load model in 4-bit precision using bitsandbytes
    )
```
Current code doesn't support quantization. We will consider this in the future.
Apparently someone did try to implement quantization, but it is still a WIP and might be somewhat fiddly to use. Check out this PR if you are interested in using it: https://github.com/VectorSpaceLab/OmniGen/pull/29. You might need to tweak some files, as discussed in the PR, after downloading it to get it to work. Also, Colab RAM (yes, RAM, not VRAM) caps at 12GB for free-tier users, so the quantization process will be slow at best and will probably OOM outright for now. It pretty much filled up the 16GB of RAM on my Windows 11 system and required extensive offloading to disk while quantizing. However, judging from the VRAM usage on my system, once quantization is done the model might fit in the T4's VRAM. Perhaps you would want to wait for the code to be more optimized.
```
2024-10-28 22:00:46.180633: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-28 22:00:46.455676: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-28 22:00:46.544035: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-28 22:00:47.043118: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-28 22:00:49.018393: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run gradio deploy from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
Fetching 10 files: 100% 10/10 [00:00<00:00, 124460.06it/s]
```
But it does not work on Colab T4.
```
!git clone https://github.com/Manni1000/OmniGen.git
%cd OmniGen
!pip install -e .
!pip install gradio spaces
!apt install net-tools -y
!netstat -an | grep 7860
from google.colab import output
!python /content/OmniGen/app.py
!pip install -r /content/OmniGen/requirements.txt
```
```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gcsfs 2024.6.1 requires fsspec==2024.6.1, but you have fsspec 2024.5.0 which is incompatible.
torchaudio 2.1.1+cu121 requires torch==2.1.1, but you have torch 2.3.1+cu121 which is incompatible.
Successfully installed fsspec-2024.5.0 torch-2.3.1+cu121 torchvision-0.18.1+cu121 triton-2.3.1
```
```
2024-10-28 22:29:35.060112: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-28 22:29:35.093217: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-28 22:29:35.103173: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-28 22:29:35.126043: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-28 22:29:36.350897: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run gradio deploy from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
Fetching 10 files: 100% 10/10 [00:00<00:00, 50472.97it/s]
```
It does not run on the Colab T4 GPU.
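A quick sanity check (an addition, not part of the original notebook) to confirm whether the T4 is actually visible to PyTorch inside the Colab session:

```python
import torch

# If this prints False, the runtime is not attached to a GPU
# (check Runtime > Change runtime type in Colab).
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print("Total VRAM (GB):", torch.cuda.get_device_properties(0).total_memory / 1024**3)
```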
On a Colab TPU:
```
!python /content/OmniGen/app.py
/usr/local/lib/python3.10/dist-packages/gradio/utils.py:980: UserWarning: Expected 11 arguments for function <function generate_image at 0x7d4a6ec5e290>, received 10.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/gradio/utils.py:984: UserWarning: Expected at least 11 arguments for function <function generate_image at 0x7d4a6ec5e290>, received 10.
  warnings.warn(
This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run gradio deploy from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
/usr/local/lib/python3.10/dist-packages/gradio/helpers.py:987: UserWarning: Unexpected argument. Filling with None.
  warnings.warn("Unexpected argument. Filling with None.")
Fetching 10 files: 100% 10/10 [00:00<00:00, 93832.30it/s]
Loading safetensors
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 624, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 2018, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1567, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 846, in wrapper
    response = f(*args, **kwargs)
  File "/content/OmniGen/app.py", line 51, in generate_image
    output = pipe(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/content/OmniGen/OmniGen/pipeline.py", line 189, in __call__
    generator = torch.Generator(device=self.device).manual_seed(seed)
RuntimeError: manual_seed expected a long, but got bool
```
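The earlier Gradio warning ("Expected 11 arguments ... received 10") suggests the UI inputs are shifted by one, so the seed slot ends up receiving a boolean. A hedged local workaround (an assumption, not the upstream fix, which would be aligning app.py's inputs with generate_image's signature) is to coerce the value before it reaches torch.Generator.manual_seed:

```python
# Hypothetical guard (not from the repo): make sure the value handed to the pipeline
# as `seed` is a plain integer before it reaches torch.Generator.manual_seed.
def coerce_seed(seed, default=0):
    # bool is a subclass of int, but manual_seed rejects it (see the traceback above),
    # so treat it as "unset" and fall back to the default.
    if isinstance(seed, bool) or seed is None:
        return default
    return int(seed)
```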
```
Collecting cloud-tpu-client==0.10
  Downloading cloud_tpu_client-0.10-py3-none-any.whl.metadata (1.2 kB)
Collecting torch==1.13.0
  Downloading torch-1.13.0-cp310-cp310-manylinux1_x86_64.whl.metadata (23 kB)
Collecting torchvision==0.14.0
  Downloading torchvision-0.14.0-cp310-cp310-manylinux1_x86_64.whl.metadata (11 kB)
Collecting torchtext==0.14.0
  Downloading torchtext-0.14.0-cp310-cp310-manylinux1_x86_64.whl.metadata (6.9 kB)
ERROR: Could not find a version that satisfies the requirement torch_xla==1.13 (from versions: 2.1.0rc5, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.5.0)
ERROR: No matching distribution found for torch_xla==1.13
```
It does not run on Colab T4:
```python
from OmniGen import OmniGenPipeline
import torch
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
import transformers
transformers.logging.set_verbosity_error()

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1", device_map=device)

# Text to Image
images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=768,
    width=512,
    guidance_scale=1,
    seed=0,
    separate_cfg_infer=True,
    num_inference_steps=1,
    num_images_per_prompt=1,
    use_kv_cache=True
)
images[0].save("example_t2i.png")  # save output PIL Image
```
```
TypeError                                 Traceback (most recent call last)
in <cell line: 8>()
      6 transformers.logging.set_verbosity_error()
      7 device = "cuda" if torch.cuda.is_available() else "cpu"
----> 8 pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1", device_map=device)
      9
     10 # Text to Image

TypeError: OmniGenPipeline.from_pretrained() got an unexpected keyword argument 'device_map'
```
Hello, you should remove device_map=device, like this:

```python
# The pipeline will detect a valid GPU device automatically
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")  # so just remove ', device_map=device'
```
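For context, a minimal end-to-end sketch that combines this fix with the call from the earlier comment (parameter values copied from that snippet):

```python
from OmniGen import OmniGenPipeline

# No device_map argument: the pipeline detects a usable GPU on its own.
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")

images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=768,
    width=512,
    guidance_scale=1,
    seed=0,
    separate_cfg_infer=True,   # run the CFG branches separately to lower peak memory
    num_inference_steps=1,
    num_images_per_prompt=1,
    use_kv_cache=True,
)
images[0].save("example_t2i.png")
```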
The problem is that I want to run it on Colab T4, which only has 12 GB of RAM, so I want to either quantize it, save it, and then use it on the T4, or use it with accelerate and device_map=device.
@yuezewang
It does not run on the Colab T4 GPU.
Your session crashed after using all available RAM.
```python
from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("goodasdgood/OmniGen_quantization")

images = pipe(
    prompt="A curly-haired man in a red shirt is drinking tea.",
    height=1024,
    width=1024,
    guidance_scale=2.5,
    seed=0,
)
images[0].save("example_t2i.png")  # save output PIL Image
```
@yuezewang
Where is the path to the quantized model?
In order to run it and bypass the device error, I just exposed the device in pipeline.py:
```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f'Device = {device}')


class OmniGenPipeline:
    def __init__(
        self,
        vae: AutoencoderKL,
        model: OmniGen,
        processor: OmniGenProcessor,
    ):
        self.vae = vae
        self.model = model
        self.processor = processor

        self.model.to(torch.bfloat16)
        self.model.eval()
        self.vae.eval()

        self.model_cpu_offload = False
```
Then replace every self.device with device, since it is now global, and it will work.
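For example, the line that raised the seeding error earlier would then read as follows (a minimal sketch of the patched pipeline.py, assuming the module-level device from the snippet above):

```python
# pipeline.py, inside __call__ (sketch): use the module-level `device`
# instead of self.device when building the generator.
generator = torch.Generator(device=device).manual_seed(seed)
```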
Question: When compression is applied in the loading function, is the model not stored on the hard disk? And is this method different from converting the model to 4-bit ahead of time, the way Unsloth does?
NameError: name 'is_torch_npu_available' is not defined. Did you mean: 'is_torch_xla_available'?
```python
from OmniGen import OmniGenPipeline
import torch

pipe = OmniGenPipeline.from_pretrained("C:/Users/m/Desktop/4/OmniGen-v1")

images = pipe(
    prompt="car.",
    height=64,
    width=64,
    num_inference_steps=2,
    guidance_scale=2,
    seed=0,
)
images[0].save("example_t2i.png")  # save output PIL Image
```
NameError: name 'is_torch_npu_available' is not defined. Did you mean: 'is_torch_xla_available'?
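A hedged guess (not confirmed in this thread): is_torch_npu_available is provided by newer transformers/accelerate releases, so this NameError usually points at an outdated install. A quick check:

```python
# Diagnostic sketch (assumption): verify whether the installed transformers
# actually exposes the helper this code path expects.
try:
    from transformers.utils import is_torch_npu_available
    print("is_torch_npu_available:", is_torch_npu_available())
except ImportError:
    print("transformers is too old for this code path; upgrading transformers/accelerate may help")
```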
@able2608 @staoxiao
It is working on low VRAM
Wait, how are you getting it 30 times faster than mine? This is for the exact same prompt.
@ronfromhp, do you have a GPU? Running on CPU is very slow. You can try the latest code, and refer to https://github.com/VectorSpaceLab/OmniGen/blob/main/docs/inference.md#requiremented-resources for inference times.
@staoxiao, I have an RTX 4050 laptop GPU with 6 GB of VRAM, so it must be running slowly because of that. But I tried the forked repo of the person I was replying to (https://github.com/VectorSpaceLab/OmniGen/issues/44#issuecomment-2442448445), and it seems he has a quantized model working that is roughly 50-100 times faster on my GPU.
@ronfromhp
Can you confirm that my fork is working fine for you and that the generation is fast? Other viewers of my channel have confirmed that it is working well.
@nitinmukesh, up to a certain point it is fast, but it fails past that point if, for example, I give two input image prompts and ask for a 1080p output. Then it falls back to 280 sec/step. I'd describe it as a sigmoid curve: once you exceed a certain threshold it becomes roughly 50 times slower.
Has anyone gotten OmniGen to run on Colab?