LKAMING97 opened this issue 3 weeks ago.
python interleaved_generation.py -i 'Please introduce the city of Gyumri with pictures.' -s "./test/"
It ran for a very long time, so I stopped it:
Instruction: draw a dog
Batch size: 2
VQModel loaded from data/tokenizer/vqgan.ckpt
^CTraceback (most recent call last):
File "/root/autodl-tmp/anole/text2image.py", line 71, in <module>
main(args)
File "/root/autodl-tmp/anole/text2image.py", line 46, in main
image_tokens: torch.LongTensor = model.generate(
File "/root/autodl-tmp/anole/chameleon/inference/chameleon.py", line 665, in generate
tokens = [t.id for t in self.stream(*args, **kwargs)]
File "/root/autodl-tmp/anole/chameleon/inference/chameleon.py", line 665, in <listcomp>
tokens = [t.id for t in self.stream(*args, **kwargs)]
File "/root/autodl-tmp/anole/chameleon/inference/chameleon.py", line 649, in stream
while key_token := self.dctx.res_q.get():
File "/root/miniconda3/lib/python3.10/multiprocessing/queues.py", line 103, in get
res = self._recv_bytes()
File "/root/miniconda3/lib/python3.10/multiprocessing/connection.py", line 221, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/root/miniconda3/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
buf = self._recv(4)
File "/root/miniconda3/lib/python3.10/multiprocessing/connection.py", line 384, in _recv
chunk = read(handle, remaining)
KeyboardInterrupt
^CException ignored in atexit callback: <function _exit_function at 0x7f7291b91b40>
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.10/multiprocessing/util.py", line 334, in _exit_function
_run_finalizers(0)
File "/root/miniconda3/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers
finalizer()
File "/root/miniconda3/lib/python3.10/multiprocessing/util.py", line 224, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/root/miniconda3/lib/python3.10/multiprocessing/managers.py", line 674, in _finalize_manager
process.join(timeout=1.0)
File "/root/miniconda3/lib/python3.10/multiprocessing/process.py", line 149, in join
res = self._popen.wait(timeout)
File "/root/miniconda3/lib/python3.10/multiprocessing/popen_fork.py", line 40, in wait
if not wait([self.sentinel], timeout):
File "/root/miniconda3/lib/python3.10/multiprocessing/connection.py", line 936, in wait
ready = selector.select(timeout)
File "/root/miniconda3/lib/python3.10/selectors.py", line 416, in select
fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt:
^C
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.78 Driver Version: 550.78 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 On | 00000000:98:00.0 Off | N/A |
| 0% 25C P8 22W / 370W | 564MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 On | 00000000:B1:00.0 Off | N/A |
| 0% 26C P8 15W / 370W | 4MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
I can't generate anything with your example. What did I do wrong?
Thank you for your interest! Inference on Anole-7b requires at least 20GB of GPU memory, so this might be a memory issue. Would you mind trying another GPU with more memory? Thanks!
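As a rough sanity check on the 20GB figure, the sketch below estimates the memory needed just to hold the model weights at different precisions. It assumes a nominal 7 billion parameters (taken from the model name, not an exact count) and deliberately ignores activations, the KV cache, the VQGAN image tokenizer, and CUDA context overhead, all of which add to the real total.

```python
# Rough weight-only memory estimate for a model of a given size.
# Assumption: ~7e9 parameters (nominal size implied by "Anole-7b").
# Activations, KV cache, the VQGAN tokenizer and CUDA context are NOT included.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the memory (GiB) needed to store num_params weights in dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    n = 7e9  # assumed nominal parameter count
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>5}: {weight_memory_gib(n, dtype):6.1f} GiB")
```

Under these assumptions the bf16 weights alone come to about 13 GiB, so a 24GB card leaves limited headroom once activations, the KV cache, and any temporary copies made while loading are added on top.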
Can't I run inference on a single 24GB RTX 3090?
(Sent by email in reply to Ethan Chern's comment of Fri, Sep 13, 2024, 9:34 PM, on GAIR-NLP/anole issue #42, "Inference Problem".)
Hi, quantization might help (https://github.com/GAIR-NLP/anole/pull/21/files). If it doesn't, please let us know. Thanks!
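To illustrate why quantization lowers memory use, here is a minimal sketch of symmetric per-tensor int8 ("absmax") quantization in plain Python. It is a generic textbook scheme and is not claimed to be what PR #21 implements; the point is that each weight is stored in one byte instead of two (bf16), plus a single float scale per tensor.

```python
# Minimal symmetric per-tensor int8 quantization ("absmax"), for illustration only.
# This is a generic scheme and is not claimed to match the Anole PR #21 code.
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to int8 values in [-127, 127] using one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Recover approximate floats from int8 values and the shared scale."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.12, -0.5, 0.031, 0.49]
    q, s = quantize_int8(w)
    w_hat = dequantize_int8(q, s)
    # Each reconstructed value is within half a quantization step of the original.
    print(q, [round(x, 4) for x in w_hat])
```

Practical int8 schemes usually use per-channel or per-block scales to reduce rounding error; the per-tensor version above is just the simplest case.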
I reinstalled the environment and followed the steps again, but this still happens:
Traceback (most recent call last):
File "interleaved_generation.py", line 5, in <module>
from chameleon.inference.chameleon import ChameleonInferenceModel, Options
File "/root/autodl-tmp/anole/chameleon/inference/chameleon.py", line 32, in <module>
from chameleon.inference import loader
File "/root/autodl-tmp/anole/chameleon/inference/loader.py", line 13, in <module>
from chameleon.inference.transformer import ModelArgs, Transformer
File "/root/autodl-tmp/anole/chameleon/inference/transformer.py", line 19, in <module>
class ModelArgs:
File "/root/autodl-tmp/anole/chameleon/inference/transformer.py", line 24, in ModelArgs
n_kv_heads: int | None = None
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
How can I fix this?
I have been following your installation steps, but I keep running into problems, which is very frustrating.
Use Python 3.10, or change those annotations to the form "rank: Union[int, None] = None". I recommend Python 3.10, because I found many places that would need to change.
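The `X | None` union syntax (PEP 604) only works at runtime on Python 3.10 and later, which is why `n_kv_heads: int | None = None` raises this TypeError on older interpreters. Below is a minimal sketch of the two workarounds; the `ModelArgs` fields and defaults shown are a hypothetical subset for illustration, not the real class from `chameleon/inference/transformer.py`.

```python
# Workaround sketch for Python < 3.10, where `int | None` fails at class creation.
# Option 1 (shown here): spell the type as Optional[int] (same as Union[int, None]).
# Option 2: put `from __future__ import annotations` at the top of the module, so
# annotations are kept as strings and not evaluated at definition time.
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelArgs:  # hypothetical subset of fields, for illustration only
    n_heads: int = 32
    n_kv_heads: Optional[int] = None  # instead of: int | None = None


if __name__ == "__main__":
    print(sys.version_info[:2], ModelArgs())
```

Upgrading to Python 3.10 avoids touching every annotation, which matches the recommendation above.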
After I enable the quantization function, the program still raises an OutOfMemoryError:
Instruction: draw a dog
Batch size: 10
VQModel loaded from /data/mjl/model_zoo/Anole-7b-v0.1/tokenizer/vqgan.ckpt
Process SpawnProcess-2:
Traceback (most recent call last):
File "/data/mjl/anaconda3/envs/anole/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/data/mjl/anaconda3/envs/anole/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/data/mjl/anole-main/chameleon/inference/chameleon.py", line 495, in _worker_impl
model = loader.load_model(model, rank=rank)
File "/data/mjl/anole-main/chameleon/inference/loader.py", line 61, in load_model
return _convert(
File "/data/mjl/anole-main/chameleon/inference/loader.py", line 23, in _convert
torch.load(str(consolidated_path), map_location='cuda'),
File "/data/mjl/anaconda3/envs/anole/lib/python3.10/site-packages/torch/serialization.py", line 1014, in load
return _load(opened_zipfile,
File "/data/mjl/anaconda3/envs/anole/lib/python3.10/site-packages/torch/serialization.py", line 1422, in _load
result = unpickler.load()
File "/data/mjl/anaconda3/envs/anole/lib/python3.10/site-packages/torch/serialization.py", line 1392, in persistent_load
typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
File "/data/mjl/anaconda3/envs/anole/lib/python3.10/site-packages/torch/serialization.py", line 1366, in load_tensor
wrap_storage=restore_location(storage, location),
File "/data/mjl/anaconda3/envs/anole/lib/python3.10/site-packages/torch/serialization.py", line 1296, in restore_location
return default_restore_location(storage, map_location)
File "/data/mjl/anaconda3/envs/anole/lib/python3.10/site-packages/torch/serialization.py", line 381, in default_restore_location
result = fn(storage, location)
File "/data/mjl/anaconda3/envs/anole/lib/python3.10/site-packages/torch/serialization.py", line 279, in _cuda_deserialize
return obj.cuda(device)
File "/data/mjl/anaconda3/envs/anole/lib/python3.10/site-packages/torch/_utils.py", line 114, in _cuda
untyped_storage = torch.UntypedStorage(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU 0 has a total capacty of 23.69 GiB of which 18.75 MiB is free. Process 46513 has 558.00 MiB memory in use. Including non-PyTorch memory, this process has 23.12 GiB memory in use. Of the allocated memory 22.83 GiB is allocated by PyTorch, and 1.02 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
^CTraceback (most recent call last):
File "/data/mjl/anole-main/text2image.py", line 83, in <module>
main(args)
File "/data/mjl/anole-main/text2image.py", line 29, in main
unquantized_model = ChameleonInferenceModel(
File "/data/mjl/anole-main/chameleon/inference/chameleon.py", line 569, in __init__
self.dctx.ready_barrier.wait()
File "/data/mjl/anaconda3/envs/anole/lib/python3.10/threading.py", line 668, in wait
self._wait(timeout)
File "/data/mjl/anaconda3/envs/anole/lib/python3.10/threading.py", line 703, in _wait
if not self._cond.wait_for(lambda : self._state != 0, timeout):
File "/data/mjl/anaconda3/envs/anole/lib/python3.10/multiprocessing/synchronize.py", line 313, in wait_for
self.wait(waittime)
File "/data/mjl/anaconda3/envs/anole/lib/python3.10/multiprocessing/synchronize.py", line 261, in wait
return self._wait_semaphore.acquire(True, timeout)
KeyboardInterrupt
My device is also an RTX 3090, and I don't know how to solve this problem. If you could help me solve it, I would be extremely grateful.
Also, the free memory is 18.75 MiB both before and after quantization. Is the quantization function not working?
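One possible reading of the traceback: `text2image.py` builds `unquantized_model` first, and the loader calls `torch.load(..., map_location='cuda')`, so the full-precision weights appear to be placed on the GPU before any quantization runs. The toy function below (hypothetical, weights only, sizes assumed from the ~7e9-parameter estimate) shows why that ordering puts a floor on peak GPU memory that quantization cannot lower.

```python
# Toy lower bound on peak GPU weight memory (GiB) for two hypothetical loading orders.
# Assumption: "gpu_first" means full-precision weights are materialized on the GPU
# and only then quantized, which is what the traceback above appears to show.


def peak_weight_gib(full_gib: float, quant_gib: float, order: str) -> float:
    """Lower bound on peak GPU weight memory for a loading order."""
    if order == "gpu_first":
        # The full-precision weights must fit before quantization can even start.
        return max(full_gib, quant_gib)
    if order == "cpu_first":
        # Quantize on the CPU, then transfer only the quantized weights.
        return quant_gib
    raise ValueError(f"unknown order: {order!r}")


if __name__ == "__main__":
    full, quant = 13.0, 6.5  # assumed bf16 vs int8 sizes for ~7e9 parameters
    for order in ("gpu_first", "cpu_first"):
        print(order, peak_weight_gib(full, quant, order), "GiB")
```

If the OOM is raised inside `torch.load` during that first load, the quantization step would never run, which would be consistent with the free memory reading the same with and without it.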
Why does inference take so long for just two images?