Closed: dzyjjpy closed this issue 7 months ago
Change --prompt="What is the video about?" to --prompt="What is the image about?"
Remove --mesh_dim='!1,1,8,1'
@jackyin68 Thanks. I tried both the video and the image prompt before, and neither worked. I also removed --mesh_dim, same issue.
This looks like a Python version incompatibility. What version are you using? It should be 3.10 (see the conda env instructions).
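For context, the `TypeError` in the traceback quoted later in this thread comes from the annotation `int | None`. That union syntax (PEP 604) can only be evaluated at runtime on Python 3.10 or newer. Below is a minimal sketch of the cause and of a 3.9-compatible spelling. `BlockSizesSketch` and `pep604_union_supported` are hypothetical names for illustration, not part of the LWM code base, and the sketch assumes the real `BlockSizes` is a plain annotated class or dataclass.

```python
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockSizesSketch:
    # Hypothetical stand-in for the BlockSizes class named in the traceback.
    # Optional[int] has the same meaning as `int | None` and also works on 3.9.
    block_q_major_dkv: Optional[int] = None


def pep604_union_supported() -> bool:
    """Return True if `int | None` can be evaluated at runtime (Python 3.10+)."""
    try:
        int | None  # raises TypeError on Python 3.9 and older
        return True
    except TypeError:
        return False


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    print(f"Python {major}.{minor} supports 'int | None': {pep604_union_supported()}")
```

Switching to the Python version the project expects is likely the simpler route; rewriting annotations as `Optional[...]` is only a possible workaround and may not be the only 3.10-specific construct in the code.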
Python version: 3.9.13. I will try it with Python 3.10. Thanks.
@wilson1yan It didn't work. More samples are needed, including language and vision versions.
./scripts/run_vision_chat.sh
Traceback (most recent call last):
  File "/home/jiapeiyang/anaconda3/envs/nlp/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/jiapeiyang/anaconda3/envs/nlp/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/jiapeiyang/workspace/LWM/lwm/vision_chat.py", line 18, in <module>
    from lwm.vision_llama import VideoLLaMAConfig, FlaxVideoLLaMAForCausalLM
  File "/home/jiapeiyang/workspace/LWM/lwm/vision_llama.py", line 21, in <module>
    from lwm.llama import LLaMAConfig, LLAMA_STANDARD_CONFIGS, FlaxLLaMABlockCollection, RMSNorm
  File "/home/jiapeiyang/workspace/LWM/lwm/llama.py", line 31, in <module>
    from lwm.ring_attention import blockwise_ffn, ring_flash_attention_tpu, \
  File "/home/jiapeiyang/workspace/LWM/lwm/ring_attention.py", line 557, in <module>
    class BlockSizes:
  File "/home/jiapeiyang/workspace/LWM/lwm/ring_attention.py", line 563, in BlockSizes
    block_q_major_dkv: int | None = None
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
`
export llama_tokenizer_path="/home/jiapeiyang/workspace/LWM/models/LWM-Chat-32K-Jax/tokenizer.model"
export vqgan_checkpoint="/home/jiapeiyang/workspace/LWM/models/LWM-Chat-32K-Jax/vqgan"
export lwm_checkpoint="/home/jiapeiyang/workspace/LWM/models/LWM-Chat-32K-Jax/params"
export input_file="/home/jiapeiyang/workspace/LWM/models/LWM-Chat-32K-Jax/test_a.jpg"

python3 -u -m lwm.vision_chat \
    --prompt="What is the video about?" \
    --input_file="$input_file" \
    --vqgan_checkpoint="$vqgan_checkpoint" \
    --mesh_dim='!1,1,8,1' \
    --dtype='fp32' \
    --load_llama_config='7b' \
    --max_n_frames=8 \
    --update_llama_config="dict(sample_mode='text',theta=50000000,max_sequence_length=131072,use_flash_attention=False,scan_attention=False,scan_query_chunk_size=128,scan_key_chunk_size=128,remat_attention='',scan_mlp=False,scan_mlp_chunk_size=2048,remat_mlp='',remat_block='',scan_layers=True)" \
    --load_checkpoint="params::$lwm_checkpoint" \
    --tokenizer.vocab_file="$llama_tokenizer_path" \
2>&1 | tee ~/output.log
read
`
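Since the script above launches whatever `python3` the active environment provides (the traceback paths show the `nlp` env running Python 3.9), a quick preflight check can catch the mismatch before any model loading starts. The following is only a sketch: `check_python` and `REQUIRED` are hypothetical names, and treating 3.10 as a minimum rather than an exact requirement is an assumption based on the maintainer's comment in this thread.

```python
import sys

# Assumption: minimum version taken from the maintainer's comment (Python 3.10).
REQUIRED = (3, 10)


def check_python(version_info=sys.version_info, required=REQUIRED) -> str:
    """Return a short verdict comparing an interpreter version to the required one."""
    current = tuple(version_info[:2])
    if current < required:
        return (
            f"Python {current[0]}.{current[1]} is too old; "
            f"need {required[0]}.{required[1]}+"
        )
    return f"Python {current[0]}.{current[1]} is OK"


if __name__ == "__main__":
    # Run with the same interpreter the launch script will use, e.g. `python3 check.py`.
    print(check_python())
```

Running this with the same `python3` that the launch script picks up shows which interpreter the conda environment actually resolves to.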