Have you encountered the problem of segmentation fault (core dumped) when using Llama-3-8B and running `python3 get_datastore_chat.py --model-path Meta-Llama-3-8B-Instruct`?

When I run:

```
RAYON_NUM_THREADS=6 CUDA_VISIBLE_DEVICES=0 python3 -m rest.inference.cli --datastore-path datastore/datastore_chat_small.idx --base-model meta-llama/Meta-Llama-3-8B-Instruct
```

I get:

```
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:05<00:00, 1.47s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
USER: hey
ASSISTANT: Traceback (most recent call last):
  ...
  File "/home/liranringel/REST/rest/model/modeling_llama_kv.py", line 594, in forward
    key_states = past_key_value[0].cat(key_states, dim=2)
  File "/home/liranringel/REST/rest/model/kvcache.py", line 66, in cat
    dst.copy(tensor)
RuntimeError: The size of tensor a (32) must match the size of tensor b (8) at non-singleton dimension 1
```
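For context, my reading of the traceback (a guess, not something from REST's code or docs): the 32-vs-8 mismatch matches Llama-3-8B's grouped-query attention, which has 32 attention heads but only 8 key/value heads, so a KV cache preallocated per attention head (Llama-2-7B style, where both counts are 32) won't accept the model's `key_states`. A minimal sketch with illustrative shapes, assuming the published Llama-3-8B config values:

```python
# Shapes are illustrative, not taken from REST's actual code.
num_attention_heads = 32   # query heads in Llama-3-8B
num_key_value_heads = 8    # KV heads (GQA: 4 query heads share each KV head)

# A KV cache preallocated per *attention* head, as MHA-era Llama code assumes:
cache_shape = (1, num_attention_heads, 2048, 128)      # [batch, heads, seq, head_dim]

# ...but Llama-3's attention emits key_states with only the KV-head count:
key_states_shape = (1, num_key_value_heads, 1, 128)    # [batch, kv_heads, new_tokens, head_dim]

# Copying the new keys into the cache then fails at dimension 1 (32 vs 8),
# which is exactly the RuntimeError in the traceback above.
mismatch = cache_shape[1] != key_states_shape[1]
print(mismatch)  # True
```

If that is the cause, the cache would need to be sized with `num_key_value_heads` for GQA models rather than `num_attention_heads`.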