FasterDecoding / REST

REST: Retrieval-Based Speculative Decoding, NAACL 2024
Apache License 2.0

Llama 3 8B is not supported #17

Open liranringel opened 1 month ago

liranringel commented 1 month ago

When I run:

RAYON_NUM_THREADS=6 CUDA_VISIBLE_DEVICES=0 python3 -m rest.inference.cli --datastore-path datastore/datastore_chat_small.idx --base-model meta-llama/Meta-Llama-3-8B-Instruct

I get:

RAYON_NUM_THREADS=6 CUDA_VISIBLE_DEVICES=0 python3 -m rest.inference.cli --datastore-path datastore/datastore_chat_small.idx --base-model meta-llama/Meta-Llama-3-8B-Instruct
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:05<00:00, 1.47s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
USER: hey
ASSISTANT: Traceback (most recent call last):
  ...
  File "/home/liranringel/REST/rest/model/modeling_llama_kv.py", line 594, in forward
    key_states = past_key_value[0].cat(key_states, dim=2)
  File "/home/liranringel/REST/rest/model/kvcache.py", line 66, in cat
    dst.copy(tensor)
RuntimeError: The size of tensor a (32) must match the size of tensor b (8) at non-singleton dimension 1
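The mismatch of 32 vs. 8 at dimension 1 (the head dimension) is consistent with Llama 3 8B using grouped-query attention: it has 32 query heads but only 8 key/value heads, whereas Llama 2 7B has 32 of each. A KV cache allocated with the query-head count will reject the 8-head key/value tensors the model produces. The sketch below is a hypothetical pure-Python model of the shape rule that `cat` enforces, illustrating why a cache sized for 32 heads fails while one sized with the model's `num_key_value_heads` (8) succeeds; the helper `cat_kv` is illustrative and not part of the REST codebase.

```python
def cat_kv(cache_shape, new_shape, dim=2):
    """Mimic the shape rule a concat along `dim` enforces:
    every other dimension must match exactly.

    Shapes follow the (batch, num_kv_heads, seq_len, head_dim)
    layout seen in the traceback. Hypothetical helper for illustration.
    """
    for i, (a, b) in enumerate(zip(cache_shape, new_shape)):
        if i != dim and a != b:
            raise RuntimeError(
                f"The size of tensor a ({a}) must match the size of "
                f"tensor b ({b}) at non-singleton dimension {i}"
            )
    out = list(cache_shape)
    out[dim] += new_shape[dim]  # sequence length grows as keys are appended
    return tuple(out)


# Llama 3 8B attention layers emit key/value states with 8 KV heads.
new_keys = (1, 8, 16, 128)

# Cache sized with the query-head count (32), as a Llama-2-era
# implementation might do -- reproduces the reported error:
try:
    cat_kv((1, 32, 0, 128), new_keys)
except RuntimeError as e:
    print(e)  # size of tensor a (32) must match ... tensor b (8) ...

# Cache sized with num_key_value_heads (8) -- appending succeeds:
print(cat_kv((1, 8, 0, 128), new_keys))  # (1, 8, 16, 128)
```

If this diagnosis is right, the fix would be to size the cache in `kvcache.py` from the model config's `num_key_value_heads` rather than `num_attention_heads`.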

YudiZh commented 1 month ago

Have you encountered the problem of segmentation fault (core dumped) when using Llama-3-8B and running python3 get_datastore_chat.py --model-path Meta-Llama-3-8B-Instruct?