b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.

Exception: max_seq_len is required, please update params.json with convert-llama.py on Meta-Llama-3-8B-Instruct #92

Closed. unclemusclez closed this issue 6 days ago.

unclemusclez commented 2 weeks ago
musclez@NSA:~/llama3/Meta-Llama-3-8B-Instruct$ python ~/distributed-llama/converter/convert-llama.py . q40
Model name: .
Target float type: q40
Target file: dllama_model_._q40.m
Traceback (most recent call last):
  File "/home/musclez/distributed-llama/converter/convert-llama.py", line 119, in <module>
    convert(modelPath, outputFileName, targetFloatType)
  File "/home/musclez/distributed-llama/converter/convert-llama.py", line 19, in convert
    raise Exception('max_seq_len is required, please update params.json file')
Exception: max_seq_len is required, please update params.json file
b4rtaz commented 6 days ago

https://github.com/b4rtaz/distributed-llama/blob/main/docs/LLAMA.md#how-to-run-llama-3

  1. Open params.json and add a new property: "max_seq_len": 8192.
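For reference, a minimal Python sketch of that edit, assuming it is run from the Meta-Llama-3-8B-Instruct directory next to Meta's original params.json (the 8192 value matches the Llama 3 8B Instruct context length mentioned above):

```python
import json

# Assumption: run from the model directory, next to Meta's params.json.
path = "params.json"

with open(path) as f:
    params = json.load(f)

# convert-llama.py requires an explicit max sequence length;
# Llama 3 8B Instruct uses an 8192-token context.
params["max_seq_len"] = 8192

with open(path, "w") as f:
    json.dump(params, f, indent=4)
```

After that, rerunning the same convert-llama.py command should get past the max_seq_len check.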