b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.

JSONDecodeError("Expecting value", s, err.value) from None #45

Open unclemusclez opened 1 month ago

unclemusclez commented 1 month ago
ubuntu@ubuntu:~/llama3/Meta-Llama-3-8B-Instruct$ python3 ../../distributed-llama/converter/convert-llama.py ./ q40
Model name:
Target float type: q40
Target file: dllama__q40.bin
Traceback (most recent call last):
  File "/home/ubuntu/llama3/Meta-Llama-3-8B-Instruct/../../distributed-llama/converter/convert-llama.py", line 119, in <module>
    convert(modelPath, outputFileName, targetFloatType)
  File "/home/ubuntu/llama3/Meta-Llama-3-8B-Instruct/../../distributed-llama/converter/convert-llama.py", line 15, in convert
    params = json.load(f)
  File "/usr/lib/python3.10/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 7 column 1 (char 6)

I get this when trying to convert Llama 3 Instruct, downloaded from the Meta repo, to q40.
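For anyone debugging the same failure: the error comes from json.load() on params.json in the directory passed to the converter. A quick diagnostic (my own sketch, not part of the repo) is to print the raw start of the file. Notably, "Expecting value: line 7 column 1 (char 6)" is exactly what json.loads() raises for input containing nothing but six blank lines, so the file may be empty or corrupted rather than malformed JSON:

import json, pathlib

# Run from the model directory that was passed to convert-llama.py.
raw = pathlib.Path("params.json").read_text()
print(repr(raw[:200]))  # exposes blank lines, HTML error pages, git-lfs pointers, etc.
json.loads(raw)         # re-raises the same JSONDecodeError if the file is still invalid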

DifferentialityDevelopment commented 1 month ago

You need to point it at the original folder of Meta-Llama-3-8B-Instruct: not the one with the safetensors files, but the one with the .pth files. I already converted mine earlier today; I'll upload it to Hugging Face for you.
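For reference, the folder the converter wants is the Meta-format checkpoint; for the standard Llama 3 8B download it looks roughly like this (from memory, so treat the exact file names as an assumption):

Meta-Llama-3-8B-Instruct/
    consolidated.00.pth
    params.json
    tokenizer.model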

DifferentialityDevelopment commented 1 month ago

It's still uploading; you can find it here: https://huggingface.co/Azamorn/Meta-Llama-3-8B-Instruct-Distributed

unclemusclez commented 1 month ago

I used download.sh and the key they provide; that was the error I got. I would like to be able to convert it myself for future reference. I tried this on three different machines.

Thank you for the converted model. I will post my results with 8x Raspberry Pi 3B.

unclemusclez commented 1 month ago
Model name: Meta-Llama-3-8B-Instruct
Target float type: q40
Target file: dllama_meta-llama-3-8b-instruct_q40.bin

I assumed this script would work for all of the Llama 3 models, but it does not. Any chance we could get conversion scripts for the Instruct and/or CodeLlama models?

b4rtaz commented 1 month ago

Could you try to run convert-llama.py from the directory where you have that script?

distributed-llama/converter % python convert-llama.py <modelPath> <targetFloatType>

Btw: in your logs I see a JSONDecodeError. Are you pointing at the correct directory?
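Spelled out with the paths from the first comment, that would be something like:

cd ~/distributed-llama/converter
python convert-llama.py ~/llama3/Meta-Llama-3-8B-Instruct q40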

unclemusclez commented 1 month ago

Yes.

The script DOES NOT work with Instruct models or CodeLlama models.

However, it DOES work for the 8B non-instruct, non-code models.

DifferentialityDevelopment commented 1 month ago

Could you give more information: what repo, what folder are you pointing it at, etc.?

I should probably say it again: it doesn't work with safetensors files, only with .pth files. Ideally the script would work with safetensors models as well, but it doesn't right now.
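For what it's worth, reading HF safetensors shards into the same kind of name-to-tensor dict that torch.load() returns for consolidated.00.pth is only a few lines; a minimal sketch, assuming the safetensors package is installed (the tensor names would still need remapping from the HF scheme to Meta's, which is the real work):

import glob
from safetensors.torch import load_file

def load_safetensors_dir(model_path):
    # Merge every *.safetensors shard into a single name -> tensor dict.
    state = {}
    for shard in sorted(glob.glob(f"{model_path}/*.safetensors")):
        state.update(load_file(shard))
    return state

# Caveat: HF checkpoints name weights like model.layers.0.self_attn.q_proj.weight,
# while the Meta .pth files use layers.0.attention.wq.weight, so a rename table
# would be needed before the converter could consume this dict.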

unclemusclez commented 1 month ago

I am downloading the models directly from the Meta repository. In order to download them, you need a key mailed to you. I run the download.sh script as provided. When executing the converter script on models designed for Instruct/Code, the script fails.

https://i.imgur.com/XjvNKLl.png

When executing the Tokenizer script for instruct models:

https://i.imgur.com/Q4akLbO.png

DifferentialityDevelopment commented 1 month ago

I downloaded them straight from Hugging Face (https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) and then used the files in the original folder of the model repo.
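If it helps, that original folder can also be fetched on its own with huggingface-cli (assuming a recent huggingface_hub and that the license has been accepted on the model page):

huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --include "original/*" --local-dir Meta-Llama-3-8B-Instruct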

unclemusclez commented 1 month ago

I'll try this; maybe I downloaded something wrong.