b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.
MIT License
1.02k stars · 68 forks

feat: add to tokenizer chat configuration. #76

Closed b4rtaz closed 1 month ago

b4rtaz commented 1 month ago

This PR extends the tokenizer file format: it is now possible to embed the chat configuration in the tokenizer file.

...
 seqLen: 8192
💔 nSlices: 1
💔 ropeTheta: 500000.0
📄 chatTemplate[0]: 
📄 chatTemplate[1]: <|start_header_id|>
📄 chatTemplate[2]: <|end_header_id|>

📄 chatTemplate[3]: <|eot_id|>
📄 chatTemplate[4]: <|start_header_id|>assistant<|end_header_id|>

📄 bosId: 128000
📄 eosId: 128001
📄 chatEosId: 128009
🕒 ropeCache: 131072 kB
⏩ Loaded 6175568 kB
DifferentialityDevelopment commented 1 month ago

Do you maybe know how I'd do the tokenizer conversion for models that don't have a tokenizer.model file?

b4rtaz commented 1 month ago

@DifferentialityDevelopment I think there is always a tokenizer somewhere, but the format is not always obvious.

I'm trying to convert the tokenizer of the hermes model that you linked. I created a new converter that uses tokenizer_config.json and tokenizer.json files.

How to convert the tokenizer:

python3 convert-tokenizer-hf.py /Users/b4rtaz/Downloads/Hermes-2-Theta-Llama-3-8B hermes
ā­ Found chat template:

{{bos_token}}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}

ā­ To create the tokenizer file you need to manually specify chat template values. Enter \n for new line.
ā© Enter value for chat template key "chat_message_start":

ā© Enter value for chat template key "chat_role_start":
<|im_start|>
ā© Enter value for chat template key "chat_role_end":
\n
ā© Enter value for chat template key "chat_message_end":
<|im_end|>\n
ā© Enter value for chat template key "chat_generation_prompt":
<|im_start|>assistant\n
{'bos_id': 128000, 'eos_id': 128003, 'chat_eos_id': 128003, 'version': 0, 'vocab_size': 128256, 'max_token_length': 256, 'chat_template': 5}
{'chat_message_start': '', 'chat_role_start': '<|im_start|>', 'chat_role_end': '\n', 'chat_message_end': '<|im_end|>\n', 'chat_generation_prompt': '<|im_start|>assistant\n'}
āœ… Created dllama_tokenizer_hermes.t
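The five chat-template values entered above compose the prompt the same way the Jinja template does. A minimal sketch of that composition (hypothetical helper, not the actual distributed-llama code):

```python
def render_chat(messages, tpl, add_generation_prompt=True):
    # Compose a chat prompt from the template parts entered during
    # conversion. Hypothetical reimplementation of the template keys;
    # the runtime side of distributed-llama may assemble it differently.
    out = []
    for m in messages:
        out.append(tpl["chat_message_start"])
        out.append(tpl["chat_role_start"] + m["role"] + tpl["chat_role_end"])
        out.append(m["content"])
        out.append(tpl["chat_message_end"])
    if add_generation_prompt:
        # The assistant header that cues the model to start generating.
        out.append(tpl["chat_generation_prompt"])
    return "".join(out)
```

With the Hermes values above, a single user message renders as `<|im_start|>user\n...<|im_end|>\n<|im_start|>assistant\n`, which matches what the Jinja template produces.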

So far I have:

b4rtaz@b4rtazs-MacBook-Pro examples % node chat-api-client.js
> system: You are an excellent math teacher.
> user: What is 1 + 2?
{ completion_tokens: 128, prompt_tokens: 54, total_tokens: 182 }
ĠD1Ġ+ĠD2ĠisĠtheĠsumĠofĠtwoĠdistances,ĠD1ĠandĠD2.ĠItĠisĠaĠconceptĠusedĠinĠgeometryĠandĠtrigonometryĠtoĠrelateĠtheĠlengthsĠofĠtwoĠsidesĠofĠaĠtriangle.ĠTheĠformulaĠforĠD1Ġ+ĠD2Ġis:ĠD1Ġ+ĠD2Ġ=Ġsqrt((x2Ġ-Ġx1)^2Ġ+Ġ(y2Ġ-Ġy1)^2),ĠwhereĠ(x1,Ġy1)ĠandĠ(x2,Ġy2)ĠareĠtheĠcoordinatesĠofĠtheĠtwoĠpoints.ĠThisĠformulaĠisĠusedĠtoĠfindĠtheĠdistanceĠbetweenĠtwoĠpointsĠinĠaĠtwo-dimensionalĠspace.ĠDoĠyouĠhaveĠanyĠspecificĠquestionsĠaboutĠthisĠconcept?Ġ<|im_end

If I manually replace `Ġ` => ` ` (space):

 D1 + D2 is the sum of two distances, D1 and D2. It is a concept used in geometry and trigonometry to relate the lengths of two sides of a triangle. The formula for D1 + D2 is: D1 + D2 = sqrt((x2 - x1)^2 + (y2 - y1)^2), where (x1, y1) and (x2, y2) are the coordinates of the two points. This formula is used to find the distance between two points in a two-dimensional space. Do you have any specific questions about this concept? <|im_end

The tokenizer is not the easy part here. :)
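The `Ġ` characters (which can also show up as `Ä ` when mis-decoded as Latin-1) come from byte-level BPE: the vocabulary stores raw bytes remapped to printable unicode characters, and byte 0x20 (space) maps to `Ġ` (U+0120). A sketch of the standard GPT-2-style byte mapping and its inverse, which would turn raw token strings like the output above back into plain text:

```python
def gpt2_bytes_to_unicode():
    # Standard GPT-2 byte-level BPE table: every byte 0..255 gets a
    # printable unicode character; printable ASCII and Latin-1 map to
    # themselves, everything else is shifted past 256.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)  # 0x20 (space) ends up at U+0120, 'Ġ'
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

def decode_byte_level(token_str):
    # Invert the mapping to recover the real bytes of a vocab entry,
    # then decode them as UTF-8.
    inv = {v: k for k, v in gpt2_bytes_to_unicode().items()}
    return bytes(inv[ch] for ch in token_str).decode("utf-8")
```

So `decode_byte_level("ĠD1")` yields `" D1"`, which suggests the raw vocab strings just need to pass through this inverse mapping when the tokenizer is converted or when tokens are detokenized.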

DifferentialityDevelopment commented 1 month ago

> (quoting @b4rtaz's comment above)

You're definitely closer than I got; mine flat out crashed when trying to use the converted tokenizer.

I'll see what I can do to help.

b4rtaz commented 1 month ago

Ok, now after I manually replaced all `Ġ` => ` ` (space) in `tokenizer.config` and executed the converter:

python3 convert-tokenizer-hf.py /Users/b4rtaz/Downloads/Hermes-2-Theta-Llama-3-8B hermes
ā­ Found chat template:

{{bos_token}}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}

ā­ To create the tokenizer file you need to manually specify chat template values. Enter \n for new line.
ā© Enter value for chat template key "chat_message_start":

ā© Enter value for chat template key "chat_role_start":
<|im_start|>
ā© Enter value for chat template key "chat_role_end":
\n
ā© Enter value for chat template key "chat_message_end":
<|im_end|>\n
ā© Enter value for chat template key "chat_generation_prompt":
<|im_start|>assistant\n
ā© Enter value for chat template key "chat_extra_stop":
<|im_start|>
{'bos_id': 128000, 'eos_id': 128003, 'chat_eos_id': 128003, 'version': 0, 'vocab_size': 128256, 'max_token_length': 192, 'chat_template': 6}
{'chat_message_start': '', 'chat_role_start': '<|im_start|>', 'chat_role_end': '\n', 'chat_message_end': '<|im_end|>\n', 'chat_generation_prompt': '<|im_start|>assistant\n', 'chat_extra_stop': '<|im_start|>'}
āœ… Created dllama_tokenizer_hermes.t
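With the new `chat_extra_stop` key, generation presumably stops not only on `chat_eos_id` but also when the extra stop string appears in the decoded output. A hypothetical helper showing that kind of check (assumed behaviour, not the project's actual implementation):

```python
def truncate_at_stop(text, stop_sequences):
    # Cut the generated text at the earliest occurrence of any stop
    # sequence; return the text unchanged if none is found.
    cut = len(text)
    for stop in stop_sequences:
        i = text.find(stop)
        if i != -1 and i < cut:
            cut = i
    return text[:cut]
```

For Hermes, the stop list would be the `<|im_end|>` message terminator plus the extra `<|im_start|>` stop, so the model cannot run on into a fabricated next turn.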

It seems Hermes 2 works quite well.

[image: screenshot of the Hermes 2 chat output]
DifferentialityDevelopment commented 1 month ago

Awesome stuff @b4rtaz!