-
var chatMessages = new ChatMessage[] {
    ChatMessage.CreateUserMessage(cont)
};
**int tokenCount = _tokenizer(chatMessages); //**
ChatCompletion completion = await _sdk.GetChatClie…
-
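The current tokenizers all differ from the previous ones (the vocab is missing ids 3-13, and many added_tokens have been introduced). Is there a particular reason for this?
For example: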
https://huggingface.co/01-ai/Yi-1.5-34B-Chat/blob/main/tokenizer.json
https://huggingface.co/01-ai/Yi-1.5-34B-32K/blob/ma…
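One way to check a gap like this is to compare the ids in the base vocab against the `added_tokens` entries. The following is a minimal sketch, assuming the parsed `tokenizer.json` stores `model.vocab` as a token-to-id mapping and `added_tokens` as a list of objects with `id` and `content` fields. The helper name `vocab_gaps` and the toy data are hypothetical and are not taken from the Yi-1.5 files.

```python
import json


def vocab_gaps(config: dict) -> dict:
    """Map each id absent from the base vocab to the added token that
    occupies it, or to None if nothing defines that id."""
    vocab_ids = set(config["model"]["vocab"].values())
    added = {entry["id"]: entry["content"] for entry in config.get("added_tokens", [])}
    top = max(vocab_ids | set(added))
    return {i: added.get(i) for i in range(top + 1) if i not in vocab_ids}


def vocab_gaps_from_file(path: str) -> dict:
    """Same check for a tokenizer.json file on disk."""
    with open(path, encoding="utf-8") as f:
        return vocab_gaps(json.load(f))


# Toy stand-in for a parsed tokenizer.json (illustrative data only).
toy_config = {
    "model": {"vocab": {"<unk>": 0, "<s>": 1, "</s>": 2, "a": 6, "b": 7}},
    "added_tokens": [
        {"id": 0, "content": "<unk>"},
        {"id": 5, "content": "<extra>"},
    ],
}
gaps = vocab_gaps(toy_config)
print(gaps)
```

Ids that map to `None` are defined nowhere in the file, while ids that map to a string are absent from the base vocab but covered by an added token.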
-
Hi, I'd like to ask: what are the values `self.v_token_id = 15167`, `self.q_token_id = 16492`, `self.a_token_id = 22550`, and `self.nl_id = 13` in the tokenizer based on? Or why is the value of v_token_id set …
-
Hi there, nice work on the internVL! We're really impressed by the new internvl-v1.5.
One thing we noticed is that the backing language model internlm/internlm2-chat-20b has a fast tokenizer (https…
-
### Describe the bug
I'm trying to port an AllenNLP model to a framework that's still maintained, so I'm considering `flair`. My original model is a character-LSTM-based tagger. It's character based …
-
Hi, I'm confused about where to find the tokenizer:
--tokenizer_path checkpoints/lit-llama/tokenizer.model
Referring here to the readme:
![image](https://github.com/Lightning-AI/lit-llama/ass…
-
Thanks for the great work!
Can you kindly provide the code for training the video tokenizer?
-
### Description
Train a bilingual tokenizer with a 32k vocab size and a low fertility score using the sentencepiece library.
### To-dos:
- [x] Train a tokenizer with the existing MT data
- [x] Creat…
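
For reference, the fertility score mentioned in the description can be estimated with a few lines of Python. This is a minimal sketch, assuming fertility means the average number of subword tokens per whitespace-delimited word. The `toy_tokenize` function is a hypothetical stand-in for the trained tokenizer, not a real sentencepiece model.

```python
from typing import Callable, Iterable


def fertility(sentences: Iterable[str], tokenize: Callable[[str], list]) -> float:
    """Average number of subword tokens produced per whitespace-delimited word."""
    total_words = 0
    total_tokens = 0
    for sentence in sentences:
        for word in sentence.split():
            total_words += 1
            total_tokens += len(tokenize(word))
    return total_tokens / total_words if total_words else 0.0


def toy_tokenize(word: str, piece_len: int = 3) -> list:
    """Hypothetical tokenizer: cuts a word into fixed-size pieces."""
    return [word[i:i + piece_len] for i in range(0, len(word), piece_len)]


corpus = ["tokenizers split words", "low fertility is good"]
score = fertility(corpus, toy_tokenize)
print(f"fertility: {score:.3f}")
```

With a trained sentencepiece model, the stand-in could be swapped for something like `lambda w: sp.encode(w, out_type=str)`, where `sp` is a loaded `SentencePieceProcessor`. A score closer to 1.0 means fewer pieces per word.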
-
Hi,
I'm interested in contributing to implementing the BPE tokenizer.
Since we're using gpt-2 encoding (as shown in the preprocessors), I think we can use the original implementation of `tiktoke…