-
I'm using jieba to tokenize my Chinese documents, as suggested here in the issues and in the documentation. The documentation also says that if I use a vectorizer, I cannot use a candid…
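For context, here is a minimal sketch of my setup, assuming the documented pattern of passing a jieba-based tokenizer to scikit-learn's `CountVectorizer` (the helper name `tokenize_zh` is mine):
```python
import jieba
from sklearn.feature_extraction.text import CountVectorizer

def tokenize_zh(text):
    # jieba.lcut segments a Chinese string into a list of tokens
    return jieba.lcut(text)

vectorizer = CountVectorizer(tokenizer=tokenize_zh)
```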
-
### System Info / 系統信息
```
2024-08-27 04:33:13,423 xinference.core.supervisor 124341 INFO Xinference supervisor 0.0.0.0:18896 started
2024-08-27 04:33:13,500 xinference.core.worker 124341 INFO S…
```
-
Hi! Is this a mistake? There should be 17 instead of 5 at the end.
![Screenshot 2022-07-08 at 17 41 45](https://user-images.githubusercontent.com/33065236/178014793-4c788364-c338-43e1-abb7-d33c7c4e5c…
-
A brief analysis of the default Tokenizer shows:
```
# decode the encoded input/target and compare against the raw strings
print(tokenizer.decode(encoding_val['input_ids'][0]))
print(input_val[0])
print(output_val[0])
print(tokenizer.decode(target_encoding_val['input_ids'][0]))
```
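For anyone who wants to reproduce this round trip in isolation, a self-contained sketch (the checkpoint name `t5-small` and the sample sentence are placeholders of mine, assuming a Hugging Face tokenizer):
```python
from transformers import AutoTokenizer

# placeholder checkpoint; any Hugging Face model with a tokenizer works here
tokenizer = AutoTokenizer.from_pretrained("t5-small")

encoding_val = tokenizer(["an example input sentence"], return_tensors="pt")
# decoding should reproduce the input text plus any special tokens (e.g. </s>)
print(tokenizer.decode(encoding_val["input_ids"][0]))
```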
-
Moved from https://github.com/balanced/balanced-php/issues/84
balanced.js is the preferred method of tokenization. Consider warning marketplaces that use the direct API tokenization method.
-
## Expected Behavior
Compound words (e.g. pick-me-up, hand-me-down, know-it-all, etc.) should be tokenized as single tokens.
## Actual Behavior
Hyphens are treated as separators, and the componen…
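A quick regex illustration of the two behaviors (not the project's actual tokenizer, just a sketch contrasting the default word pattern with one that keeps hyphenated compounds intact):
```python
import re

text = "a pick-me-up, a hand-me-down, and a know-it-all"

# default word pattern: hyphens act as separators (the actual behavior)
print(re.findall(r"\w+", text))
# ['a', 'pick', 'me', 'up', 'a', 'hand', 'me', 'down', 'and', 'a', 'know', 'it', 'all']

# word pattern that keeps hyphenated compounds as single tokens (the expected behavior)
print(re.findall(r"\w+(?:-\w+)*", text))
# ['a', 'pick-me-up', 'a', 'hand-me-down', 'and', 'a', 'know-it-all']
```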
-
```
xtuner chat /root/autodl-tmp/add --prompt-template default
Traceback (most recent call last):
  File "/root/ChatGLM3/xtuner/xtuner/tools/chat.py", line 491, in <module>
    main()
  File "/root/ChatGLM3/xtuner/x…
```
-
I am getting a maximum recursion depth error after running the following command:
```
python qlora.py --model_name_or_path decapoda-research/llama-7b-hf
```
And this is the error I got:
```
File "/home/at…
```
-
### Question
I got a loss of 0 when training on the Qwen2 backend:
```
{'loss': 0.0, 'learning_rate': 0.00015267175572519084, 'epoch': 0.0} …
```
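One thing worth checking (an assumption on my part, not confirmed by the log above): if the prompt template masks every target token with -100, no position contributes to the cross-entropy loss, and the reported loss can come out as 0. A minimal sketch of that sanity check, using a made-up `labels` tensor:
```python
import torch

# hypothetical fully masked sample: every position carries the ignore index -100
labels = torch.tensor([[-100, -100, -100, -100, -100]])

# count the positions that actually contribute to the loss
supervised = (labels != -100).sum().item()
print(supervised)  # 0 -> nothing for the loss to be computed from
```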
-
Hello,
Thank you for your hard work on this project. The tool is incredibly useful, and I appreciate your dedication.
I'd like to propose adding tokenization/lexers for pattern matching along si…