-
Should the patch_size here be 16 in Chameleon?
https://github.com/Alpha-VLLM/Lumina-mGPT/blob/104abe453ec1acca5863698629c4db2111b0b3fc/lumina_mgpt/data/item_processor.py#L78
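As a rough sanity check on how the patch size maps to the number of image tokens (the 512×512 resolution below is only an illustrative assumption, not taken from the repo):
```python
# Hypothetical arithmetic only: relate patch size to the image-token grid.
image_size = 512  # assumed square input resolution, for illustration
for patch_size in (16, 32):
    grid = image_size // patch_size
    print(f"patch_size={patch_size}: {grid}x{grid} = {grid * grid} tokens")
```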
-
File "/qiuwkai27/cx/baby-llama2-chinese/sft.py", line 274, in
tokenizer=ChatGLMTokenizer(vocab_file='./chatglm_tokenizer/tokenizer.model')
File "/qiuwkai27/cx/baby-llama2-chinese/chatglm_to…
-
What are people's thoughts on adding preprocessing scripts to allow BPE-like tokenization of characters? Technically we already support this (just tokenize your input and use a delineation function). Bu…
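For illustration, a minimal sketch of what that could look like: split the input into characters (a stand-in for the delineation function mentioned above) and then apply BPE-style merges. The `delineate` helper and the merge table are hypothetical, not part of the existing codebase.
```python
def delineate(text):
    """Hypothetical delineation step: split the input into single characters."""
    return list(text)

def apply_merge(tokens, pair, merged):
    """Replace every adjacent occurrence of `pair` with the merged symbol."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = delineate("lower lowest")
for pair in [("l", "o"), ("lo", "w")]:  # toy merge table
    tokens = apply_merge(tokens, pair, pair[0] + pair[1])
print(tokens)  # ['low', 'e', 'r', ' ', 'low', 'e', 's', 't']
```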
-
### System Info
OS: Windows 11
Rust version: cargo 1.75.0 (1d8b05cdd 2023-11-20)
Hardware: CPU AMD 6800HS
(text-generation-launcher --env didn't work)
### Information
- [ ] Docker
- [X] The CL…
-
**Is your feature request related to a problem? Please describe.**
Many generative models are limited by a maximum number of tokens. In some workflows, the prompts are generated dynamically t…
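A minimal sketch of one way to handle this, using tiktoken purely for illustration (the project's own tokenizer may differ): truncate the dynamically built prompt to a fixed token budget before sending it to the model.
```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # illustrative encoding choice

def fit_to_budget(prompt: str, max_tokens: int) -> str:
    """Trim a dynamically generated prompt so it stays within the token limit."""
    tokens = enc.encode(prompt)
    if len(tokens) <= max_tokens:
        return prompt
    return enc.decode(tokens[:max_tokens])

prompt = "Summarize the following documents:\n" + "\n".join(f"doc {i}" for i in range(100))
print(len(enc.encode(fit_to_budget(prompt, 128))))  # <= 128
```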
-
**Marked version:** 14.1.2
### **Background**
This is not a bug, but rather confusion on my part. Consider the following text and tokenization result:
```js
const token = lexer.lex('paragraph1\n'…
-
# URL
- https://arxiv.org/abs/2411.05504
# Authors
- Haoran Lian
- Yizhe Xiong
- Zijia Lin
- Jianwei Niu
- Shasha Mo
- Hui Chen
- Peng Liu
- Guiguang Ding
# Abstract
- The prevalent …
-
The tokenization of markers like "3." and "(a)" is not consistent across English treebanks.
I think we've agreed to leave it alone ([previous discussion](https://github.com/UniversalDependencies/UD…
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
(MindSpore) [root@fd428729b7cb46b089e3705e66eecb16-task0-0 LLaMA-Factory]# llamafactory-cli train example…
-
Prof Izbiki, we discussed before in OH how I should return the best results by using the SQLite FTS feature. I implemented this in my code, but now I am having some issues searching. I reasserted online…
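For reference, a minimal self-contained sketch of an FTS5 query ordered by relevance (the table and column names are hypothetical, and it assumes an SQLite build with FTS5 enabled):
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [("intro", "full text search in sqlite"),
     ("notes", "ranking search results with bm25")],
)
# bm25() gives lower scores to better matches, so ascending order
# returns the best results first.
rows = conn.execute(
    "SELECT title, bm25(docs) FROM docs WHERE docs MATCH ? ORDER BY bm25(docs)",
    ("search",),
).fetchall()
print(rows)
```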