-
Should the patch_size here be 16 in Chameleon?
https://github.com/Alpha-VLLM/Lumina-mGPT/blob/104abe453ec1acca5863698629c4db2111b0b3fc/lumina_mgpt/data/item_processor.py#L78
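As a rough sanity check on how the patch size maps to the number of image tokens (the 512×512 resolution below is only an illustrative assumption, not taken from the repo):
```python
# Hypothetical arithmetic only: relate patch size to the image-token grid.
image_size = 512  # assumed square input resolution, for illustration
for patch_size in (16, 32):
    grid = image_size // patch_size
    print(f"patch_size={patch_size}: {grid}x{grid} = {grid * grid} tokens")
```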
-
File "/qiuwkai27/cx/baby-llama2-chinese/sft.py", line 274, in
tokenizer=ChatGLMTokenizer(vocab_file='./chatglm_tokenizer/tokenizer.model')
File "/qiuwkai27/cx/baby-llama2-chinese/chatglm_to…
-
What are people's thoughts on adding preprocessing scripts to allow BPE-like tokenization of characters? Technically we already support this (just tokenize your input and use a delineation function). Bu…
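For illustration, a minimal sketch of what that could look like: split the input into characters (a stand-in for the delineation function mentioned above) and then apply BPE-style merges. The `delineate` helper and the merge table are hypothetical, not part of the existing codebase.
```python
def delineate(text):
    """Hypothetical delineation step: split the input into single characters."""
    return list(text)

def apply_merge(tokens, pair, merged):
    """Replace every adjacent occurrence of `pair` with the merged symbol."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = delineate("lower lowest")
for pair in [("l", "o"), ("lo", "w")]:  # toy merge table
    tokens = apply_merge(tokens, pair, pair[0] + pair[1])
print(tokens)  # ['low', 'e', 'r', ' ', 'low', 'e', 's', 't']
```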
-
### System Info
OS: Windows 11
Rust version: cargo 1.75.0 (1d8b05cdd 2023-11-20)
Hardware: CPU AMD 6800HS
(text-generation-launcher --env didn't work)
### Information
- [ ] Docker
- [X] The CL…
-
**Is your feature request related to a problem? Please describe.**
Many generative models are limited by a maximum number of tokens. In some workflows, the prompts are generated dynamically t…
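A minimal sketch of one way to handle this, using tiktoken purely for illustration (the project's own tokenizer may differ): truncate the dynamically built prompt to a fixed token budget before sending it to the model.
```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # illustrative encoding choice

def fit_to_budget(prompt: str, max_tokens: int) -> str:
    """Trim a dynamically generated prompt so it stays within the token limit."""
    tokens = enc.encode(prompt)
    if len(tokens) <= max_tokens:
        return prompt
    return enc.decode(tokens[:max_tokens])

prompt = "Summarize the following documents:\n" + "\n".join(f"doc {i}" for i in range(100))
print(len(enc.encode(fit_to_budget(prompt, 128))))  # <= 128
```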
-
**Marked version:** 14.1.2
### **Background**
This is not a bug, but rather confusion on my part. Consider the following text and tokenization result:
```js
const token = lexer.lex('paragraph1\n'…
-
# URL
- https://arxiv.org/abs/2411.05504
# Authors
- Haoran Lian
- Yizhe Xiong
- Zijia Lin
- Jianwei Niu
- Shasha Mo
- Hui Chen
- Peng Liu
- Guiguang Ding
# Abstract
- The prevalent …
-
The tokenization of markers like "3." and "(a)" is not consistent across English treebanks.
I think we've agreed to leave it alone ([previous discussion](https://github.com/UniversalDependencies/UD…
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
(MindSpore) [root@fd428729b7cb46b089e3705e66eecb16-task0-0 LLaMA-Factory]# llamafactory-cli train example…
-
Prof Izbiki, we discussed before in OH how I should return the best results by using the SQLite FTS feature. I implemented this in my code, but now I am having some issues searching. I reasserted online…
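For reference, a minimal self-contained sketch of an FTS5 query ordered by relevance (the table and column names are hypothetical, and it assumes an SQLite build with FTS5 enabled):
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [("intro", "full text search in sqlite"),
     ("notes", "ranking search results with bm25")],
)
# bm25() gives lower scores to better matches, so ascending order
# returns the best results first.
rows = conn.execute(
    "SELECT title, bm25(docs) FROM docs WHERE docs MATCH ? ORDER BY bm25(docs)",
    ("search",),
).fetchall()
print(rows)
```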