-
Is tokenization_rwkv5.py equivalent to tokenization_rwkv_world.py from https://huggingface.co/RWKV/v5-Eagle-7B-HF/tree/main? I saw that WordpieceTokenizer from tokenization_rwkv5.py uses whitespace_to…
-
I'm working with a corpus that primarily consists of longer documents. I'm seeking recommendations for the most effective approach to semantically tokenize them.
Examples:
```
Original Text: "I…
-
### Clear and concise description of the problem
Allow `codeToTokens` to return the grammar state after tokenization.
### Suggested solution
This could be beneficial for tokenization in the editor I'm cu…
-
The output of the tokenization code is too long. We don't have to show the full output. @ananyaanand0501
-
Hi,
I want to use both Chinese search and vector search with BM25. How can I set the tokenization properties?
It doesn't seem to work when I set tokenization: "word"
-
Hello! I found your work to be exceptionally insightful and engaging.
I noticed that there are three pkl files in your project, namely char_voc.pkl, code_voc.pkl and nl_voc.pkl, so which file is used fo…
-
What are people's thoughts on adding preprocessing scripts to allow BPE-like tokenization of characters? Technically we already support this (just tokenize your input and use a delineation function). Bu…
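For context, here is a rough sketch of what such a BPE-like preprocessing step does: start from individual characters and greedily merge the most frequent adjacent pair. This is a generic illustration, not this project's actual delineation function; all names below are hypothetical.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Count adjacent symbol pairs and return the most frequent one (or None)."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def bpe(text, num_merges):
    """Character-level tokens, then `num_merges` greedy pair merges."""
    tokens = list(text)
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        tokens = merge_pair(tokens, pair)
    return tokens
```

For example, `bpe("abababc", 2)` first merges the most frequent pair `("a", "b")` into `"ab"`, then merges `("ab", "ab")`, yielding `["abab", "ab", "c"]`.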
-
**Is your feature request related to a problem? Please describe.**
For generative models, many are limited by a maximum number of tokens. In some workflows, the prompts are generated dynamically t…
-
When I run the script on this doc: https://docs.cohere.com/reference/tokenize
```
response = co.tokenize(text="tokenize me! :D", model="command")
```
I get:
```
tokens=[10002, 2261, 2012, …
-
### Feature request
On the `/tokenize` endpoint of TGI, add an option to apply the chat template from the model's tokenizer, if one exists, before tokenizing.
### Motivation
The `/tokenize` e…