-
May I ask how you solved the problem with "def build_vocabulary(spacy_de, spacy_en):
    def tokenize_de(text):
        return tokenize(text, spacy_de)

    def tokenize_en(text):
        return tokenize(text, spacy_en)

    pr…
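For context, a minimal sketch of how such a builder is often completed, assuming torchtext's `build_vocab_from_iterator` and spaCy pipelines; the `train_pairs` argument and the special tokens are illustrative assumptions, not the original code:

```python
from torchtext.vocab import build_vocab_from_iterator

# Assumed upstream: spacy_de = spacy.load("de_core_news_sm"), etc.
def tokenize(text, tokenizer):
    # Surface forms from a spaCy pipeline's tokenizer.
    return [tok.text for tok in tokenizer.tokenizer(text)]

def build_vocabulary(spacy_de, spacy_en, train_pairs):
    # train_pairs: a list of (german, english) sentence pairs (assumed),
    # a list so it can be iterated once per language.
    specials = ["<s>", "</s>", "<blank>", "<unk>"]
    vocab_src = build_vocab_from_iterator(
        (tokenize(de, spacy_de) for de, _ in train_pairs),
        min_freq=2, specials=specials,
    )
    vocab_tgt = build_vocab_from_iterator(
        (tokenize(en, spacy_en) for _, en in train_pairs),
        min_freq=2, specials=specials,
    )
    # Map out-of-vocabulary tokens to <unk> instead of raising KeyError.
    vocab_src.set_default_index(vocab_src["<unk>"])
    vocab_tgt.set_default_index(vocab_tgt["<unk>"])
    return vocab_src, vocab_tgt
```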
-
Input: plain text
Model: split text into chunks
Output: JSON
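A minimal sketch of that flow; the fixed-size, non-overlapping word chunking and the JSON field name are assumptions for illustration:

```python
import json

def text_to_chunk_json(text: str, chunk_size: int = 200) -> str:
    # Split plain text into fixed-size word chunks (no overlap).
    words = text.split()
    chunks = [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]
    # Emit the chunks as JSON, the output format named above.
    return json.dumps({"chunks": chunks}, ensure_ascii=False)

print(text_to_chunk_json("plain text goes here", chunk_size=2))
# {"chunks": ["plain text", "goes here"]}
```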
-
The `tokenize_line` lexer function is long and becoming difficult to maintain and read. It should be broken up into smaller parts, perhaps as part of a refactor of the lexer as a whole.
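One possible shape for such a split, purely as an illustration: one small rule per token class behind a dispatch loop. The token classes here are hypothetical, not the project's actual lexer grammar:

```python
import re

# Hypothetical decomposition: each token class gets its own rule, and
# tokenize_line shrinks to a dispatch loop over them.
_TOKEN_RULES = [
    ("WHITESPACE", re.compile(r"\s+")),
    ("NUMBER", re.compile(r"\d+")),
    ("WORD", re.compile(r"\w+")),
    ("PUNCT", re.compile(r"[^\w\s]")),
]

def tokenize_line(line: str) -> list[tuple[str, str]]:
    pos, tokens = 0, []
    while pos < len(line):
        for kind, pattern in _TOKEN_RULES:
            m = pattern.match(line, pos)
            if m:
                if kind != "WHITESPACE":  # whitespace yields no token
                    tokens.append((kind, m.group()))
                pos = m.end()
                break
    return tokens

print(tokenize_line("x = 42"))
# [('WORD', 'x'), ('PUNCT', '='), ('NUMBER', '42')]
```

The rules cover every character class, so the loop always advances; each rule can then be tested in isolation.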
-
https://github.com/Unstructured-IO/unstructured/blob/01dbc7b4733e88efd6c1e85930c707009a2a966e/unstructured/nlp/tokenize.py#L101-L113
Should probably use the cache here instead of on the tokenizers:
`@lru_…
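For illustration, a hedged sketch of that suggestion: cache the tokenization results with `functools.lru_cache` rather than caching the tokenizer objects. The wrapped function and cache size are assumptions, not the file's actual code:

```python
from functools import lru_cache

import nltk

@lru_cache(maxsize=128)
def cached_word_tokenize(text: str) -> tuple[str, ...]:
    # Return a tuple so the cached value is immutable and hashable.
    return tuple(nltk.word_tokenize(text))
```

Repeated calls with the same string then hit the cache instead of re-tokenizing.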
-
As the title says, the `context_len` parameter also controls the token length inside the `stream_generate_answer` function:
@torch.inference_mode()
def stream_generate_answer(
    self,
    max_new_tokens=512,
    temperatur…
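A hedged sketch of one way to decouple the two settings, assuming a Hugging Face-style `generate` API; the parameter names mirror the snippet above, everything else is assumed:

```python
import torch

@torch.inference_mode()
def stream_generate_answer(model, prompt_ids, max_new_tokens=512, context_len=2048):
    # context_len only clips the prompt: keep the most recent tokens so
    # prompt plus new tokens fit the model's window.
    keep = max(context_len - max_new_tokens, 1)
    input_ids = prompt_ids[:, -keep:]
    # max_new_tokens alone bounds how many tokens are generated.
    return model.generate(input_ids, max_new_tokens=max_new_tokens)
```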
-
A function specifically used to tokenize a given string; it will be used for both the getline tokenization and the path tokenization.
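A minimal sketch of such a shared helper; the delimiter set is an assumption for illustration:

```python
import re

def tokenize_string(s: str) -> list[str]:
    # One tokenizer for both uses: split lines on whitespace and
    # paths on separators, dropping empty pieces.
    return [t for t in re.split(r"[\s/\\]+", s) if t]

print(tokenize_string("usr/local/bin"))   # ['usr', 'local', 'bin']
print(tokenize_string("read the line"))   # ['read', 'the', 'line']
```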
-
WordpieceTokenizer.tokenize
def tokenize(self, text):
    """Tokenizes a piece of text into its word pieces.

    This uses a greedy longest-match-first algorithm to perform tokeni…
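A simplified, self-contained sketch of the greedy longest-match-first loop the docstring describes; it works on a single word and uses a toy vocabulary, so it is an illustration of the algorithm, not BERT's exact implementation:

```python
def wordpiece_tokenize(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    pieces, start = [], 0
    while start < len(word):
        # Try the longest remaining substring first, shrinking until a
        # vocabulary entry matches.
        end, cur = len(word), None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry ## prefix
            if piece in vocab:
                cur = piece
                break
            end -= 1
        if cur is None:
            return [unk]  # no match at this position: whole word is unknown
        pieces.append(cur)
        start = end
    return pieces

print(wordpiece_tokenize("unaffable", {"un", "##aff", "##able"}))
# ['un', '##aff', '##able']
```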
-
When I try to use this notebook in its Google Colab implementation, I am able to run it down to where you make the .npy file. However, when I try to run that block I get the following error. Any thought…
-
### What are you trying to do?
I would like to propose the addition of tokenize and detokenize endpoints to the Ollama server. This feature is crucial for the Ollama client interfaces (such as loll…
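A hypothetical shape for the proposed endpoints. These routes do not exist in the Ollama server today; the paths and request/response fields are assumptions for illustration only:

```python
import requests

# Proposed (hypothetical) round trip: text -> token ids -> text.
resp = requests.post(
    "http://localhost:11434/api/tokenize",
    json={"model": "llama3", "text": "Hello world"},
)
tokens = resp.json()["tokens"]  # list of token ids

resp = requests.post(
    "http://localhost:11434/api/detokenize",
    json={"model": "llama3", "tokens": tokens},
)
text = resp.json()["text"]  # should round-trip to "Hello world"
```

Client interfaces could then count and inspect tokens server-side instead of bundling their own tokenizer for each model.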
-
The first BE camp in the Malay language. 350 pax of leaders from Malaysia, Singapore, Brunei and Indonesia!!! Let's bring the #BEInternational wave to the Malay market!!!🔥🔥🔥🔥🔥
with the text ab…