-
The `tokenize_line` lexer function has grown long and is becoming difficult to read and maintain. It should be broken up into multiple parts (one possible split is sketched below), perhaps as part of a refactor of the lexer as a whole.
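A minimal sketch of one way the split could look; every name and token class below is hypothetical and stands in for the real lexer's internals:

```python
def skip_whitespace(line: str, pos: int) -> int:
    # Advance past a run of whitespace; produces no token.
    while pos < len(line) and line[pos].isspace():
        pos += 1
    return pos

def lex_number(line: str, pos: int) -> tuple[str, int]:
    # Consume a run of digits and return it as one token.
    start = pos
    while pos < len(line) and line[pos].isdigit():
        pos += 1
    return line[start:pos], pos

def lex_word(line: str, pos: int) -> tuple[str, int]:
    # Consume an identifier-like run of letters, digits, and underscores.
    start = pos
    while pos < len(line) and (line[pos].isalnum() or line[pos] == "_"):
        pos += 1
    return line[start:pos], pos

def tokenize_line(line: str) -> list[str]:
    # The main loop only dispatches; each token class lives in its own helper,
    # so new token kinds can be added without growing this function.
    tokens, pos = [], 0
    while pos < len(line):
        ch = line[pos]
        if ch.isspace():
            pos = skip_whitespace(line, pos)
        elif ch.isdigit():
            token, pos = lex_number(line, pos)
            tokens.append(token)
        elif ch.isalpha() or ch == "_":
            token, pos = lex_word(line, pos)
            tokens.append(token)
        else:
            tokens.append(ch)  # single-character punctuation token
            pos += 1
    return tokens
```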
-
## Background
**[Neural Sparse](https://opensearch.org/docs/latest/search-plugins/neural-sparse-search/)** is a semantic search method built on the native Lucene inverted index. The documents…
-
To improve performance, morphological analysis was added to the `insert_synonyms` and `replace_synonyms` functions.
Morphological analysis is currently performed per eojeol (space-delimited word unit); instead, let's change it to run per sentence, as sketched after this list.
Advantages:
1. Faster processing, since the `kiwi.tokenize` method is called fewer times
2. Tokenization must be done per sentence so that ambiguity…
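A minimal sketch of the proposed change, assuming the `kiwipiepy` package; the helper names and surrounding code are illustrative, not the project's actual API:

```python
from kiwipiepy import Kiwi

kiwi = Kiwi()

# Before: one kiwi.tokenize call per eojeol (space-delimited unit).
def analyze_per_eojeol(sentence):
    return [kiwi.tokenize(eojeol) for eojeol in sentence.split()]

# After: a single kiwi.tokenize call for the whole sentence, which both cuts
# the number of calls and gives the analyzer full context to resolve ambiguity.
def analyze_per_sentence(sentence):
    return kiwi.tokenize(sentence)

tokens = analyze_per_sentence("형태소 분석을 문장 단위로 수행한다")
print([(t.form, t.tag) for t in tokens])
```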
-
Given that the transformers library now includes faster tokenizers that likely work faster in batches, I think we can implement `batch_tokenize` in `PretrainedTransformerTokenizer` so that it calls `batc…
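A hedged sketch of the idea, using the Hugging Face API directly; the wrapper function name `batch_tokenize` here and the model name are illustrative, not the library's actual implementation:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)

def batch_tokenize(texts):
    # One batched call lets the fast (Rust-backed) tokenizer do the work in
    # bulk, instead of paying per-string Python overhead in a loop.
    encodings = tokenizer(texts, add_special_tokens=True)
    return [tokenizer.convert_ids_to_tokens(ids) for ids in encodings["input_ids"]]

print(batch_tokenize(["Hello world!", "Tokenizers can batch."]))
```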
-
I quite like the (apparently undocumented) convention that `_` (underscore) allows input text to generate tokens containing spaces, but there is inconsistency when the string preceding the underscore contains a …
-
When I try to use this notebook in its Google Colab implementation, I am able to run it down to where you make the .npy file. However, when I try to run that block I get the following error. Any thought…
-
If you directly juxtapose a macro with a string, the macro is supplied a raw literal version of the string **as if the macro were a string macro,** even though it's not.
Here is an example:
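(The original snippet is a minimal sketch of the behavior described above, assuming a hypothetical macro `@m` that just prints what it receives:)

```julia
macro m(s)
    # Print exactly the value the macro was handed at parse time.
    return :(println(repr($s)))
end

@m "a\nb"   # spaced call: receives the escaped string, so repr shows "a\nb"
@m"a\nb"    # juxtaposed call: receives the raw string, so repr shows "a\\nb"
```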
-
This project demonstrates a very good way to tokenize speech with different features, such as style and pitch tokens, which enable downstream applications to have fine-grained control of the generati…
-
`model.tokenize` output differs from the transformers tokenizer output
when using the same model, Qwen1.5-7B:
input_ids1 = model.tokenize(prompt.encode("utf-8"))
input_ids2 = tokenizer([prompt], padding…
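A hedged sketch for reproducing the comparison, assuming llama-cpp-python with a local GGUF file (path illustrative) and the Hugging Face tokenizer for the same model; it makes the special-token settings explicit on both sides, since differing BOS/special-token defaults are a common source of such mismatches:

```python
from llama_cpp import Llama
from transformers import AutoTokenizer

prompt = "hello world"

# llama-cpp-python side: tokenize raw UTF-8 bytes, no BOS, no special tokens.
llm = Llama(model_path="qwen1_5-7b.gguf")  # assumed local GGUF file
ids_llama = llm.tokenize(prompt.encode("utf-8"), add_bos=False, special=False)

# transformers side: same prompt, special tokens disabled for a fair comparison.
hf_tok = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B")
ids_hf = hf_tok(prompt, add_special_tokens=False)["input_ids"]

print(ids_llama)
print(ids_hf)
```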
-