-
Hi team,
I would like to request support for adding additional features to T5Tokenizer / SentencepieceTokenizer. I was able to convert the Hugging Face T5 tokenizer to ONNX format using the follo…
-
## 🐛 Bug
Hi, I tried to test NLLB for translating some English sentences to Chinese, and all my sentences are shorter than 60 tokens. However, most sentences longer than 30 tokens cannot be gen…
-
I have a fine-tuned SentenceTransformers model based on "sentence-transformers/all-mpnet-base-v2" which I'm trying to distill - but it fails with a `KeyError: 'special_tokens'` at https://github.com/Min…
-
Hi Oliver,
Your library is like the gift that keeps on giving. Thank you again for it. I noticed that the model tends to predict a sentence-ending punctuation mark at the end of the input text even if it…
-
Chapter 10: the context manager `as_target_tokenizer()`, used in translation tasks to switch the tokenizer's default encoding settings, is about to be deprecated
> By default the tokenizer encodes text with the source-language settings; to encode the target language you have to use the context manager `as_target_tokenizer()`:
> ```
> zh_sentence = train_data[0]["chinese"]
> en_sentence = …
-
### Issue Description
I followed the sample in this article: [Emotion classification multiclass example](https://shap.readthedocs.io/en/latest/example_notebooks/text_examples/sentiment_analysis/Emo…
-
Currently I need to load a tokenizer from HuggingFace and use it simply for encoding and decoding sentences. Doing that through the Transformers.jl interface is already awkward (I had to go `tok = Tra…
-
### System Info
```shell
optimum==1.23.1
transformers==4.43.4
onnxruntime-gpu==1.19.2
sentence-transformers==3.2.0
Windows
Python 3.11.6
```
### Who can help?
@michaelbenayoun
…
-
I'm not entirely sure how you train the sentence and word tokenizers, but I guess you're substituting numerical values with something like `##number##` in your corpus. As a consequence, when the sentence is…
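The substitution being guessed at above can be sketched with a plain regex pass over the corpus. The placeholder token and the pattern are assumptions (the original preprocessing is not shown), so this is only an illustration of the technique, not the actual pipeline:

```python
import re

# Hypothetical placeholder token; chosen here only to match the guess above.
NUMBER_TOKEN = "##number##"

def mask_numbers(text: str) -> str:
    """Replace integer and decimal/grouped literals with one placeholder token."""
    return re.sub(r"\d+(?:[.,]\d+)*", NUMBER_TOKEN, text)

print(mask_numbers("Revenue grew 12.5% to 3,400 units in 2023."))
# -> "Revenue grew ##number##% to ##number## units in ##number##."
```

With this kind of preprocessing, the tokenizer never sees raw digits at training time, which would explain the behavior described above when a literal number appears at inference time.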
-
I use [NLTK](http://nltk.org/) to tokenize text into sentences & words. But that's a big package. Maybe something smaller would be better, something like https://bitbucket.org/trebor74hr/text-sentence/o…
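For simple cases, a dependency-free sketch of both steps fits in a few lines of stdlib regex. The splitting rules below are deliberately naive (no abbreviation or quote handling), so this is a sketch of "something smaller", not a drop-in NLTK replacement:

```python
import re

def split_sentences(text: str) -> list[str]:
    """Split on ., ! or ? followed by whitespace and an uppercase letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [p for p in parts if p]

def split_words(sentence: str) -> list[str]:
    """Pull out runs of word characters and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

# The abbreviation weakness is visible immediately:
print(split_sentences("Dr. Smith arrived. He was late!"))
# -> ['Dr.', 'Smith arrived. He was late!'] is NOT produced; it actually
#    splits after "Dr." too: ['Dr.', 'Smith arrived.', 'He was late!']
print(split_words("He was late!"))
# -> ['He', 'was', 'late', '!']
```

Whether this is good enough depends on the corpus; NLTK's `punkt` model exists precisely because abbreviations, initials, and ellipses make the naive rule above misfire.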