-
Hi team,
I would like to request support for adding additional features to T5Tokenizer / SentencepieceTokenizer. I was able to convert the Hugging Face T5 tokenizer to ONNX format using the follo…
-
## 🐛 Bug
Hi, I tried to test NLLB for translating some English sentences to Chinese, and all my sentences are shorter than 60 tokens. However, most sentences longer than 30 tokens cannot be gen…
-
I have a fine-tuned SentenceTransformers model based on "sentence-transformers/all-mpnet-base-v2" which I'm trying to distill - but it fails with a `KeyError: 'special_tokens'` at https://github.com/Min…
-
Hi Oliver,
Your library is like the gift that keeps on giving. Thank you again for it. I noticed that the model tends to predict a sentence-ending punctuation mark at the end of the input text even if it…
-
Chapter 10: the context manager `as_target_tokenizer()`, used in translation tasks to switch the tokenizer's default encoding settings, is about to be deprecated
> By default the tokenizer encodes text with the source-language settings; to encode the target language you have to use the context manager `as_target_tokenizer()`:
> ```
> zh_sentence = train_data[0]["chinese"]
> en_sentence = …
-
### Issue Description
I followed the sample in this article: [Emotion classification multiclass example](https://shap.readthedocs.io/en/latest/example_notebooks/text_examples/sentiment_analysis/Emo…
-
Currently I need to load a tokenizer from HuggingFace and use it simply for encoding and decoding sentences. Doing that through the Transformers.jl interface is already awkward (I had to go `tok = Tra…
-
### System Info
```shell
optimum==1.23.1
transformers==4.43.4
onnxruntime-gpu==1.19.2
sentence-transformers==3.2.0
Windows
Python 3.11.6
```
### Who can help?
@michaelbenayoun
…
-
I'm not entirely sure how you train the sentence and word tokenizers, but I guess you're substituting numerical values with something like `##number##` in your corpus. As a consequence, when the sentence is…
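The substitution being guessed at above can be sketched with a plain regex pass over the corpus. The placeholder token and the pattern are assumptions (the original preprocessing is not shown), so this is only an illustration of the technique, not the actual pipeline:

```python
import re

# Hypothetical placeholder token; chosen here only to match the guess above.
NUMBER_TOKEN = "##number##"

def mask_numbers(text: str) -> str:
    """Replace integer and decimal/grouped literals with one placeholder token."""
    return re.sub(r"\d+(?:[.,]\d+)*", NUMBER_TOKEN, text)

print(mask_numbers("Revenue grew 12.5% to 3,400 units in 2023."))
# -> "Revenue grew ##number##% to ##number## units in ##number##."
```

With this kind of preprocessing, the tokenizer never sees raw digits at training time, which would explain the behavior described above when a literal number appears at inference time.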
-
I use [NLTK](http://nltk.org/) to tokenize text into sentences & words. But that's a big package. Maybe something smaller would be better, something like https://bitbucket.org/trebor74hr/text-sentence/o…
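For simple cases, a dependency-free sketch of both steps fits in a few lines of stdlib regex. The splitting rules below are deliberately naive (no abbreviation or quote handling), so this is a sketch of "something smaller", not a drop-in NLTK replacement:

```python
import re

def split_sentences(text: str) -> list[str]:
    """Split on ., ! or ? followed by whitespace and an uppercase letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [p for p in parts if p]

def split_words(sentence: str) -> list[str]:
    """Pull out runs of word characters and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

# The abbreviation weakness is visible immediately:
print(split_sentences("Dr. Smith arrived. He was late!"))
# -> ['Dr.', 'Smith arrived. He was late!'] is NOT produced; it actually
#    splits after "Dr." too: ['Dr.', 'Smith arrived.', 'He was late!']
print(split_words("He was late!"))
# -> ['He', 'was', 'late', '!']
```

Whether this is good enough depends on the corpus; NLTK's `punkt` model exists precisely because abbreviations, initials, and ellipses make the naive rule above misfire.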