-
Why does it change the PyTorch version and install a different CUDA on the system?
This would actually break most people's environments, because there can be only one CUDA version on Ubuntu, and it has…
Oxi84 updated
1 month ago
-
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained('m3e-base/')
model = AutoModel.from_pretrained('m3e-base/')
model.eval()
def get_sentence_embeddi…
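# --- Hedged sketch (assumption): the definition above is cut off, so this is
# one common way to finish a sentence-embedding helper: mean pooling over the
# encoder's last hidden state, weighted by the attention mask. Shown on plain
# Python lists so it is self-contained; with the model above you would feed it
# outputs.last_hidden_state[0] and the attention mask instead.
def mean_pool(hidden, mask):
    """hidden: list of seq_len vectors; mask: list of seq_len 0/1 flags.
    Returns the mean of the vectors where mask == 1."""
    dim = len(hidden[0])
    totals = [0.0] * dim
    count = 0
    for vec, flag in zip(hidden, mask):
        if flag:
            count += 1
            for j in range(dim):
                totals[j] += vec[j]
    return [t / count for t in totals]

# mean_pool([[1, 2], [3, 4], [100, 100]], [1, 1, 0]) -> [2.0, 3.0]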
-
This test fails:
```java
List sentences = new AmericanEnglish().getSentenceTokenizer().tokenize("First sentence.\u2028Second sentence.");
Assert.assertEquals(Arrays.asList("First sentence.", "S…
-
Thinking about including a tokenizer class in the project.
I'm thinking the API could look like:
```python
from iranlowo.tokenizer import Tokenizer
text = "some text"
word_tokens = Tokenizer(…
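# --- Hedged sketch (assumption): the proposed API above is cut off, so this
# is a hypothetical minimal implementation; the regex rule and the
# `tokenize_words` method name are assumptions, not the actual proposal.
import re

class Tokenizer:
    """Wraps a text and splits it into word and punctuation tokens."""
    def __init__(self, text):
        self.text = text

    def tokenize_words(self):
        # \w+ matches runs of word characters, [^\w\s] single punctuation marks
        return re.findall(r"\w+|[^\w\s]", self.text)

# Tokenizer("some text").tokenize_words() -> ['some', 'text']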
-
### Feature request
When using the tokenizer, it truncates the data to max_length, but there is no way to keep the overflowing part instead of discarding it.
### Motivation
Sometimes we want the sentence to be complete.
### Your contribution
No
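For what it's worth, Hugging Face tokenizers can already keep the cut-off part via `return_overflowing_tokens=True` (with an optional `stride` for overlap). The windowing idea behind that option can be sketched in plain Python (a self-contained sketch, not the library's actual implementation; the function name is mine):

```python
def chunk_token_ids(token_ids, max_length, stride=0):
    """Split a long token-id sequence into windows of at most max_length ids,
    overlapping by `stride`, so no tokens are discarded by truncation."""
    if stride >= max_length:
        raise ValueError("stride must be smaller than max_length")
    step = max_length - stride
    return [token_ids[i:i + max_length] for i in range(0, len(token_ids), step)]

# A 10-token input with max_length=4 keeps every token:
# chunk_token_ids(list(range(10)), 4) -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each chunk can then be fed to the model separately instead of silently dropping the tail.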
-
Nikolay:
Chinese characters should be added. In general we can use Unicode ranges to do so, but they are somewhat complicated: https://stackoverflow.com/questions/43418812/check-whether-a-string-cont…
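A minimal membership test for the main CJK Unified Ideographs block can be sketched like this (an illustration assuming the basic U+4E00–U+9FFF block is enough; the full set of Chinese characters spans several additional Extension blocks, which is the complication the linked answer discusses):

```python
def contains_cjk(text):
    """True if any character falls in the CJK Unified Ideographs block
    (U+4E00 to U+9FFF). Extension blocks are deliberately omitted here."""
    return any('\u4e00' <= ch <= '\u9fff' for ch in text)

# contains_cjk("汉语") -> True
# contains_cjk("hello") -> False
```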
-
I am testing "train_stsbenchmark.py" with the Hugging Face model "Rostlab/prot_t5_xl_uniref50" and get the following error. What am I missing? How do I fix it? Thanks.
$ python train_stsbenchmark.…
-
**LocalAI version:** 2.16.0
**Environment, CPU architecture, OS, and Version:**
mac studio M2 Ultra
**Describe the bug**
using backend transformers for glm4, trust_remote_code: true not c…
-
Dear `lambeq` developers,
I was playing around with the package and testing the parsing example in the bobcat tutorial by simply running
```python
parser = BobcatParser()
diagram = parser.senten…
-
```python
# Load the DistilBERT tokenizer
from transformers import DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained('monologg/distilkobert')
# Define a function that converts the data into DistilBERT input format
def convert_to_input(df, tokenizer, max_length=400):…
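# --- Hedged sketch (assumption): the function body above is cut off. A
# Hugging Face tokenizer performs the fixed-length conversion itself via
# padding='max_length' and truncation=True; the helper below reproduces just
# that pad/truncate step on raw token-id lists, so it is self-contained.
def to_fixed_length(token_ids, max_length=400, pad_id=0):
    """Truncate or pad token ids to max_length and build the attention mask."""
    ids = token_ids[:max_length]
    mask = [1] * len(ids) + [0] * (max_length - len(ids))
    ids = ids + [pad_id] * (max_length - len(ids))
    return ids, mask

# to_fixed_length([5, 6, 7], max_length=5) -> ([5, 6, 7, 0, 0], [1, 1, 1, 0, 0])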