sentence-tokenizer Search Results

1000+ results
for sentence-tokenizer


windrainmoon/lllygogogo #13

BERT extraction

import torch from transformers import BertForTokenClassification, BertTokenizer, AdamW # Set the device device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # Load the pretrained BERT model and tokenizer model_name = 'b…

windrainmoon updated 2 months ago
1
microsoft/BlingFire #113

Roberta tokenizer - first word in sentence doesn't match hug…

In the original roberta tokenizer words are treated differently if they appear in the beginning of a sentence, i.e. they don't have a space before them: For example the following code: ``` to…

tomateb updated 3 years ago
1
themurtazanazir/vec2text #8

Reverse-direction decoding

1: it will do better, more general 2: pre-training hurts the model.

mattf1n updated 1 day ago
4
gregdan3/sona-toki #2

Multi-sentence scorer implementation for `are_toki_pona`

NOTE: If you're reading this looking to score every sentence in a message rather than an entire message, but your use case doesn't actually require sentences, checking the entire message with `is_toki…

gregdan3 updated 2 months ago
1
microsoft/onnxruntime-extensions #785

how to register custom operator with attributes of a list?

In the provided [documents](https://onnxruntime.ai/docs/extensions/add-op.html), it showed an example of custom operator with attribute "padding_length", which has type int64. (code listed below.) Wha…

CapJunkrat updated 1 month ago
3
sloria/TextBlob #90

Advanced usage of tokenizer for sentence tokenization

I may have misunderstood the intent with the section under **Advance Usage / Tokenizers** (https://textblob.readthedocs.org/en/dev/advanced_usage.html#advanced) but I cannot get my passed in tokenizer…

nmstoker updated 9 years ago
1
huggingface/huggingface_hub #2589

OSError: Consistency check failed: file should be of size 13…

### Describe the bug I've been attempting to run new models for the sentence_transformers library today, so they're ones I've not downloaded yet. Every time the program errors at some point in the d…

csw-work updated 1 week ago
1
unslothai/unsloth #1066

Batch inference seems to have gibberish

Hello, I encountered an issue when using the unsloth library for batch inference with the LLaMA3.1 8B Instruct model. When there is a significant difference in input lengths, the output for the shorte…

jusrook updated 3 weeks ago
3
willcrichton/wordtree #2

Using tokenizer correctly

My "sentences" are long early modern book titles ingested from a single .txt file that I split by delimiter to create a list of strings. This works fine, but when I ingest this into "wordtree", it's …

MonikaBarget updated 4 weeks ago
1
microsoft/onnxruntime-genai #815

Encoding wikitext-2-raw-v1 using OGA Tokenizer hangs

**Describe the bug** Using the OGA tokenizer to encode the wikitext-2-raw-v1 hangs and does not return, but works fine for wikitext-2-v1. **To Reproduce** Steps to reproduce the behavior: import…

WA225 updated 1 month ago
4

