-
SoMeWeta uses the STTS_IBK tagset for tagging. One of the differences between STTS and STTS_IBK is the tag AKW for action words, e.g. German *lach* (Beißwenger, Bartz, Storrer and Westpfahl, 2015).
…
-
```python
import torch
from mingpt.bpe import BPETokenizer
tokenizer = BPETokenizer()
print(tokenizer("<|endoftext|>")) # tensor([[ 27, 91, 437, 1659, 5239, 91, 29]])
print(tokenizer.decode(torch.te…
-
First, thank you for this add-on; I needed something to organise my revision process.
I'd like to offer an idea.
When memorising something like T. S. Eliot, I find that the lines in the poem aren't su…
UrKr updated 9 months ago
-
Hi,
Thanks a lot for sharing the code with us, interesting work!
I have a question regarding tokenization for GPT-2.
I've seen that you add an EOS token at the end of every sentence in each text ex…
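For reference, a minimal sketch of the convention being asked about (assuming the usual GPT-2 practice, not this repo's exact code): the end-of-text id is appended after each tokenized document before concatenating everything into one training stream.

```python
# Hedged sketch: GPT-2's <|endoftext|> has id 50256. The document token ids
# below are hypothetical placeholders.
EOT = 50256
docs = [[15496, 995], [31373]]
stream = [tok for doc in docs for tok in doc + [EOT]]
print(stream)  # [15496, 995, 50256, 31373, 50256]
```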
-
Since version [v4.36.0](https://github.com/huggingface/transformers/releases/tag/v4.36.0) of Hugging Face transformers, `prefix_allowed_tokens_fn` is no longer allowed to return an empty set of tokens …
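One workaround is to make the function fall back to a single token instead of returning an empty list. A minimal sketch (the length-keyed constraint table and the EOS fallback are hypothetical, not from this issue):

```python
EOS_TOKEN_ID = 50256  # GPT-2's <|endoftext|>; hypothetical fallback choice

def make_prefix_allowed_tokens_fn(allowed_by_length):
    """Build a fn for generate(prefix_allowed_tokens_fn=...) that never
    returns an empty list, which transformers >= 4.36.0 rejects."""
    def fn(batch_id, input_ids):
        allowed = allowed_by_length.get(len(input_ids), [])
        # Fall back to EOS rather than returning an empty set of tokens.
        return allowed if allowed else [EOS_TOKEN_ID]
    return fn

fn = make_prefix_allowed_tokens_fn({2: [15496]})
print(fn(0, [0, 1]))     # [15496]
print(fn(0, [0, 1, 2]))  # [50256]
```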
-
The merge-sentences modifier uses whitespace tokenization:
https://github.com/hplt-project/OpusTrainer/blob/9ec77d3745823f9e05016700938e6b2ffbb770e0/src/opustrainer/modifiers/merge.py#L12-L…
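As a rough illustration (a sketch, not the linked implementation), merging with whitespace tokenization amounts to joining consecutive lines on spaces, which implicitly assumes space-delimited scripts:

```python
def merge_lines(lines, n):
    """Hypothetical sketch: merge every n consecutive lines into one example,
    treating tokens as whitespace-separated strings."""
    return [" ".join(lines[i:i + n]) for i in range(0, len(lines), n)]

print(merge_lines(["a b", "c d", "e f"], 2))  # ['a b c d', 'e f']
```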
-
### Your current environment
Packages used for both finetuning and inference (vllm==0.3.2):
```
torch==2.1.2
accelerate==0.27.2
transformers==4.40.1
sentence_transformers==2.7.0
```
Description:
…
-
Hello! I'm trying to implement bert-base, but it's not clear to me how you generate the masks with the TAPETokenizer. This is my code:
```python
model = ProteinBertModel.from_pretrained('bert-base')
tokeni…
-
Unfortunately, `BreezeSentencer` uses `Tokenizer.computeOffsets` to compute offsets from the resulting sentences, so simply adding `require(string.forall(!_.isWhitespace))` breaks `BreezeSentencer`.
-
I noticed that a number of things are implemented incorrectly.
```python
classifier = pipeline("sentiment-analysis", device="cpu",
model="distilbert/distilbert-base-uncased-fin…