learned-tokenization Search Results

TotalALM/VSTS-Tasks #33

Tokenization VSTS-Task escape characters

Hi, I'm using the "**Tokenization task**" and I learned that when there are special characters, the implementation will apply an escape-character rule to them: Example: value: Test$$!Jungle18 will be…

myaesubi updated 5 years ago

vllm-project/vllm #8779

vLLM's V1 Engine Architecture

This issue describes the high-level directions for creating "LLM Engine V1". We want the design to be as transparent as possible and created this issue to track progress and solicit feedback. Goal…

simon-mo updated 6 days ago

EMalpha01/Sentence_VAE #1

decoupling

SentenceVAE/ │ ├── encoder.py │ ```python │ import torch │ from torch import nn │ │ class SentenceEncoder(nn.Module): │ '''Sentence Encoder with byte-level BPE tokenization, lear…

EMalpha01 updated 2 weeks ago

speechbrain/speechbrain #2184

[Bug]: Special token IDs (BOS, EOS, etc.) not matching the t…

### Describe the bug This is a minor issue and probably not affecting the results so much. I noticed that special token IDs are often not matching the tokenizer's configuration. For example in http…

lucadellalib updated 11 months ago

octanove/shiba #12

Problem/question with `random_mask` in `masking.py`

Hey guys, thanks for this awesome adaptation of CANINE 😊 I've been working on adapting it for any language and I came across weird empty masks. I think the problem is in `training/masking.py` in the funct…

sven-nm updated 1 week ago

behrang/YamlSwift #13

Support serializing YAML

I took a look through the source but couldn't find a way to serialize the Yaml struct to a string. Any plans to support this?

cezheng updated 7 years ago

huggingface/transformers #10082

Supporting truncation from both ends of the sequence in Bert…

# 🚀 Feature request For `BertTokenizerFast` (inherited from `PreTrainedTokenizerFast`), it seems like `__call__` only supports truncating from the end of the sequences if we set `truncation` to be …

shangw-nvidia updated 3 years ago

irthomasthomas/undecidability #728

Transformer Models - a brief guide by Cohere

- [ ] [Transformer Models](https://docs.cohere.com/docs/transformer-models#the-softmax-layer) # Transformer Models **Description:** - **Tokenization** Tokenization is the most basic step. It consi…

irthomasthomas updated 8 months ago

huggingface/course #121

Tokenization Course Issues

Hello, I believe the corpus and the `word_freqs` output used in the [BPE](https://github.com/huggingface/course/blob/main/chapters/en/chapter6/5.mdx#implementing-bpe) / [WordPiece](https://github.c…

KeremTurgutlu updated 2 years ago

dmlc/gluon-nlp #1472

[Website] Improve website of the master version to prepare f…

## Description https://github.com/dmlc/gluon-nlp/pull/1374 has been merged so we have fixed the warnings in our documents. However, the current structure of the website is not very satisfactory and…

sxjscience updated 3 years ago

279 results for learned-tokenization
