-
I was just reading [Junyoung's paper](http://arxiv.org/abs/1603.06147) on using a character-level decoder. Although it's nice to see it works, I think the results are slightly misleading because the p…
-
# BPE as input tokens of the transformer model
The Transformer model proposed in "_Attention is all you need_" encodes its 4.5M-sentence input data into a small vocabulary generated by learning sha…
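As a minimal sketch of that workflow, assuming SentencePiece as the BPE implementation (the file names and the vocabulary size here are illustrative; the paper itself reports a shared source–target vocabulary of roughly 37k BPE tokens for EN-DE):

```python
import sentencepiece as spm

# Learn a joint BPE vocabulary on the training corpus
# (corpus path and vocab_size are assumptions for illustration).
spm.SentencePieceTrainer.train(
    input="train.en-de.txt",   # hypothetical concatenated source+target training text
    model_prefix="bpe_joint",
    vocab_size=32000,          # illustrative; the paper reports ~37k shared BPE tokens
    model_type="bpe",
)

# Encode raw sentences into the subword tokens the Transformer actually consumes.
sp = spm.SentencePieceProcessor(model_file="bpe_joint.model")
print(sp.encode("The cat sat on the mat.", out_type=str))
```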
-
## Motivation
There are multiple libraries that implement subword models in the compression-based space, including fastBPE, SentencePiece, YouTokenToMe, etc.
As far as I can tell there are f…
-
Language identification with fastText is great:
https://fasttext.cc/blog/2017/10/02/blog-post.html
But the training process is not clear, and I am wondering whether, for language identification, subwor…
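For what it's worth, here is a minimal sketch of how I understand the supervised training to work; the file name and hyperparameters are my own assumptions (not the settings behind the released lid.176.bin model), and `minn`/`maxn` are the character n-gram (subword) features I'm asking about:

```python
import fasttext

# Training file: one example per line, e.g. "__label__en this is an english sentence"
# (file name and hyperparameters are assumptions for illustration).
model = fasttext.train_supervised(
    input="langid_train.txt",
    minn=2, maxn=4,   # character n-gram features; set minn=maxn=0 to disable subwords
    dim=16,
    epoch=25,
)

labels, probs = model.predict("quel est le langage de cette phrase ?", k=3)
print(labels, probs)
```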
-
Hello,
I am currently trying to get a transformer going for segmentation of scripta continua languages. I noticed that decreasing the vocab_size increased the performance of the transformer in this …
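To make the comparison concrete, this is the kind of sweep I mean, sketched with SentencePiece on a hypothetical unsegmented corpus; smaller `vocab_size` values push the segmentation closer to the character level:

```python
import sentencepiece as spm

sentence = "thisisanunsegmentedsentence"   # stand-in for scriptio continua text

# Train BPE models of decreasing vocabulary size and compare granularity
# (corpus path and sizes are assumptions for illustration).
for size in (8000, 2000, 500):
    spm.SentencePieceTrainer.train(
        input="unsegmented_corpus.txt",
        model_prefix=f"bpe_{size}",
        vocab_size=size,
        model_type="bpe",
        character_coverage=1.0,
    )
    sp = spm.SentencePieceProcessor(model_file=f"bpe_{size}.model")
    print(size, sp.encode(sentence, out_type=str))
```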
-
I have trained a base transformer model using the sub-word segmentation approach of Sennrich et al. (https://github.com/rsennrich/subword-nmt). This requires me to set the subword_tokenizer in the new…
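For context, this is roughly how I learned and applied the BPE codes with the subword-nmt Python API before training; the paths and the merge count are my own choices, not anything prescribed by the toolkit:

```python
import codecs
from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

# Learn 10k merge operations from the tokenized training text
# (paths and merge count are assumptions for illustration).
with codecs.open("train.tok.txt", encoding="utf-8") as infile, \
     codecs.open("codes.bpe", "w", encoding="utf-8") as outfile:
    learn_bpe(infile, outfile, num_symbols=10000)

# Apply the learned codes; subwords are joined with the "@@ " continuation marker.
with codecs.open("codes.bpe", encoding="utf-8") as codes:
    bpe = BPE(codes)
print(bpe.process_line("the quick brown fox jumps over the lowest fence"))
```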
-
LMVR modifies FlatCat and allows for an output lexicon size to be set. Since we used 3 different settings for BPE (2500, 5000, 7500), it could be worthwhile to investigate the settings for LMVR as we…
-
I am currently training a transformer model and have followed the MTM labs to apply BPE to my own corpus. However, I'm unsure of the effect that providing a pre-determined vocabulary has. Does it impa…
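My understanding (please correct me if this is wrong) is that, assuming the labs use subword-nmt, the pre-determined vocabulary acts as a filter at apply time: merges that would produce a subword below the frequency threshold are reverted to smaller units. A hedged sketch with the subword-nmt Python API, assuming a `vocab.txt` produced beforehand with `subword-nmt get-vocab`:

```python
import codecs
from subword_nmt.apply_bpe import BPE, read_vocabulary

# vocab.txt is assumed to hold "subword count" lines from `subword-nmt get-vocab`
# run on the BPE-segmented training corpus.
with codecs.open("vocab.txt", encoding="utf-8") as vocab_file:
    vocabulary = read_vocabulary(vocab_file, threshold=50)   # drop subwords seen < 50 times

with codecs.open("codes.bpe", encoding="utf-8") as codes:
    bpe = BPE(codes, vocab=vocabulary)   # rare merges are reverted to smaller units

print(bpe.process_line("an example sentence with raretokens"))
```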
-
For many languages, there are lots of unpaired words, but also lots of paired phrases.
-
Because of this, I think it is possible that the BPE vocabulary is so small that the training corpus is overly segmented, making model training and inference harder.
In our experiment, the scale of Chinese…
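One way to make "overly segmented" measurable is the average number of subword tokens per original word (fertility); a quick sketch, assuming subword-nmt style "@@" continuation markers (the threshold mentioned in the comment is just a rough rule of thumb):

```python
def fertility(segmented_lines):
    """Average BPE tokens per original word, assuming '@@ ' continuation markers."""
    subword_tokens = 0
    words = 0
    for line in segmented_lines:
        tokens = line.split()
        subword_tokens += len(tokens)
        # every token NOT ending in '@@' closes one original word
        words += sum(1 for t in tokens if not t.endswith("@@"))
    return subword_tokens / max(words, 1)

# A fertility close to 1.0 means words are mostly kept whole, while values
# well above ~1.5 may indicate that the vocabulary is too small for the corpus.
print(fertility(["这@@ 是@@ 一个@@ 例子", "another ex@@ ample sentence"]))
```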