-
When training a joint SPM model on two or more languages, is there a way to alleviate the problem of a token from language 1 being segmented into subunits seen in language 2, which causes UNKs at test time?
I…
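One common mitigation is to restrict encoding to subwords actually observed in the relevant language's data (spm_encode exposes this via its `--vocabulary`/`--vocabulary_threshold` options). The sketch below illustrates the idea in plain Python with a character-level fallback; the function name is hypothetical and not part of SentencePiece.

```python
def restrict_to_vocab(subwords, vocab):
    """Replace subwords unseen in this language's vocabulary with a
    character-level fallback, so they cannot surface as UNK at test time.
    Sketch only; spm_encode --vocabulary implements this properly."""
    out = []
    for sw in subwords:
        if sw in vocab:
            out.append(sw)
        else:
            out.extend(sw)  # fall back to single characters (assumed in-vocab)
    return out
```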
-
Thanks a lot for this work.
According to the function `ReadWord`:
https://github.com/yumeng5/Spherical-Text-Embedding/blob/master/jose.c#L60
a word is defined as a sequence of characters with some d…
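For reference, my reading of that word2vec-style `ReadWord` loop, sketched in Python (this is an interpretation of the C code, not the code itself): a word is a maximal run of characters delimited by space, tab, or newline, with a newline emitted as the sentence marker `</s>`.

```python
def read_words(text):
    """Tokenize text the way a word2vec-style ReadWord loop appears to:
    split on space/tab/newline, emitting </s> for each newline."""
    words, buf = [], []
    for ch in text:
        if ch in (" ", "\t", "\n"):
            if buf:
                words.append("".join(buf))
                buf = []
            if ch == "\n":
                words.append("</s>")  # newline acts as a sentence boundary
        else:
            buf.append(ch)
    if buf:
        words.append("".join(buf))
    return words
```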
-
Thank you for providing this useful toolkit! I am new to it and still learning. As I understand it, in CTC a repeated label means continuing the last character, so what does the self-transition mean? Can I treat them as…
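For context, the standard CTC collapsing rule merges consecutive repeated labels (a repeat continues the previous character) and then removes blanks; a minimal sketch, assuming "-" as the blank symbol:

```python
def ctc_collapse(labels, blank="-"):
    """Collapse a CTC label sequence: merge consecutive repeats
    (a repeat continues the previous character), then drop blanks."""
    out, prev = [], None
    for lab in labels:
        if lab != prev:
            out.append(lab)
        prev = lab
    return "".join(l for l in out if l != blank)
```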
-
I tried to run the code below:
```python
import sentencepiece as spm
spm.set_random_generator_seed(1)
spm.SentencePieceTrainer.train('--input=botchan.txt --model_type=bpe --vocab_size=10000 --model_p…
```
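Independent of SentencePiece itself, the point of fixing a seed is that two runs with the same seed produce identical results; a stdlib-only illustration of that reproducibility contract:

```python
import random

# Two generators seeded identically produce identical sequences;
# this is the determinism a fixed seed is meant to give training.
a = random.Random(1)
b = random.Random(1)
seq_a = [a.randint(0, 100) for _ in range(5)]
seq_b = [b.randint(0, 100) for _ in range(5)]
assert seq_a == seq_b
```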
-
The output of the `--alignment` option seems to be an alignment on subword units rather than on the tokens themselves:
```
Hello there ||| Hallo da
0-0 1-1 2-2
Hello ||| HalloHalloHalloHallo
0-…
```
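If only subword-level alignments are available, they can be projected back to token-level ones given a map from each subword index to its originating token index; a small sketch (the helper and its inputs are hypothetical, not part of the aligner's interface):

```python
def project_alignment(align_pairs, src_groups, tgt_groups):
    """Map a subword-level alignment to a token-level one.
    src_groups/tgt_groups give, for each subword index, the index of
    the original token it came from; duplicate pairs are merged."""
    return sorted({(src_groups[i], tgt_groups[j]) for i, j in align_pairs})
```

For example, if "Hello" splits into two subwords that both align to the two subwords of "Hallo", the token-level result collapses to a single 0-0 link.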
-
I'm developing a transformer-based NMT system for low-resource English-Sinhala translation using a parallel corpus of 54k sentences (vocab size = 5k). I experimented with BPE and unigram as subword segm…
-
It seems that special tokens are not respected by BPEmb. For instance, "\" gets parsed into multiple subword tokens instead of being caught and assigned the appropriate index. This is true even when i…
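A generic workaround (not a BPEmb API) is to split the input around the special tokens first and run the subword encoder only on the ordinary spans, so each special token survives as a single unit; a sketch where `encode` stands in for any subword encoder:

```python
import re

def protect_special(text, specials, encode):
    """Split text around special tokens and subword-encode only the
    ordinary spans, keeping each special token as one piece.
    `encode` is any callable from string to a list of pieces."""
    pattern = "(" + "|".join(re.escape(s) for s in specials) + ")"
    pieces = []
    for span in re.split(pattern, text):
        if span in specials:
            pieces.append(span)       # pass special tokens through intact
        elif span:
            pieces.extend(encode(span))
    return pieces
```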
-
If I use spm_train and spm_encode, i.e. SentencePiece, in ESPnet,
the dictionary is based on subword units/tokens.
Can I use ctc_segmentation directly (as in the tedlium2 example)?
It seems to be possib…
-
I want to implement SentencePiece BPE as the segmentation algorithm for my NMT task. My corpus size is less than 100k sentences, and the source and target languages are very distant.
- Should I us…
-
Currently I run prediction like this: res=multi_label_cls_task.predict(data=encoded_data, label_list=label_list)
It returns the label 0 or 1. I would like it to return both the label and the corresponding probability. What parameters should I pass? I can't find this in the documentation.
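Independent of the toolkit's `predict` signature, multi-label outputs usually come from per-label scores thresholded at 0.5; a sketch of turning such scores into (label, probability) pairs (illustrative only; the actual parameter for this may differ):

```python
import math

def labels_with_probs(logits, threshold=0.5):
    """Convert per-label logits into (label, probability) pairs for
    multi-label classification, instead of returning bare 0/1.
    Uses a sigmoid per label; threshold decides the hard label."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [(int(p >= threshold), round(p, 4)) for p in probs]
```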