-
### Bug description
When a line that starts with too many encoded apostrophes (i.e. `&apos;`) is passed as input, marian-decoder stops on it, ignoring the rest of the input. For example, giving it …
-
Is the `--sentencepiece-alphas` option in the Marian CLI the same as the alpha at https://github.com/google/sentencepiece/blob/master/src/bpe_model.h#L43, which supports BPE dropout when called at https://github.com/…
-
Hi, first of all, thank you for your great work and nice library.
I was inspired by your work, which tries to inform the NMT model about "the word composition".
I'm currently doing my research on the ef…
-
As described by @eric-haibin-lin in https://github.com/google/sentencepiece/issues/335 it is currently not possible to use `SampleEncodeAsPieces`, `SampleEncodeAs{Pieces,Ids}` on a BPE model (displays…
-
Hi
I want to test Flair (and also BERT and ELMo) embeddings for NMT.
I currently use SentencePiece to segment my corpus, as it performs significantly better than other methods.
I can…
-
The following line appears at the beginning of the Subword regularization section of README.md.
>To enable subword regularization, you would like to integrate SentencePiece library (C++/Python) into the NMT …
-
I need a few clarifications on how rare words and heuristics are handled in the [configuration](https://github.com/lvapeab/nmt-keras/blob/master/config.py#L70):
- How does heuristic 2 handle case…
-
You need to apply a subword method. Have a look [here](http://forum.opennmt.net/t/using-sentencepiece-byte-pair-encoding-on-model/3027).
_Originally posted by @francoishernandez in https://githu…
-
I downloaded the ERNIE tiny config. When launching finetune_classifier, I followed the instructions in the README:
# 1 In the online GPU container environment:
```
The ERNIE tiny model takes subword-level input, so data preprocessing must include word segmentation and then tokenization with sentence piece. Segmentation and tokenization …
-
> It stochastically
> corrupts the segmentation procedure of BPE,
> which leads to producing multiple segmentations within the same fixed BPE framework.
> Using BPE-dropout during training and th…
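
The behaviour described in the quote above can be illustrated with a minimal sketch. It is not the paper's reference implementation and not SentencePiece's or Marian's code: the function name `bpe_dropout_segment` and the tiny merge table are hypothetical, chosen only for illustration. The sketch assumes that, at every merge step, each applicable merge is independently skipped with probability `p`, and that segmentation stops once no merge survives.

```python
import random


def bpe_dropout_segment(word, merges, p=0.1, rng=None):
    """Segment `word` with a toy BPE, skipping each candidate merge with probability p.

    `merges` is an ordered list of symbol pairs (earlier entries have higher priority).
    This is an illustrative sketch, not a library API.
    """
    rng = rng or random.Random()
    rank = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)  # start from single characters
    while len(symbols) > 1:
        # Adjacent pairs that are in the merge table AND survive dropout this step.
        candidates = [
            (rank[(a, b)], i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
            if (a, b) in rank and rng.random() >= p
        ]
        if not candidates:
            break  # nothing left to merge (or every merge was dropped)
        _, i = min(candidates)  # highest-priority merge, leftmost on ties
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Hypothetical merge table for demonstration only.
toy_merges = [("l", "o"), ("lo", "w"), ("e", "r")]

print(bpe_dropout_segment("lower", toy_merges, p=0.0))  # plain BPE, deterministic
print(bpe_dropout_segment("lower", toy_merges, p=0.5, rng=random.Random(0)))
```

With `p=0.0` the sketch reduces to ordinary deterministic BPE, and with `p=1.0` every merge is dropped so the word stays split into characters. Values in between yield different segmentations of the same word across calls, all drawn from the same fixed merge table.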