subword-segmentation Search Results

150 results for subword-segmentation


EdinburghNLP/nematus #88

DataLossError (see above for traceback): Unable to open tabl…

Hi, I am trying to use the pretrained en-de model from http://data.statmt.org/rsennrich/wmt16_systems/ to translate an English sentence with this script: ``` # this sample script translates a test …

simonefrancia updated 5 years ago · 6 comments

google-research/bert #133

Japanese words consist of Hiragana and Chinese characters (K…

Thank you for the great work. The tokenizer for multilingual models puts whitespace around Chinese characters (Kanji), but this unintentionally breaks up Japanese words consisting of…

taku910 updated 5 years ago · 1 comment

bheinzerling/bpemb #16

Encoder not splitting words into subwords

(Screenshot, 2018-11-26.) Running `bpemb_en.encode` is solely splitting the words by …

SamLynnEvans updated 5 years ago · 2 comments

google/sentencepiece #213

What does the second (numeric) column mean?

Hi, I used SentencePiece with the unigram algorithm to segment protein sequences. The result is two-column data. I know the first column is the subword segmentation. But what does …

guokeda updated 5 years ago · 3 comments

google/sentencepiece #211

Hyperparameter arguments in Python wrapper

This is regarding the pip package. After training the unigram model using `sentencepiece.SentencePieceTrainer.Train(train_args)`, suppose I want to sample a subword segmentation for a sentence. I a…

desh2608 updated 5 years ago · 1 comment

google/sentencepiece #224

Custom Word Boundary Sequence

It would be great if users could choose the word boundary character in sentencepiece. For example, '@@' is common in other libraries, so supporting it would help make it easier to i…

szha updated 5 years ago · 4 comments

google/sentencepiece #217

Apparent segmentation bug when defining user defined symbols

I'm not sure if this is a bug or by design, but I am experiencing some weird segmentation behaviour when using **--user_defined_symbols** to train sentencepiece. It seems that sentencepiece does …

howlinghuffy updated 5 years ago · 6 comments

marian-nmt/marian #215

deadlock in training

I am training a transformer model on en-fr data. I have run it several times, but every time it seems to deadlock when it finishes a batch. The log is as follows: [2018-09-19 20:47:48] Training started [2018-09-19 2…

duduscript updated 5 years ago · 23 comments

openvenues/libpostal #164

Features used to train the average perceptron in the parser

Could you please tell me which features are used to train the average perceptron model that parses OSM addresses?

ishaan007 updated 5 years ago · 6 comments

rsennrich/subword-nmt #55

apply_bpe gives fewer segments than before

Hi, I have been using `apply_bpe` since October 2016. I tested a recent copy of `apply_bpe`, and the number of segments is significantly lower than before. I am using exactly the same settings and code…

hsajjad updated 6 years ago · 2 comments
