-
We are currently using the subword-nmt BPE tokenizer for a job and rely on its "Glossary" parameter to be able to ignore certain symbols using regular expressions.
I understand that Tokenizers has the …
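The glossary mechanism described above can be sketched in plain Python: spans matching a protected regex bypass the subword tokenizer and are emitted whole, while everything else is segmented normally. The function and the toy subword splitter below are illustrative, not subword-nmt's actual API.

```python
import re

def tokenize_with_glossary(text, subword_fn, glossary_patterns):
    """Protect glossary matches from subword segmentation.

    glossary_patterns: list of regex strings for symbols to keep intact
    (a hypothetical helper mirroring the idea behind --glossaries).
    """
    combined = re.compile("(" + "|".join(glossary_patterns) + ")")
    out = []
    for piece in combined.split(text):
        if not piece:
            continue
        if combined.fullmatch(piece):
            out.append(piece)              # protected symbol, kept whole
        else:
            out.extend(subword_fn(piece))  # everything else is segmented
    return out

# toy "subword" function standing in for a real BPE encoder
toy_subword = lambda s: s.split()
print(tokenize_with_glossary("keep <URL> intact", toy_subword, [r"<[A-Z]+>"]))
# ['keep', '<URL>', 'intact']
```

The key point is that the protected span never reaches the subword model at all, so no merge rules or vocabulary filtering can alter it.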
-
BERT and related models use statistical subword tokenization algorithms, which handle out-of-vocabulary words well in ML models. High-speed implementations of BPE / WordPiece etc. would be a good addit…
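The out-of-vocabulary behavior mentioned above can be illustrated with a toy greedy longest-match-first tokenizer in the WordPiece style; the vocabulary here is made up for the example, and this is a sketch of the idea rather than any library's implementation.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first subword split, WordPiece-style.

    Continuation pieces carry the '##' prefix. An unseen word is
    decomposed into known fragments instead of becoming one [UNK],
    which is why subword models cope with out-of-vocabulary input.
    """
    pieces, start = [], 0
    while start < len(word):
        end, cur = len(word), None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub   # mark a non-initial piece
            if sub in vocab:
                cur = sub
                break
            end -= 1               # shrink the candidate from the right
        if cur is None:
            return [unk]           # no fragment matched at all
        pieces.append(cur)
        start = end
    return pieces

vocab = {"token", "##ize", "##r", "play", "##ing"}
print(wordpiece_tokenize("tokenizer", vocab))  # ['token', '##ize', '##r']
```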
-
Hello Guillaume,
I ran into an issue when using `vocabulary_path`.
Normally, with `vocabulary_path` set, we would expect the output sentence not to contain vocabulary below a certain threshold …
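The expected threshold behavior can be sketched as a simple filter over a "token count" vocabulary file: entries whose frequency falls below the threshold are dropped, so the tokenizer must fall back to smaller known units. The function name and file format here are illustrative, not the library's actual API.

```python
def load_vocab(lines, threshold):
    """Keep only vocabulary entries at or above the frequency threshold.

    lines: iterable of "token count" strings, as in a typical subword
    vocabulary file. Mirrors the expected effect of a vocabulary_path
    plus threshold setting (names are assumptions for this sketch).
    """
    vocab = {}
    for line in lines:
        token, count = line.split()
        if int(count) >= threshold:
            vocab[token] = int(count)
    return vocab

lines = ["the 1000", "rare@@ 3", "word 50"]
print(load_vocab(lines, threshold=10))  # {'the': 1000, 'word': 50}
```

If the output still contains sub-threshold tokens, the filter above is evidently not being applied at segmentation time, which is the discrepancy the issue describes.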
-
This may not exactly be an issue, but rather a question I could not find an answer to in the documentation. I hope this is the correct platform for such questions.
I am trying to work with Turkish, and…
-
Hello,
I want to train a vocabulary on a custom text corpus and later add this vocabulary to the pre-trained BERT vocabulary.
The thing is that the pre-trained vocabulary has its intra-word boundary …
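One common way to extend a pre-trained WordPiece vocabulary is to append the new subwords after the existing entries, so that no pre-trained token ID shifts. The sketch below assumes plain token lists; in practice the model's embedding matrix must also be resized to match, and the `##` continuation prefix of any new intra-word pieces has to agree with the base vocabulary's convention.

```python
def merge_vocabs(base_tokens, new_tokens):
    """Append new tokens after the base vocabulary.

    Existing token IDs must not change, or the pre-trained embedding
    matrix no longer lines up, so new entries only go at the end and
    duplicates are skipped. Illustrative helper, not a library API.
    """
    seen = set(base_tokens)
    merged = list(base_tokens)
    for tok in new_tokens:
        if tok not in seen:
            merged.append(tok)
            seen.add(tok)
    return merged

base = ["[PAD]", "[UNK]", "play", "##ing"]
print(merge_vocabs(base, ["##ing", "new", "##word"]))
# ['[PAD]', '[UNK]', 'play', '##ing', 'new', '##word']
```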
-
This tracks the testing status of #146 with existing projects.
[Header-only docs](https://github.com/cfis/rice/blob/dev/README.md)
Each project uses the `rice-header-only` branch
Project | St…
-
Hello,
I wanted to use BARThez with HuggingFace, but it seems I can't load the BARThez checkpoint.
I tried to execute your HuggingFace example:
```python
text_sentence = "Paris est la cap…
-
Hi there,
I recently started going through the code in this repository after having read your paper, which I found very fascinating.
I would be very interested in trying to reproduce the results…
-
### Bug description
When a line that starts with too many encoded apostrophes (i.e. `&apos;`) is passed as input, marian-decoder stops on it, ignoring the rest of the input. For example, giving it …
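A possible pre-processing workaround, assuming the problematic input contains HTML-escaped apostrophes such as `&apos;`, is to normalize the entities before the line ever reaches marian-decoder. This is a stdlib sketch of that idea, not a fix for the decoder itself.

```python
import html

def unescape_line(line):
    """Resolve HTML entities (e.g. &apos;, &quot;) before decoding.

    Workaround sketch: normalizing the input avoids feeding the
    decoder long runs of escaped apostrophes in the first place.
    """
    return html.unescape(line)

print(unescape_line("&apos;&apos;Twas brillig"))  # ''Twas brillig
```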
-
Is the `--sentencepiece-alphas` option in the Marian CLI the same as the alpha in https://github.com/google/sentencepiece/blob/master/src/bpe_model.h#L43, used to support BPE dropout when called at https://github.com/…
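For reference, BPE dropout randomly skips individual merge operations with probability alpha during segmentation, producing different subword splits of the same word across epochs. The toy sketch below illustrates the mechanism with a made-up merge list; it is not Marian's or SentencePiece's actual code path, and real implementations apply merges by priority rather than one rule at a time.

```python
import random

def bpe_dropout(word, merges, alpha, rng):
    """Apply BPE merges, skipping each candidate merge with prob. alpha.

    merges: ordered list of (left, right) symbol pairs.
    alpha=0.0 reduces to deterministic BPE; alpha=1.0 keeps characters.
    Toy illustration of BPE dropout only.
    """
    symbols = list(word)
    for left, right in merges:
        i = 0
        while i < len(symbols) - 1:
            if (symbols[i], symbols[i + 1]) == (left, right) and rng.random() >= alpha:
                symbols[i:i + 2] = [left + right]  # merge survives dropout
            else:
                i += 1
    return symbols

merges = [("l", "o"), ("lo", "w")]
print(bpe_dropout("low", merges, alpha=0.0, rng=random.Random(0)))  # ['low']
print(bpe_dropout("low", merges, alpha=1.0, rng=random.Random(0)))  # ['l', 'o', 'w']
```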