subword-segmentation Search Results

150 results for subword-segmentation


CodedotAl/gpt-code-clippy #6

**Code Tokenization**

- [ ] What sort of tokenization will be done?
- [ ] Scripts/tutorials that can do the tokenization?
- [ ] Modified/newly created tokenization script to feed into the rest of the pipeline

ncoop57 updated 3 years ago
3
tensorflow/tensor2tensor #850

[question] - using custom vocabulary

Are there any helpful posts or requirements into how to use tensor2tensor with a custom vocabulary? It's for a translation problem. For example, do we need to include and as the first two lines …

jestjest updated 3 years ago
9
Zenglinxiao/Tokenizer #1

Tokenizer with SentencePiece

Currently, we use SentencePiece in Tokenizer for our models that contain ZH/JA, in which no space serves as a natural word boundary. The SentencePiece model is applied after Tokenizer's `none` mode. `node…

Zenglinxiao updated 3 years ago
7
Helsinki-NLP/OPUS-CAT #50

Garbage in translation (http & Trados & OmegaT)

Hello, I'm getting "@ @ " garbage in the translations. ENVIRONMENT: - win7 ult english - OpusCAT v1.2.0.0 (tested v1 & v1.2.3 as well) - OmegaT 5.7.1 - Trados 2021 (& tested sr2) - Firefox E…

claude-ws01 updated 1 year ago
5
Jingjing-NLP/VOLT #20

What exactly is V_S[t]?

From my understanding of the paper, S[t] for a particular value of t is the vocab size, so V_S[t] is the set of all the possible vocabularies of a given corpus of size t, i.e. for a given element v in V_S[t …

kirefu updated 3 years ago
9
tensorflow/tensor2tensor #732

*help* Skipping long sentences in the t2t-decoder possible?

Hi, First of all, many thanks for making this awesome tool available! I managed to create a translation model, using the transformer_base problem, and own data. My aim is to translate a set of docu…

e-lectrix updated 4 years ago
5
iLanguage/ilanguagelab #5

Collect Inuktitut References

Purpose of addition of this task: Research When reviewing task, please focus on: Recent articles on Inuktitut morphology, standard Inuktitut grammars After the review, please add a xxx to yyy wi…

GoogleCodeExporter updated 9 years ago
12
tensorflow/tensor2tensor #740

*help* transformer adding gibberish at the end of the line d…

Hi there, I have trained a transformer that is giving very good and precise results in most of the sentences for my problem (Sanskrit word segmentation). However in about 10% of the sentences it s…

sebastian-nehrdich updated 6 years ago
6
UniversalDependencies/docs #986

Auxiliaries in Japanese and Chinese

The definition of [AUX](https://universaldependencies.org/u/pos/AUX_.html) in UD is "An auxiliary is a function **word** that accompanies the lexical verb of a verb phrase and expresses grammatical di…

rafael75012 updated 4 months ago
7
yl4579/StyleTTS2 #41

Awesome in english but no support for other languages - plea…

The readme makes it sound very simple: "Replace bert with xphonebert" Looking a bit closer looks like it's quite a feat to make StyleTTS2 talk in non-english languages (https://github.com/yl4579/Styl…

cmp-nct updated 3 weeks ago
83
