-
- [ ] What sort of tokenization will be done?
- [ ] Scripts/tutorials that can do the tokenization?
- [ ] Modified/newly created tokenization script to feed into the rest of the pipeline
-
Are there any helpful posts or documented requirements on how to use tensor2tensor with a custom vocabulary? It's for a translation problem.
For example, do we need to include and as the first two lines …
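For reference, here is a minimal sketch of writing a custom vocabulary file with reserved tokens on the first lines. It assumes tensor2tensor's `text_encoder` convention of reserving `<pad>` (id 0) and `<EOS>` (id 1), and that the vocabulary loader reads one token per line in id order. Check both assumptions against the tensor2tensor version you use. The helper `write_vocab` is hypothetical, not part of tensor2tensor.

```python
from pathlib import Path

# Assumption: reserved tokens follow tensor2tensor's text_encoder convention,
# "<pad>" -> id 0 and "<EOS>" -> id 1. Verify for your version.
RESERVED_TOKENS = ["<pad>", "<EOS>"]


def write_vocab(path, tokens):
    """Hypothetical helper: write one token per line, reserved tokens first.

    Duplicates and any reserved tokens already present in `tokens` are
    skipped, so each token's id is its 0-based line number.
    """
    seen = set(RESERVED_TOKENS)
    lines = list(RESERVED_TOKENS)
    for tok in tokens:
        if tok not in seen:
            seen.add(tok)
            lines.append(tok)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

For example, `write_vocab("vocab.txt", ["hello", "<EOS>", "world", "hello"])` writes `<pad>`, `<EOS>`, `hello`, `world` on four lines.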
-
Currently, we use SentencePiece in Tokenizer for our models that contain ZH/JA, languages in which spaces do not serve as natural word boundaries.
The SentencePiece model is applied after Tokenizer's `none` mode.
`node…
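To make the two-stage pipeline concrete, here is a minimal Python sketch. It assumes the Tokenizer's `none` mode passes each sentence through unsplit, and it replaces the trained SentencePiece model with a hypothetical greedy longest-match segmenter over a made-up three-piece vocabulary. From SentencePiece it borrows only the convention of escaping spaces as `▁` (U+2581) with a dummy prefix, so detokenization is lossless. The function names are illustrative, not the Tokenizer or SentencePiece API.

```python
SPACE = "\u2581"  # "▁", the whitespace meta-symbol SentencePiece uses


def pretokenize_none(text: str) -> list[str]:
    # Assumed "none" mode: no splitting, the whole sentence is one unit.
    return [text]


def segment(unit: str, vocab: set[str], max_len: int = 8) -> list[str]:
    # Escape spaces and add a dummy prefix so the original text can be restored.
    s = SPACE + unit.replace(" ", SPACE)
    pieces, i = [], 0
    while i < len(s):
        # Hypothetical stand-in for a trained model: take the longest vocabulary
        # entry starting at i, falling back to a single character.
        for j in range(min(len(s), i + max_len), i, -1):
            if j == i + 1 or s[i:j] in vocab:
                pieces.append(s[i:j])
                i = j
                break
    return pieces


def detokenize(pieces: list[str]) -> str:
    # Turn "▁" back into spaces, then drop only the dummy prefix space.
    text = "".join(pieces).replace(SPACE, " ")
    return text[1:] if text.startswith(" ") else text


# Whitespace splitting finds no boundaries in Japanese text.
print("自然言語処理".split())
units = pretokenize_none("自然言語処理")
pieces = segment(units[0], {SPACE + "自然", "言語", "処理"})
print(pieces)
print(detokenize(pieces))
```

Greedy longest-match is used here only because it fits in a few lines; SentencePiece itself trains a unigram or BPE model and chooses segmentations with it.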
-
Hello,
I'm getting "@ @ " garbage in the translations.
ENVIRONMENT:
- Windows 7 Ultimate (English)
- OpusCAT v1.2.0.0 (also tested v1 and v1.2.3)
- OmegaT 5.7.1
- Trados 2021 (also tested SR2)
- Firefox E…
-
From my understanding of the paper, S[t] for a particular value of t is the vocab size, so V_S[t] is the set of all possible vocabularies of a given corpus of size t, i.e. for a given element v in V_S[t …
-
Hi,
First of all, many thanks for making this awesome tool available! I managed to create a translation model using the transformer_base problem and my own data. My aim is to translate a set of docu…
-
```
Purpose of addition of this task:
Research
When reviewing task, please focus on:
Recent articles on Inuktitut morphology, standard Inuktitut grammars
After the review, please add a xxx to yyy wi…
```
-
Hi there,
I have trained a transformer that gives very good, precise results on most sentences for my problem (Sanskrit word segmentation). However, in about 10% of the sentences it s…
-
The definition of [AUX](https://universaldependencies.org/u/pos/AUX_.html) in UD is "An auxiliary is a function **word** that accompanies the lexical verb of a verb phrase and expresses grammatical di…
-
The README makes it sound very simple: "Replace bert with xphonebert"
Looking a bit closer, it seems quite a feat to make StyleTTS2 speak non-English languages (https://github.com/yl4579/Styl…