-
Hi,
I'm sorry for the many questions. My advisor asked me to implement this paper.
For the matrix multiplication to work, I am wondering how you guarantee that the sequence length is the same for…
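(For context on the question: the usual way to make lengths agree for batched matrix multiplication is to pad every sequence in a batch to the longest one and carry a mask. A minimal PyTorch sketch of that pattern — the shapes here are placeholders, not the paper's actual dimensions:)

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three variable-length sequences of 64-dim features (placeholder shapes).
seqs = [torch.randn(5, 64), torch.randn(9, 64), torch.randn(7, 64)]

# Pad to the longest sequence so the batch stacks into a single
# (batch, max_len, dim) tensor that can be used in a matmul.
batch = pad_sequence(seqs, batch_first=True)   # shape (3, 9, 64)

# Boolean mask of real (non-padding) positions, so padding can be
# excluded from attention scores or the loss downstream.
lengths = torch.tensor([s.shape[0] for s in seqs])
mask = torch.arange(batch.shape[1])[None, :] < lengths[:, None]  # (3, 9)
```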
-
Hello,
I would like to use your learn_joint_bpe_and_vocab.py script to train a BPE tokenizer for Japanese.
The problem is that Japanese, written largely in kanji like Mandarin, doesn't separate wo…
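(For context: subword-nmt expects whitespace-tokenized input, so a common workaround is to segment Japanese with MeCab first. A minimal sketch assuming the mecab-python3 package and placeholder file names:)

```python
import MeCab

# "-Owakati" makes MeCab emit space-separated surface forms
# (wakati-gaki), giving subword-nmt the word boundaries it expects.
wakati = MeCab.Tagger("-Owakati")

with open("corpus.ja") as src, open("corpus.ja.tok", "w") as dst:
    for line in src:
        dst.write(wakati.parse(line.strip()).strip() + "\n")
```

The segmented file can then go through the usual pipeline, e.g. something like `subword-nmt learn-joint-bpe-and-vocab --input corpus.ja.tok -s 32000 -o codes.ja --write-vocabulary vocab.ja`.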
-
I am not very clear about the training process. Could you please clear up my confusion?
What I understand is that the adversarial segmentation datasets are first generated as a preprocessing step, and then th…
-
Transformers: 3.0.2
Tokenizers: 0.8.1
Hi. First of all, thanks for this great library. This is my first issue here. I am working at Loodos Tech as an NLP R&D Engineer in Turkey. We are pretr…
-
During initialization, two models are downloaded: the BPE model and the w2v model.
`bpemb_en = BPEmb(lang="en", dim=50)`
In some cases the w2v model is not required and only the tokenization is needed. Fo…
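(For what it's worth, segmentation alone goes through `encode()`, which only uses the SentencePiece model, so an option to skip the embedding download would cover this use case. A small sketch; the `segmentation_only` flag is an assumption about newer releases, worth checking against your installed bpemb version:)

```python
from bpemb import BPEmb

bpemb_en = BPEmb(lang="en", dim=50)

# Subword segmentation only uses the SentencePiece model,
# not the word2vec embedding file.
print(bpemb_en.encode("Stratford"))  # ['▁strat', 'ford']

# Assumed/hypothetical: newer bpemb releases reportedly accept
# segmentation_only=True to skip the embedding download entirely.
# bpemb_en = BPEmb(lang="en", dim=50, segmentation_only=True)
```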
-
Hi, thank you for providing the repository.
Could you please guide me on how to prepare my dataset so that I can run the experiment?
The current dataset structure is as follows:
Source langu…
-
## Environment info
- `transformers` version: 4.6.1
- Platform: Linux-5.4.109+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.10
- PyTorch version (GPU?): 1.8.1+cu101 (False)
- Tensorflo…
-
# Model Setting
Version: 3
"bert_model": "beomi/kcbert-base",
"desc": "[version_3]: Proverb parsed as [wisdom] token",
"data_version": “1.0.0",
"data_name": ["example"],
"k": 11,
"lr": 0.00001,…
-
**Describe the bug**
I ran the tutorial at https://pytorch.org/hub/nvidia_deeplearningexamples_waveglow/
and got the following error:
`AttributeError: 'Tacotron2' object has no attribute 'text_to_sequence'`
…
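(In case it helps others hitting this: the hub page appears to have moved text preprocessing off the model and into a separate utils entry point. A sketch of that flow — the entry-point names are taken from memory of the current hub page and are worth double-checking:)

```python
import torch

# Load the Tacotron2 model from torch.hub.
tacotron2 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2')
tacotron2 = tacotron2.to('cuda').eval()

# Text preprocessing now lives in a separate hub utils entry point,
# not as a text_to_sequence method on the Tacotron2 module itself.
utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tts_utils')
sequences, lengths = utils.prepare_input_sequence(["Hello world, I missed you so much."])

with torch.no_grad():
    mel, _, _ = tacotron2.infer(sequences, lengths)
```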
-
The part about tokenizing with MeCab is clear, but what about the sub-word tokenization? And what happens if there are words found in the data used for finetuning but not found in the data used for pretraining…
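(For what it's worth, my understanding is that words unseen at pretraining time get split into known sub-word pieces rather than mapped to [UNK], as long as their pieces are in the WordPiece vocabulary. A small sketch; the specific checkpoint cl-tohoku/bert-base-japanese is an assumption about which model is meant:)

```python
from transformers import BertJapaneseTokenizer

# This tokenizer first splits words with MeCab, then applies WordPiece
# against the vocabulary fixed at pretraining time.
tokenizer = BertJapaneseTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")

# A word absent from the pretraining data is decomposed into known
# sub-word pieces (continuations are prefixed with ##); it only becomes
# [UNK] if even its pieces are missing from the vocabulary.
print(tokenizer.tokenize("自然言語処理"))
```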