-
Hi,
I'm sorry for the many questions. My advisor asked me to implement this paper.
For the matrix multiplication to work, I am wondering how you guarantee that the sequence length is the same for…
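(For context on the question: the usual way to make lengths agree for batched matrix multiplication is to pad every sequence in a batch to the longest one and carry a mask. A minimal PyTorch sketch of that pattern — the shapes here are placeholders, not the paper's actual dimensions:)

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three variable-length sequences of 64-dim features (placeholder shapes).
seqs = [torch.randn(5, 64), torch.randn(9, 64), torch.randn(7, 64)]

# Pad to the longest sequence so the batch stacks into a single
# (batch, max_len, dim) tensor that can be used in a matmul.
batch = pad_sequence(seqs, batch_first=True)   # shape (3, 9, 64)

# Boolean mask of real (non-padding) positions, so padding can be
# excluded from attention scores or the loss downstream.
lengths = torch.tensor([s.shape[0] for s in seqs])
mask = torch.arange(batch.shape[1])[None, :] < lengths[:, None]  # (3, 9)
```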
-
Hello,
I would like to use your learn_joint_bpe_and_vocab.py script to train a BPE tokenizer for Japanese.
The problem is that Japanese, written largely in kanji like Mandarin, doesn't separate wo…
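(For context: subword-nmt expects whitespace-tokenized input, so a common workaround is to segment Japanese with MeCab first. A minimal sketch assuming the mecab-python3 package and placeholder file names:)

```python
import MeCab

# "-Owakati" makes MeCab emit space-separated surface forms
# (wakati-gaki), giving subword-nmt the word boundaries it expects.
wakati = MeCab.Tagger("-Owakati")

with open("corpus.ja") as src, open("corpus.ja.tok", "w") as dst:
    for line in src:
        dst.write(wakati.parse(line.strip()).strip() + "\n")
```

The segmented file can then go through the usual pipeline, e.g. something like `subword-nmt learn-joint-bpe-and-vocab --input corpus.ja.tok -s 32000 -o codes.ja --write-vocabulary vocab.ja`.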
-
I am not very clear about the training process. Could you please clear up my confusion?
What I understand is that the adversarial segmentation datasets are first generated as a preprocessing step, and then th…
-
Transformers: 3.0.2
Tokenizers: 0.8.1
Hi. First of all, thanks for this great library. This is my first issue here. I am working at Loodos Tech as an NLP R&D Engineer in Turkey. We are pretr…
-
During initialization, two models are downloaded: the BPE model and the w2v model.
`bpemb_en = BPEmb(lang="en", dim=50)`
In some cases the w2v model is not required and only the tokenization is needed. Fo…
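(For what it's worth, segmentation alone goes through `encode()`, which only uses the SentencePiece model, so an option to skip the embedding download would cover this use case. A small sketch; the `segmentation_only` flag is an assumption about newer releases, worth checking against your installed bpemb version:)

```python
from bpemb import BPEmb

bpemb_en = BPEmb(lang="en", dim=50)

# Subword segmentation only uses the SentencePiece model,
# not the word2vec embedding file.
print(bpemb_en.encode("Stratford"))  # ['▁strat', 'ford']

# Assumed/hypothetical: newer bpemb releases reportedly accept
# segmentation_only=True to skip the embedding download entirely.
# bpemb_en = BPEmb(lang="en", dim=50, segmentation_only=True)
```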
-
Hi, thank you for providing the repository.
Could you please guide me on how to prepare my dataset so that I can run the experiment?
The current dataset structure is as follows:
Source langu…
-
## Environment info
- `transformers` version: 4.6.1
- Platform: Linux-5.4.109+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.10
- PyTorch version (GPU?): 1.8.1+cu101 (False)
- Tensorflo…
-
# Model Setting
Version: 3
"bert_model": "beomi/kcbert-base",
"desc": "[version_3]: Proverb parsed as [wisdom] token",
"data_version": “1.0.0",
"data_name": ["example"],
"k": 11,
"lr": 0.00001,…
-
**Describe the bug**
I ran the tutorial at https://pytorch.org/hub/nvidia_deeplearningexamples_waveglow/
and got the following error:
`AttributeError: 'Tacotron2' object has no attribute 'text_to_sequence'`
…
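(In case it helps others hitting this: the hub page appears to have moved text preprocessing off the model and into a separate utils entry point. A sketch of that flow — the entry-point names are taken from memory of the current hub page and are worth double-checking:)

```python
import torch

# Load the Tacotron2 model from torch.hub.
tacotron2 = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tacotron2')
tacotron2 = tacotron2.to('cuda').eval()

# Text preprocessing now lives in a separate hub utils entry point,
# not as a text_to_sequence method on the Tacotron2 module itself.
utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_tts_utils')
sequences, lengths = utils.prepare_input_sequence(["Hello world, I missed you so much."])

with torch.no_grad():
    mel, _, _ = tacotron2.infer(sequences, lengths)
```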
-
The part about tokenizing with MeCab is clear, but what about the sub-word tokenization? And what happens if there are words found in the data used for finetuning but not found in the data used for pretraining…
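(For what it's worth, my understanding is that words unseen at pretraining time get split into known sub-word pieces rather than mapped to [UNK], as long as their pieces are in the WordPiece vocabulary. A small sketch; the specific checkpoint cl-tohoku/bert-base-japanese is an assumption about which model is meant:)

```python
from transformers import BertJapaneseTokenizer

# This tokenizer first splits words with MeCab, then applies WordPiece
# against the vocabulary fixed at pretraining time.
tokenizer = BertJapaneseTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")

# A word absent from the pretraining data is decomposed into known
# sub-word pieces (continuations are prefixed with ##); it only becomes
# [UNK] if even its pieces are missing from the vocabulary.
print(tokenizer.tokenize("自然言語処理"))
```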