aimanmutasem closed this issue 4 years ago.
It's difficult to tell what's going on here. I'm not sure what `Field` is and what the difference between `trg` and `predicted trg` is.

Can you provide a minimal working example showing your input, how exactly you call `bpemb_en.encode`, and what output you get?
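For reference, a minimal example of calling `bpemb_en.encode` could look like the sketch below (assuming the English model with a 50k subword vocabulary; the exact segmentation depends on the downloaded model):

```python
from bpemb import BPEmb

# English BPE model with a 50,000-subword vocabulary (downloaded on first use)
bpemb_en = BPEmb(lang="en", vs=50000)

# encode() splits a raw string into subword strings
print(bpemb_en.encode("apart from that there is no recommendation as to what to wear ."))
# something like: ['▁apart', '▁from', '▁that', '▁there', '▁is', '▁no',
#                  '▁recommendation', '▁as', '▁to', '▁what', '▁to', '▁wear', '▁.']
```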
Dear @bheinzerling, thank you for your support.

I need to load the vectors of a pre-trained embedding model, like this:

`SRC.build_vocab(train_data, Vectors('wiki.en.vec', url = url), unk_init = torch.Tensor.normal_, min_freq = 2)`

Do you know how I can load a pre-trained model when using bpemb encoding?
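One possible way to get the pre-trained BPEmb vectors into a torchtext vocabulary is sketched below. This is only a sketch, not an official recipe: it assumes the `words`, `vectors`, and `dim` attributes of the `BPEmb` object, the legacy `Vocab.set_vectors` API, and the `SRC` field and `train_data` dataset from the posts above.

```python
import torch
from bpemb import BPEmb

bpemb_en = BPEmb(lang="en", vs=50000)  # dim defaults to 100

# build the vocabulary from the training data first, without a `vectors=` argument
SRC.build_vocab(train_data, min_freq = 2)

# map every BPEmb subword to its row in the pre-trained embedding matrix
stoi = {word: i for i, word in enumerate(bpemb_en.words)}
vectors = torch.tensor(bpemb_en.vectors, dtype=torch.float)

# copy matching rows into the field's vocab; subwords without a
# pre-trained vector are initialised randomly
SRC.vocab.set_vectors(stoi, vectors, dim=bpemb_en.dim,
                      unk_init=torch.Tensor.normal_)
```

After this, `SRC.vocab.vectors` holds one row per vocabulary entry and can be copied into the encoder's embedding layer with `weight.data.copy_(SRC.vocab.vectors)`.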
Dear @all,

I have used bpemb encoding to prevent `<unk>` words, but there are still some `<unk>` tokens in the results:
```python
from bpemb import BPEmb
from torchtext.data import Field  # torchtext.legacy.data in newer torchtext versions

# English subword tokenizer with a 50,000-piece BPE vocabulary
bpemb_en = BPEmb(lang="en", vs=50000)

SRC = Field(tokenize = bpemb_en.encode, init_token = '<sos>',
            eos_token = '<eos>', lower = True,
            batch_first = True, fix_length = 100)

TRG = Field(tokenize = bpemb_en.encode, init_token = '<sos>',
            eos_token = '<eos>', lower = True,
            batch_first = True, fix_length = 100)
```
```
trg           = ['▁apart', '▁from', '▁that', '▁there', '▁is', '▁no', '▁recommendation', '▁as', '▁to', '▁what', '▁to', '▁wear', '▁.']
predicted trg = ['▁apart', '▁from', '▁that', '▁there', '▁is', '▁no', '<unk>', '▁as', '▁to', '▁wear', '▁to', '▁wear', '▁.', '<eos>']
```
Am I applying `bpemb_en.encode` correctly to prevent `<unk>` words?
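A hedged note on the output above: a `<unk>` in `predicted trg` does not necessarily mean the BPE segmentation failed; it means the model emitted the unknown-token id of the target vocabulary, which also happens when a subword such as `▁recommendation` was filtered out of `TRG.vocab` (for example by `min_freq = 2`). A quick check, assuming `build_vocab` has been called on `TRG`:

```python
token = '▁recommendation'
print(token in TRG.vocab.stoi)   # False -> the subword was dropped from the vocab
print(TRG.vocab.freqs[token])    # how often it occurred in the training data
```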