I have trained a BPE model with Google's SentencePiece spm_train tool. When I try to build the vocabulary with the onmt_build_vocab tool, an error is raised. The same happens with the ru model. This is the configuration:
# zh-ru-translator.yaml
## Where the samples will be written
save_data: run/opennmt_data
## Where the vocab(s) will be written
src_vocab: /data/translator/code/zh.vocab
tgt_vocab: /data/translator/code/ru.vocab
# Should match the vocab size for SentencePiece
src_vocab_size: 30000
tgt_vocab_size: 30000
share_vocab: False
# Corpus opts:
data:
    corpus_1:
        path_src: /data/translator/parallel/zh_train.txt
        path_tgt: /data/translator/parallel/ru_train.txt
        weight: 1
        transforms: [bpe, filtertoolong]
    valid:
        path_src: /data/translator/parallel/zh_valid.txt
        path_tgt: /data/translator/parallel/ru_valid.txt
        transforms: [bpe, filtertoolong]
### Transform related opts:
#### Subword
src_subword_model: /data/translator/code/zh.model
tgt_subword_model: /data/translator/code/ru.model
#### Filter
src_seq_length: 150
tgt_seq_length: 150
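For context on the vocab files referenced above: spm_train writes a .vocab file with one token<TAB>score pair per line, which is not the same layout OpenNMT-py produces with onmt_build_vocab (token<TAB>count pairs). A minimal sketch of converting one to the other, assuming those two formats and using a dummy count of 1 (the paths and the convert_spm_vocab name are mine, not from any tool):

```python
def convert_spm_vocab(spm_vocab_path, out_path):
    """Convert a SentencePiece .vocab file (token<TAB>log-prob per line)
    into token<TAB>count lines. The count of 1 is a placeholder, since
    SentencePiece stores scores, not corpus frequencies."""
    with open(spm_vocab_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            # Keep only the token; drop the log-probability column.
            token = line.rstrip("\n").split("\t")[0]
            fout.write(f"{token}\t1\n")

# Hypothetical usage with the paths from the config above:
# convert_spm_vocab("/data/translator/code/zh.vocab",
#                   "/data/translator/code/zh.onmt.vocab")
```

This is only a format-level sketch; it does not address whatever error onmt_build_vocab raises.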
Has anybody faced this issue before?
PS: Previously I tried the OpenNMT BPE implementation, but it was too slow for me; it ran for about two days without producing any result.