When I copy usage_cache.py as-is to test how to get embedding vectors from ELMo, I get an error like "UnicodeDecodeError: 'gbk' codec can't decode byte 0xa2 in position 0: illegal multibyte sequence".
Traceback (most recent call last):
  File "c:\Users\loading\Desktop\bilm-tf-master\elmotest.py", line 31, in <module>
    vocab_file, dataset_file, options_file, weight_file, embedding_file
  File "c:\Users\loading\Desktop\bilm-tf-master\bilm\model.py", line 649, in dump_bilm_embeddings
    vocab = UnicodeCharsVocabulary(vocab_file, max_word_length)
  File "c:\Users\loading\Desktop\bilm-tf-master\bilm\data.py", line 117, in __init__
    super(UnicodeCharsVocabulary, self).__init__(filename, **kwargs)
  File "c:\Users\loading\Desktop\bilm-tf-master\bilm\data.py", line 29, in __init__
    for line in f:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xff in position 0: illegal multibyte sequence
Check that your input files are correct. I accidentally used one of the binary dump files from training the initial weights as one of my inputs, and it threw incessant decode errors at me.
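If it helps, here is a rough way to check an input file before passing it to dump_bilm_embeddings. This is only a sketch, not part of bilm-tf: the helper name diagnose_text_file and its labels are made up for illustration. The 'gbk' in the message suggests the file is being opened with the Windows locale's default codec, and a 0xff first byte is what you would see from a UTF-16 byte order mark or from a binary file, so the helper looks at the raw bytes instead of trusting the default.

```python
import codecs


def diagnose_text_file(path, sample_size=4096):
    """Guess how the start of a file is encoded (hypothetical helper).

    Returns one of "utf-8-sig", "utf-16", "utf-8", or "binary-or-unknown".
    Only the first sample_size bytes are inspected, so this is a heuristic.
    """
    with open(path, "rb") as f:
        head = f.read(sample_size)

    # A byte order mark at the start is the strongest hint.
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"

    # An incremental decoder with final=False tolerates a multibyte
    # character cut off at the end of the sample, but still raises on
    # bytes that can never be valid UTF-8 (such as 0xff).
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return "binary-or-unknown"
    return "utf-8"
```

Calling diagnose_text_file(vocab_file) and diagnose_text_file(dataset_file) should show which input is the problem. If a file comes back as "utf-16", re-saving it as UTF-8 is probably the simplest fix; if it comes back as "binary-or-unknown", it is likely the wrong file altogether. For a file that really is UTF-8, opening it with open(path, encoding="utf-8") avoids the locale default, though that would mean editing the open call in bilm\data.py yourself.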