Hello, I downloaded the financial news embeddings. Why does reading the file fail with
'utf-8' codec can't decode bytes in position 3561-3562: invalid continuation byte?
My loading code is attached below:
`import numpy as np
word_embedding = True
if word_embedding:
    print('Embedding...')
    EMBEDDING_FILE = 'D:/sgns.financial.bigram-char'
    embed_size = 300
    # Split one line of the file into (word, vector).
    def get_coefs(word, *arr):
        return word, np.asarray(arr, dtype='float32')
    # The decode error is raised while iterating over this file.
    embeddings_index = dict(get_coefs(*o.rstrip().rsplit(' ')) for o in open(EMBEDDING_FILE, encoding='utf-8'))
    # tokenizer and vocab are defined elsewhere (not shown).
    word_index = tokenizer.word_index
    embedding_matrix = np.zeros((len(vocab) + 1, embed_size))
    for word, i in word_index.items():
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector`
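
For anyone hitting the same error, here is a minimal sketch of a more tolerant loader. It assumes the cause is a few byte sequences in the file that are not valid UTF-8, which I have not confirmed; an incomplete download could produce the same message. The function name `load_embeddings` and the choice of `errors="ignore"` are my own illustration, not something from the embedding project. It uses only the standard library and a tiny temporary demo file so that it runs without the real download.

```python
import os
import tempfile


def load_embeddings(path, embed_size, encoding="utf-8", errors="ignore"):
    """Read a word2vec-style text file into {word: [float, ...]}.

    errors="ignore" drops bytes that are not valid UTF-8 instead of raising
    UnicodeDecodeError. Lines that do not hold exactly embed_size values
    (including a "<vocab_size> <dim>" header line) are skipped.
    """
    embeddings = {}
    with open(path, encoding=encoding, errors=errors) as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) != embed_size + 1:
                continue  # header or damaged line
            try:
                embeddings[parts[0]] = [float(x) for x in parts[1:]]
            except ValueError:
                continue  # a value could not be parsed as a number
    return embeddings


# Demo data (hypothetical): a header, one clean line, and one line whose
# word contains a stray 0xE9 byte, which is invalid UTF-8 in that position.
demo = (
    b"2 3\n"
    + "金融".encode("utf-8") + b" 0.1 0.2 0.3\n"
    + "股".encode("utf-8") + b"\xe9" + "票".encode("utf-8") + b" 0.4 0.5 0.6\n"
)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(demo)

vectors = load_embeddings(tmp.name, embed_size=3)
os.remove(tmp.name)
print(sorted(vectors))
```

With the real file the call would be `load_embeddings(EMBEDDING_FILE, embed_size)`, and each list can be turned into an array with `np.asarray(v, dtype='float32')`. Note that `errors="ignore"` silently changes any word containing bad bytes, so it is worth checking how many lines are affected before relying on it.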