Open sawan16 opened 5 years ago
This obviously looks like an encoding problem, but I would need more details to know where it happens. Please report the full stack trace.
The 'utf-8' codec sometimes fails when encoding or decoding certain symbols or letters. In those cases, you can either ignore such errors by passing errors='ignore' along with the encoding, or try another specific encoding such as latin-1 (also known as ISO-8859-1). Hope this helps.
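As a minimal sketch of the two options suggested above, here is what they do to a byte string that is not valid UTF-8. The sample bytes are made up for illustration and are not taken from the reporter's data:

```python
# Hypothetical sample: "café" stored as Latin-1, so the last byte (0xE9)
# is not valid UTF-8 on its own.
raw = b"caf\xe9"

# Strict UTF-8 decoding (the default) rejects the input.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Option 1: errors="ignore" silently drops the bytes that cannot be decoded.
ignored = raw.decode("utf-8", errors="ignore")

# Option 2: decode with latin-1, which maps every byte to a character.
as_latin1 = raw.decode("latin-1")

print(strict_ok, repr(ignored), repr(as_latin1))
```

Note that errors='ignore' loses data, and that latin-1 never raises on decode, so it only gives the right text if the bytes really were written in that encoding.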
The input embed model is not in correct format. Use model.save_word2vec_format(filename) to save the fasttext or word2vec model.
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcf6' in position 0: surrogates not allowed
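For context, a lone surrogate such as '\udcf6' usually appears when a non-UTF-8 byte (here 0xF6) was decoded with Python's surrogateescape error handler, for example in a filename or command-line argument. The sketch below reproduces the error and shows one possible way to recover the original byte. Treating that byte as Latin-1 is an assumption for illustration only, since the thread does not say what the real source encoding was:

```python
# The character from the error message: a lone low surrogate.
bad = "\udcf6"

# Strict UTF-8 encoding refuses surrogates, as in the reported error.
try:
    bad.encode("utf-8")
    encoded_ok = True
except UnicodeEncodeError:
    encoded_ok = False

# surrogateescape reverses the mapping and yields the original raw byte.
original_byte = bad.encode("utf-8", errors="surrogateescape")

# ASSUMPTION: the source data was Latin-1; under that guess 0xF6 is "ö".
repaired = original_byte.decode("latin-1")

print(encoded_ok, original_byte, repaired)
```

If this is the cause, the more durable fix is to open or read the original data with its real encoding, rather than repairing strings after the fact.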