fchollet / deep-learning-with-python-notebooks

Jupyter notebooks for the code samples of the book "Deep Learning with Python"
MIT License

Chapter 11, part 2, cannot load GloVe #214

Open zdjordje123 opened 1 year ago

zdjordje123 commented 1 year ago

When trying to run the code that prepares the GloVe word-embedding matrix, towards the end of the notebook for chapter 11, part 2, I get an error. The code:

    embedding_dim = 100

    # Retrieve the vocabulary indexed by our previous TextVectorization layer.
    vocabulary = text_vectorization.get_vocabulary()
    # Use it to create a mapping from words to their index in the vocabulary.
    word_index = dict(zip(vocabulary, range(len(vocabulary))))

    # Prepare a matrix that will be filled with the GloVe vectors.
    embedding_matrix = np.zeros((max_tokens, embedding_dim))
    for word, i in word_index.items():
        if i < max_tokens:
            # Fill entry i in the matrix with the word vector for index i.
            # Words not found in the embedding index will be all zeros.
            embedding_vector = embeddings_index.get(word)
            if embedding_vector is not None:
                embedding_matrix[i] = embedding_vector
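
For reference, the embeddings_index dictionary used above is built earlier in the notebook by parsing the GloVe file. Below is a minimal sketch of that step, assuming the standard glove.6B.100d.txt text format; the local file path and the explicit encoding="utf-8" argument are assumptions, not the notebook's exact code. Passing the encoding explicitly can matter on Windows, where the default locale encoding is often cp1252 rather than UTF-8:

    import numpy as np

    path_to_glove_file = "glove.6B.100d.txt"  # hypothetical local path

    # Each line of a GloVe file holds a token followed by its vector
    # components, all separated by single spaces.
    embeddings_index = {}
    with open(path_to_glove_file, encoding="utf-8") as f:
        for line in f:
            word, coefs = line.split(maxsplit=1)
            embeddings_index[word] = np.fromstring(coefs, "f", sep=" ")

    print(f"Found {len(embeddings_index)} word vectors.")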

    ---------------------------------------------------------------------------
    UnicodeDecodeError                        Traceback (most recent call last)
    Input In [29], in <cell line: 4>()
          1 embedding_dim = 100
          3 # Retrieve the vocabulary indexed by our previous TextVectorization layer.
    ----> 4 vocabulary = text_vectorization.get_vocabulary()
          5 # Use it to create a mapping from words to their index in the vocabulary.
          6 word_index = dict(zip(vocabulary, range(len(vocabulary))))

    File C:\ProgramData\Anaconda3\envs\tf-gpu\lib\site-packages\keras\layers\preprocessing\text_vectorization.py:448, in TextVectorization.get_vocabulary(self, include_special_tokens)
        439 def get_vocabulary(self, include_special_tokens=True):
        440     """Returns the current vocabulary of the layer.
        441
        442     Args:
        (...)
        446         vocabulary will not include any padding or OOV tokens.
        447     """
    --> 448     return self._lookup_layer.get_vocabulary(include_special_tokens)

    File C:\ProgramData\Anaconda3\envs\tf-gpu\lib\site-packages\keras\layers\preprocessing\index_lookup.py:336, in IndexLookup.get_vocabulary(self, include_special_tokens)
        334 keys, values = self.lookup_table.export()
        335 vocab, indices = (values, keys) if self.invert else (keys, values)
    --> 336 vocab, indices = (self._tensor_vocab_to_numpy(vocab), indices.numpy())
        337 lookup = collections.defaultdict(lambda: self.oov_token,
        338                                  zip(indices, vocab))
        339 vocab = [lookup[x] for x in range(self.vocabulary_size())]

    File C:\ProgramData\Anaconda3\envs\tf-gpu\lib\site-packages\keras\layers\preprocessing\string_lookup.py:401, in StringLookup._tensor_vocab_to_numpy(self, vocabulary)
        399 def _tensor_vocab_to_numpy(self, vocabulary):
        400     vocabulary = vocabulary.numpy()
    --> 401     return np.array([tf.compat.as_text(x, self.encoding) for x in vocabulary])

    File C:\ProgramData\Anaconda3\envs\tf-gpu\lib\site-packages\keras\layers\preprocessing\string_lookup.py:401, in <listcomp>(.0)
        399 def _tensor_vocab_to_numpy(self, vocabulary):
        400     vocabulary = vocabulary.numpy()
    --> 401     return np.array([tf.compat.as_text(x, self.encoding) for x in vocabulary])

    File C:\ProgramData\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\util\compat.py:110, in as_text(bytes_or_text, encoding)
        108     return bytes_or_text
        109 elif isinstance(bytes_or_text, bytes):
    --> 110     return bytes_or_text.decode(encoding)
        111 else:
        112     raise TypeError('Expected binary or unicode string, got %r' % bytes_or_text)

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 0: unexpected end of data
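
The traceback shows where the failure happens: get_vocabulary() exports the raw byte tokens from the layer's underlying lookup table and decodes each one as UTF-8, so a single malformed token is enough to crash the whole call. Below is a hedged diagnostic sketch that mirrors this code path to locate the bad tokens; it relies on the private _lookup_layer attribute and the lookup_table.export() call visible in the traceback, which are Keras internals and may change between versions:

    # Export the raw byte vocabulary exactly as get_vocabulary() would,
    # then report every token that is not valid UTF-8 instead of raising.
    keys, values = text_vectorization._lookup_layer.lookup_table.export()
    for token in keys.numpy():
        try:
            token.decode("utf-8")
        except UnicodeDecodeError:
            print("Malformed vocabulary token:", repr(token))

A token such as b'\xc3' is the first byte of a two-byte UTF-8 sequence with the rest missing (hence "unexpected end of data"), which usually points at input text that was read or re-encoded with the wrong codec before the TextVectorization layer was adapted.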
