fchollet / deep-learning-with-python-notebooks

Jupyter notebooks for the code samples of the book "Deep Learning with Python"
MIT License

I found a problem in 6.1-one-hot-encoding-of-words-or-characters #129

Closed WhatAboutMyStar closed 4 years ago

WhatAboutMyStar commented 4 years ago

```python
import numpy as np

# This is our initial data; one entry per "sample"
# (in this toy example, a "sample" is just a sentence, but
# it could be an entire document).
samples = ['The cat sat on the mat.', 'The dog ate my homework.']

# First, build an index of all tokens in the data.
token_index = {}
for sample in samples:
    # We simply tokenize the samples via the `split` method.
    # In real life, we would also strip punctuation and special characters
    # from the samples.
    for word in sample.split():
        if word not in token_index:
            # Assign a unique index to each unique word
            token_index[word] = len(token_index) + 1
            # Note that we don't attribute index 0 to anything.
```

I don't think this code can give a unique index to each unique word. In fact, in the resulting token_index, both 'The' and 'dog' are indexed to 7.
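For what it's worth, running the loop and printing the resulting index is an easy way to check for duplicated indices. A minimal, self-contained sketch of that check (using the same samples as the notebook):

```python
samples = ['The cat sat on the mat.', 'The dog ate my homework.']

# Build the index exactly as in the notebook snippet.
token_index = {}
for sample in samples:
    for word in sample.split():
        if word not in token_index:
            token_index[word] = len(token_index) + 1

# Print each token with its index, in index order. Note that 'The' and
# 'the' are distinct tokens here, because split() is case-sensitive and
# no lowercasing is applied.
for word, index in sorted(token_index.items(), key=lambda kv: kv[1]):
    print(index, word)

# Sanity check: each index value should appear exactly once, since a
# word is only assigned an index the first time it is seen.
assert len(set(token_index.values())) == len(token_index)
```

Comparing this printout against the screenshot should make it clear whether two words really share index 7, or whether two different tokens (e.g. 'The' vs. 'the') were being conflated when reading the output.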