```python
import numpy as np

# This is our initial data; one entry per "sample"
# (in this toy example, a "sample" is just a sentence, but
# it could be an entire document).
samples = ['The cat sat on the mat.', 'The dog ate my homework.']

# First, build an index of all tokens in the data.
token_index = {}
for sample in samples:
    # We simply tokenize the samples via the `split` method.
    # In real life, we would also strip punctuation and special characters
    # from the samples.
    for word in sample.split():
        if word not in token_index:
            # Assign a unique index to each unique word.
            token_index[word] = len(token_index) + 1
            # Note that we don't attribute index 0 to anything.
```
I don't think this code gives a unique index to each unique word. In fact, in `token_index` both 'The' and 'dog' end up indexed to 7.
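One way to check the claim is to rerun the indexing loop from the quoted snippet and print the mapping it builds. This is a minimal sketch of that check (NumPy is not needed for this part, so the import is dropped):

```python
# Minimal reproduction of the quoted indexing loop, so the
# resulting word -> index mapping can be inspected directly.
samples = ['The cat sat on the mat.', 'The dog ate my homework.']

token_index = {}
for sample in samples:
    for word in sample.split():
        if word not in token_index:
            # len(token_index) grows by one for each new word, so
            # indices are assigned sequentially starting at 1.
            token_index[word] = len(token_index) + 1

# Print the mapping in index order to inspect the assignments.
for word, index in sorted(token_index.items(), key=lambda kv: kv[1]):
    print(index, word)
```

Note that `split` does no normalization, so 'The' and 'the' are treated as two distinct tokens and each receives its own index, and the trailing punctuation stays attached ('mat.' and 'homework.' are indexed with the period included).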