```python
import numpy as np

# This is our initial data; one entry per "sample"
# (in this toy example, a "sample" is just a sentence, but
# it could be an entire document).
samples = ['The cat sat on the mat.', 'The dog ate my homework.']

# First, build an index of all tokens in the data.
token_index = {}
for sample in samples:
    # We simply tokenize the samples via the `split` method.
    # In real life, we would also strip punctuation and special characters
    # from the samples.
    for word in sample.split():
        if word not in token_index:
            # Assign a unique index to each unique word.
            token_index[word] = len(token_index) + 1
            # Note that we don't attribute index 0 to anything.
```
I don't think this code gives a unique index to each unique word. In fact, in `token_index` both 'The' and 'dog' end up indexed to 7.
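One way to check the claim is to rerun the indexing loop from the quoted snippet and print the mapping it builds. This is a minimal sketch of that check (NumPy is not needed for this part, so the import is dropped):

```python
# Minimal reproduction of the quoted indexing loop, so the
# resulting word -> index mapping can be inspected directly.
samples = ['The cat sat on the mat.', 'The dog ate my homework.']

token_index = {}
for sample in samples:
    for word in sample.split():
        if word not in token_index:
            # len(token_index) grows by one for each new word, so
            # indices are assigned sequentially starting at 1.
            token_index[word] = len(token_index) + 1

# Print the mapping in index order to inspect the assignments.
for word, index in sorted(token_index.items(), key=lambda kv: kv[1]):
    print(index, word)
```

Note that `split` does no normalization, so 'The' and 'the' are treated as two distinct tokens and each receives its own index, and the trailing punctuation stays attached ('mat.' and 'homework.' are indexed with the period included).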