amygdala / tensorflow-workshop

This repo contains materials for use in a TensorFlow workshop.
Apache License 2.0
633 stars, 268 forks

preprocess.py doesn't produce expected output #55

Open walid-shalaby opened 7 years ago

walid-shalaby commented 7 years ago

When I validated the output of preprocess.py using text8 as input, I found a mismatch between the input word sequence and the index-encoded sequence that is written to text8-train.pb2.

To reproduce, add the two lines below to preprocess.py after the call to build_string_index():

    print('{}'.format(words[:100]))
    print('{}'.format(index[word_indices[:100]]))

Here is the output I get:

['anarchism' 'originated' 'as' 'a' 'term' 'of' 'abuse' 'first' 'used' 'against' 'early' 'working' 'class' 'radicals' 'including' 'the' 'diggers' 'of' 'the' 'english' 'revolution' 'and' 'the' 'sans' 'culottes' 'of' 'the' 'french' 'revolution' 'whilst' 'the' 'term' 'is' 'still' 'used' 'in' 'a' 'pejorative' 'way' 'to' 'describe' 'any' 'act' 'that' 'used' 'violent' 'means' 'to' 'destroy' 'the' 'organization' 'of' 'society' 'it' 'has' 'also' 'been' 'taken' 'up' 'as' 'a' 'positive' 'label' 'by' 'self' 'defined' 'anarchists' 'the' 'word' 'anarchism' 'is' 'derived' 'from' 'the' 'greek' 'without' 'archons' 'ruler' 'chief' 'king' 'anarchism' 'as' 'a' 'political' 'philosophy' 'is' 'the' 'belief' 'that' 'rulers' 'are' 'unnecessary' 'and' 'should' 'be' 'abolished' 'although' 'there' 'are' 'differing']

['instance' 'dating' 'as' 'a' 'term' 'of' 'distances' 'first' 'used' 'against' 'early' 'working' 'class' 'squid' 'including' 'the' 'hanoi' 'of' 'the' 'english' 'treaty' 'and' 'the' 'malinowski' 'UNK' 'of' 'the' 'french' 'treaty' 'afro' 'the' 'term' 'is' 'still' 'used' 'in' 'a' 'buddy' 'way' 'to' 'islam' 'any' 'act' 'that' 'used' 'zeus' 'lincoln' 'to' 'vector' 'the' 'car' 'of' 'society' 'it' 'has' 'also' 'been' 'latin' 'up' 'as' 'a' 'failed' 'eddington' 'by' 'self' 'command' 'anarchists' 'the' 'word' 'instance' 'is' 'treaty' 'from' 'the' 'born' 'without' 'mml' 'progress' 'coast' 'king' 'instance' 'as' 'a' 'political' 'culture' 'is' 'the' 'me' 'that' 'dating' 'are' 'squid' 'and' 'public' 'be' 'acceptable' 'although' 'there' 'are' 'absent']
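For anyone who wants to check this kind of encoding independently, here is a minimal round-trip sketch in plain Python. It is not the code from preprocess.py: `build_index`, `round_trip_mismatches`, the `vocab_size` parameter, and the convention that slot 0 holds `UNK` are all assumptions made for illustration. The idea is that decoding the indices should give back either the original word or `UNK`, never a different vocabulary word.

```python
from collections import Counter

UNK = "UNK"  # assumed out-of-vocabulary marker, stored at slot 0

def build_index(words, vocab_size):
    """Hypothetical helper: return (index, word_indices), where index[i] is the i-th vocabulary word."""
    # Keep the most frequent words; everything else maps to slot 0 (UNK).
    counts = Counter(words)
    index = [UNK] + [w for w, _ in counts.most_common(vocab_size - 1)]
    word_to_id = {w: i for i, w in enumerate(index)}
    word_indices = [word_to_id.get(w, 0) for w in words]
    return index, word_indices

def round_trip_mismatches(words, index, word_indices):
    """Hypothetical helper: list (position, original, decoded) for every word that decodes wrongly."""
    vocab = set(index)
    mismatches = []
    for pos, (word, idx) in enumerate(zip(words, word_indices)):
        decoded = index[idx]
        # An in-vocabulary word must decode to itself, any other word to UNK.
        expected = word if word in vocab else UNK
        if decoded != expected:
            mismatches.append((pos, word, decoded))
    return mismatches

words = "the term is still used in a pejorative way to describe the term".split()
index, word_indices = build_index(words, vocab_size=6)
decoded = [index[i] for i in word_indices]
print(round_trip_mismatches(words, index, word_indices))
```

With a consistent index the mismatch list is empty. If the vocabulary array is reordered after the indices have been computed, the same check reports substitutions like the ones in the output above, so it may help narrow down where the two arrays go out of sync.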