Multi-hot encode your lists to turn them into vectors of 0s and 1s. This would mean, for instance, turning the sequence [8, 5] into a 10,000-dimensional vector that would be all 0s except for indices 8 and 5, which would be 1s.
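As a sketch of what that encoding looks like (using a dimensionality of 10 instead of 10,000 for readability; `multi_hot` is a hypothetical helper name, not a function from the book):

```python
def multi_hot(sequence, dimension=10_000):
    """Turn a sequence of word indices into a 0/1 vector of length `dimension`."""
    # Start from an all-zeros vector and flip on the position of
    # every word index that occurs in the sequence.
    vector = [0.0] * dimension
    for index in sequence:
        vector[index] = 1.0
    return vector

# [8, 5] becomes a vector with 1s at indices 5 and 8, 0s everywhere else.
print(multi_hot([8, 5], dimension=10))
```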
Wouldn’t this encoding essentially reduce the sequence to a set, which means losing information? For instance, how would the encoded representation of [8, 5] differ from [5, 8] or [8, 5, 8] (or any longer sequence of 8 and 5 elements)?
(This example arises in a tutorial about classifying movie reviews by whether they’re positive or negative. To make the example concrete, if word 5 is pretty and word 8 is awful, then there’s going to be a classification difference between pretty awful and awful pretty!)
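To make the collision concrete in code (again with a hypothetical `multi_hot` helper, small dimensionality for readability): order and repetition vanish under this encoding, so a model fed these vectors could not tell *pretty awful* from *awful pretty*:

```python
def multi_hot(sequence, dimension=10):
    vector = [0.0] * dimension
    for index in sequence:
        vector[index] = 1.0  # repeats and ordering have no effect here
    return vector

# All three sequences collapse to the exact same vector.
print(multi_hot([8, 5]) == multi_hot([5, 8]) == multi_hot([8, 5, 8]))
```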
(The encoding advice quoted at the top is from Deep Learning with Python, 2nd ed., by @fchollet.)