Multi-hot encode your lists to turn them into vectors of 0s and 1s. This would mean, for instance, turning the sequence [8, 5] into a 10,000-dimensional vector that would be all 0s except for indices 8 and 5, which would be 1s.
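As a sketch of what that encoding looks like (using a dimensionality of 10 instead of 10,000 for readability; `multi_hot` is a hypothetical helper name, not a function from the book):

```python
def multi_hot(sequence, dimension=10_000):
    """Turn a sequence of word indices into a 0/1 vector of length `dimension`."""
    # Start from an all-zeros vector and flip on the position of
    # every word index that occurs in the sequence.
    vector = [0.0] * dimension
    for index in sequence:
        vector[index] = 1.0
    return vector

# [8, 5] becomes a vector with 1s at indices 5 and 8, 0s everywhere else.
print(multi_hot([8, 5], dimension=10))
```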
Wouldn’t this encoding essentially reduce the sequence to a set, which means losing information? For instance, how would the encoded representation of [8, 5] differ from [5, 8] or [8, 5, 8] (or any longer sequence of 8 and 5 elements)?
(This example arises in a tutorial about classifying movie reviews by whether they’re positive or negative. To make the example concrete, if word 5 is pretty and word 8 is awful, then there’s going to be a classification difference between pretty awful and awful pretty!)
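To make the collision concrete in code (again with a hypothetical `multi_hot` helper, small dimensionality for readability): order and repetition vanish under this encoding, so a model fed these vectors could not tell *pretty awful* from *awful pretty*:

```python
def multi_hot(sequence, dimension=10):
    vector = [0.0] * dimension
    for index in sequence:
        vector[index] = 1.0  # repeats and ordering have no effect here
    return vector

# All three sequences collapse to the exact same vector.
print(multi_hot([8, 5]) == multi_hot([5, 8]) == multi_hot([8, 5, 8]))
```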
(The encoding advice quoted at the top is from Deep Learning with Python, 2nd ed., by @fchollet.)