Closed johann-petrak closed 6 years ago
It turns out that the Mallet TokenSequence.toFeatureSequence(Alphabet) method adds an index of -1 to the feature sequence for every unknown token if growth of the Alphabet has been stopped. Any code that converts the FeatureSequence back to strings will then get an ArrayIndexOutOfBoundsException.
Not sure how to best deal with this. Ideally there would be a way to just add the known tokens to the feature sequence. See https://github.com/mimno/Mallet/issues/138
For now, I will just construct the feature sequence manually.
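A minimal sketch of what manual construction could look like, keeping only the known tokens. To stay self-contained it does not use Mallet: the Map is a hypothetical stand-in for a frozen Alphabet, and the class and method names (KnownTokenIndexer, lookupIndex, toKnownIndices) are made up for illustration. In Mallet itself the equivalent lookup would presumably be Alphabet.lookupIndex(entry, false), which returns -1 for entries that are not in the alphabet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KnownTokenIndexer {

    // Hypothetical stand-in for a lookup in a non-growing alphabet:
    // returns the token's index, or -1 if the token is unknown.
    static int lookupIndex(Map<String, Integer> alphabet, String token) {
        Integer idx = alphabet.get(token);
        return idx == null ? -1 : idx;
    }

    // Builds the index sequence by hand and skips unknown tokens,
    // so no -1 entry ever ends up in the result.
    static List<Integer> toKnownIndices(Map<String, Integer> alphabet, List<String> tokens) {
        List<Integer> indices = new ArrayList<>();
        for (String token : tokens) {
            int idx = lookupIndex(alphabet, token);
            if (idx >= 0) {
                indices.add(idx);
            }
        }
        return indices;
    }

    public static void main(String[] args) {
        Map<String, Integer> alphabet = new HashMap<>();
        alphabet.put("cat", 0);
        alphabet.put("dog", 1);
        // "bird" is not in the alphabet and is dropped.
        System.out.println(toKnownIndices(alphabet, Arrays.asList("dog", "bird", "cat")));
    }
}
```

With the alphabet above, the tokens "dog", "bird", "cat" yield the indices [1, 0]. Every index that remains is valid, so converting the sequence back to strings should no longer hit an out-of-range lookup.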
Exception is
This happens when a feature sequence that was created from a new document, one that was not in the training set, gets converted back to a string. An index gets looked up in the alphabet using lookupObject(idx), and for some reason that index is not in the alphabet. So how did it get into the feature sequence in the first place?