However, most of the triples in your data set use actually | (a pipe surrounded by spaces) as a separator, and I don't think you strip spaces around words in WordOccurrence.
In fact, both solutions work because after splitting into three strings, I apply the function Tokenize.Tokenize() to these three strings and this function Tokenize() ingores the spaces.
I'm not sure this is relevant, but you seem to be splitting on the character
|
: https://github.com/ProjetPP/PPP-QuestionParsing-ML/blob/883151456361b076f6a4b7d48105e6e956b2c2cd/src/com/ppp/ClassificationOutput.java#L19However, most of the triples in your data set use actually
|
(a pipe surrounded by spaces) as a separator, and I don't think you strip spaces around words in WordOccurrence.Note: may be related to #5