Splitting of words - Githubissues

ProjetPP / PPP-QuestionParsing-ML

A more efficient reimplementation of PPP-QuestionParsing-ML-Standalone


Splitting of words #7

Closed progval closed 9 years ago

progval commented 9 years ago

I'm not sure this is relevant, but you seem to be splitting on the character |: https://github.com/ProjetPP/PPP-QuestionParsing-ML/blob/883151456361b076f6a4b7d48105e6e956b2c2cd/src/com/ppp/ClassificationOutput.java#L19

However, most of the triples in your data set actually use " | " (a pipe surrounded by spaces) as the separator, and I don't think you strip the spaces around words in WordOccurrence.

Note: may be related to #5

robocop commented 9 years ago

In fact, both separators work: after splitting into three strings, I apply Tokenize.Tokenize() to each of them, and that function ignores the spaces.
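To illustrate why the two separators end up equivalent, here is a minimal sketch. It is not the project's code: the class TripleSplitSketch and its tokenize() method are hypothetical stand-ins, and tokenize() only assumes that the real Tokenize.Tokenize() discards surrounding whitespace, as described above. The input strings are made-up examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

class TripleSplitSketch {
    // Hypothetical stand-in for Tokenize.Tokenize(): trims the input and
    // splits it on runs of whitespace, so leading/trailing spaces vanish.
    static List<String> tokenize(String part) {
        String trimmed = part.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Splits a line on the bare pipe character, then tokenizes each part.
    // String.split takes a regex, where | means alternation, so the pipe
    // is quoted to be matched literally.
    static List<List<String>> parseTriple(String line) {
        List<List<String>> result = new ArrayList<>();
        for (String part : line.split(Pattern.quote("|"), -1)) {
            result.add(tokenize(part));
        }
        return result;
    }

    public static void main(String[] args) {
        // Same triple written with " | " and with "|" as the separator.
        System.out.println(parseTriple("president | United States | ?"));
        System.out.println(parseTriple("president|United States|?"));
    }
}
```

Under that assumption both calls yield the same three token lists, so splitting on "|" alone is enough as long as every part goes through the tokenizer afterwards.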