I've partially finished ParserSuite, but along the way a few questions came up that should be settled before the PR (which is not ready yet) is accepted.
First, maybe we should store those large strings and arrays of strings in files, or in serialized form? That wouldn't be a performance hit, since it doesn't affect runtime behavior, and it would be more elegant and concise.
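As a rough sketch of the file-based idea, the large literals could live in plain text files, one entry per line, and be read once when they are needed. The `FixtureLoader` object and `loadLines` name here are hypothetical, not something that exists in the project, and the one-entry-per-line UTF-8 format is an assumption.

```scala
import java.io.File
import scala.io.Source

// Hypothetical helper, not part of the project: reads entries from a text file.
object FixtureLoader {
  // Returns the trimmed, non-empty lines of a UTF-8 file, in file order.
  def loadLines(file: File): Vector[String] = {
    val source = Source.fromFile(file, "UTF-8")
    // toVector forces the lazy line iterator before the source is closed.
    try source.getLines().map(_.trim).filter(_.nonEmpty).toVector
    finally source.close()
  }
}
```

A suite could then call `FixtureLoader.loadLines(...)` once and reuse the result instead of embedding the strings in the source.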
Second, I'd suggest replacing the Array of `stopWords` with a Set, because lookup in a Set takes effectively constant time and we get `distinct` for free. I'd guess that could improve overall performance.
And if you agree, maybe I should look for more appropriate data structures in other places as well?
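To illustrate the Set suggestion, here is a minimal sketch. The sample words and the `StopWordsSketch`, `isStopWord`, and `removeStopWords` names are made up for illustration; only the `stopWords` name comes from the discussion above.

```scala
object StopWordsSketch {
  // Hypothetical sample data with a duplicate; the real list lives in the project.
  val stopWordsArray: Array[String] = Array("the", "and", "of", "the")

  // Converting once yields an immutable Set, so duplicates are dropped for free.
  val stopWords: Set[String] = stopWordsArray.toSet

  // Array.contains scans linearly; Set.contains is effectively constant time.
  def isStopWord(word: String): Boolean = stopWords.contains(word)

  // A Set[String] is also a String => Boolean, so it can be used as a predicate.
  def removeStopWords(tokens: Seq[String]): Seq[String] =
    tokens.filterNot(stopWords)
}
```

Whether this is measurable in practice depends on how large the list is and how often it is queried, so it would be worth benchmarking before changing other places.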