dginev / nnexus

Auto-linking for Mathematical Concepts for PlanetMath.org, Wikipedia, and beyond.
MIT License

Improve Longest Token Matching algorithm #10

Open dginev opened 11 years ago

dginev commented 11 years ago

I would like to revisit and better understand the current NNexus algorithm for longest token matching (LTM), and try to contribute further enhancements from corpus linguistics, e.g. term-likelihood analysis [1] and suffix arrays [2].

[1] http://personalpages.manchester.ac.uk/staff/sophia.ananiadou/ijodl2000.pdf
[2] http://www.cs.jhu.edu/~kchurch/wwwfiles/CL_suffix_array.pdf
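For reference, the core idea of longest token matching can be sketched as a greedy left-to-right scan that prefers the longest dictionary phrase at each position. This is a minimal illustrative sketch, not the actual NNexus implementation; the `concepts` dictionary, the `max_len` window, and the token-based matching are all simplifying assumptions.

```python
def longest_token_match(tokens, concepts, max_len=5):
    """Greedy longest-match sketch: `concepts` maps lowercase phrases
    to link targets; returns (start, end, phrase) spans over `tokens`.
    Hypothetical simplification of what an LTM linker might do."""
    matches = []
    i = 0
    while i < len(tokens):
        found = None
        # Try the longest candidate phrase first, shrinking the window.
        for j in range(min(len(tokens), i + max_len), i, -1):
            phrase = " ".join(tokens[i:j]).lower()
            if phrase in concepts:
                found = (i, j, phrase)
                break
        if found:
            matches.append(found)
            i = found[1]  # skip past the matched span
        else:
            i += 1
    return matches

# Toy example: "abelian group" wins over the shorter match "group".
concepts = {"abelian group": "/entry/AbelianGroup", "group": "/entry/Group"}
tokens = "every abelian group is a group".split()
print(longest_token_match(tokens, concepts))
# → [(1, 3, 'abelian group'), (5, 6, 'group')]
```

A suffix-array approach, as in [2], would instead index the corpus itself, making it cheap to count and rank candidate terms before linking them.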

dginev commented 4 years ago

General note, 6 years later: improving the recognition algorithm (and ideally adding an evaluation test harness) would be the best way to bring NNexus into the world of mainstream tooling.

It's now tempting to speak of neural models for named entity recognition that could be transferred over, but math concepts still lack an adequate large-scale dataset for supervised learning. So leveraging existing state-of-the-art results is not as immediate as I would like.

Also, we've discussed in the past that it would be a matter of simple engineering to make the precise matching algorithm a customization option in NNexus, so that we always preserve the current strategy in the code base and let end users decide which approach suits their needs best (also allowing backwards compatibility). I'm fully on board with that as a practical direction.
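One way such a customization option could look: a registry of named matcher strategies, selected by configuration. This is purely a hypothetical sketch of the idea (the names `register_matcher`, `annotate`, and `exact_token` are invented for illustration and do not exist in NNexus):

```python
from typing import Callable, Dict, List, Tuple

# A matcher takes tokens plus a concept dictionary and returns spans.
Matcher = Callable[[List[str], Dict[str, str]], List[Tuple[int, int, str]]]

MATCHERS: Dict[str, Matcher] = {}

def register_matcher(name: str):
    """Decorator registering a strategy under a user-selectable name."""
    def wrap(fn: Matcher) -> Matcher:
        MATCHERS[name] = fn
        return fn
    return wrap

@register_matcher("exact_token")
def exact_token(tokens, concepts):
    # Trivial baseline strategy: link single-token concepts only.
    return [(i, i + 1, t) for i, t in enumerate(tokens) if t in concepts]

def annotate(tokens, concepts, strategy="exact_token"):
    """Dispatch to whichever matching strategy the user configured."""
    return MATCHERS[strategy](tokens, concepts)
```

The current LTM strategy would stay registered under its own name as the default, so existing behavior is preserved while new algorithms can be swapped in per deployment.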