anhaidgroup / py_stringmatching

A comprehensive and scalable set of string tokenizers and similarity measures in Python
https://sites.google.com/site/anhaidgroup/projects/py_stringmatching
BSD 3-Clause "New" or "Revised" License

Contribute with tokenizers #61

Open dmvieira opened 4 years ago

dmvieira commented 4 years ago

Hi, I'm looking at your amazing project and see that you don't have any deep learning tokenizers available. I'd really like to contribute them. I tried to start a discussion on Google Groups, but I can't access it: https://groups.google.com/forum/#!forum/py_stringmatching

Do you already have any requirements or decisions about deep learning tokenizers? Can I start contributing?

Thank you for your attention

christiemj09 commented 3 years ago

Hi @dmvieira! Thanks for being patient, and sorry for the delay. We've been incrementally ramping up dev time on these projects since September and are finally starting to turn our attention back towards features after being focused on maintenance.

At a high level, we're interested in py-stringmatching providing all sorts of tokenizers, though the complexity of a proposed tokenizer may dictate whether it can be incorporated straightforwardly in an upcoming release or should instead become an item on the project roadmap. Do you have links to resources on the kinds of deep learning tokenizers that could be included in py-stringmatching?
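For concreteness, here is a rough sketch (not py_stringmatching code) of what a deep-learning-style subword tokenizer could look like if it followed the same `tokenize(input_string)` interface that the library's existing tokenizers expose. The `SubwordTokenizer` class, its toy vocabulary, and the WordPiece-style `##` continuation marker are illustrative assumptions rather than an agreed design; a real implementation would load a learned vocabulary from a trained model.

```python
# A minimal sketch of a deep-learning-style subword tokenizer.
# It uses greedy longest-match segmentation (the core idea behind
# BERT's WordPiece) and returns a list of string tokens, matching
# the tokenize(input_string) convention of py_stringmatching's
# existing tokenizers. The vocabulary here is a toy example.


class SubwordTokenizer:
    """Tokenize a string into subword units via greedy longest-match."""

    def __init__(self, vocab, unk_token='[UNK]'):
        self.vocab = set(vocab)
        self.unk_token = unk_token

    def tokenize(self, input_string):
        tokens = []
        for word in input_string.split():
            start = 0
            pieces = []
            while start < len(word):
                # Find the longest vocabulary entry matching at `start`.
                end = len(word)
                piece = None
                while start < end:
                    candidate = word[start:end]
                    # Non-initial pieces carry the '##' continuation
                    # marker, following WordPiece conventions.
                    if start > 0:
                        candidate = '##' + candidate
                    if candidate in self.vocab:
                        piece = candidate
                        break
                    end -= 1
                if piece is None:
                    # No match: emit the unknown token for the whole word.
                    pieces = [self.unk_token]
                    break
                pieces.append(piece)
                start = end
            tokens.extend(pieces)
        return tokens


# Toy usage; the vocabulary and input are illustrative only.
vocab = ['token', '##izer', '##izers', 'match', '##ing', 'string']
tok = SubwordTokenizer(vocab)
print(tok.tokenize('string matching tokenizers'))
# -> ['string', 'match', '##ing', 'token', '##izers']
```

Keeping the same interface would let such a tokenizer plug directly into the existing set-based similarity measures (e.g., Jaccard), which consume the token lists that `tokenize` returns.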

A pull request is always welcome and is a good way to pitch a proof of concept that could be developed further, even if the request isn't accepted outright. We appreciate the interest!

P.S. Right now the discussion on Google Groups is dormant; GitHub issues like this one are a good place to raise such questions.