anhaidgroup / py_stringmatching

A comprehensive and scalable set of string tokenizers and similarity measures in Python
https://sites.google.com/site/anhaidgroup/projects/py_stringmatching
BSD 3-Clause "New" or "Revised" License

Contribute with tokenizers #61

Open dmvieira opened 4 years ago

dmvieira commented 4 years ago

Hi, I'm looking at your amazing project and see that you don't have any deep learning tokenizers available. I'd really like to contribute them. I tried to start a discussion on Google Groups, but I can't access it: https://groups.google.com/forum/#!forum/py_stringmatching

Do you already have any requirements or decisions about deep learning tokenizers? Can I start contributing?

Thank you for your attention

christiemj09 commented 3 years ago

Hi @dmvieira! Thanks for being patient, and sorry for the delay. We've been incrementally ramping up dev time on these projects since September and are finally starting to turn our attention back towards features after being focused on maintenance.

At a high level, we're interested in py-stringmatching providing all sorts of tokenizers, though the complexity of a proposed tokenizer may dictate whether it can be incorporated straightforwardly in an upcoming release or should instead become an item on the project roadmap. Do you have links to resources on the kinds of deep learning tokenizers that could be included in py-stringmatching?
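For concreteness, here is a rough sketch (not py_stringmatching code) of what a deep-learning-style subword tokenizer could look like if it followed the same `tokenize(input_string)` interface that the library's existing tokenizers expose. The `SubwordTokenizer` class, its toy vocabulary, and the WordPiece-style `##` continuation marker are illustrative assumptions rather than an agreed design; a real implementation would load a learned vocabulary from a trained model.

```python
# A minimal sketch of a deep-learning-style subword tokenizer.
# It uses greedy longest-match segmentation (the core idea behind
# BERT's WordPiece) and returns a list of string tokens, matching
# the tokenize(input_string) convention of py_stringmatching's
# existing tokenizers. The vocabulary here is a toy example.


class SubwordTokenizer:
    """Tokenize a string into subword units via greedy longest-match."""

    def __init__(self, vocab, unk_token='[UNK]'):
        self.vocab = set(vocab)
        self.unk_token = unk_token

    def tokenize(self, input_string):
        tokens = []
        for word in input_string.split():
            start = 0
            pieces = []
            while start < len(word):
                # Find the longest vocabulary entry matching at `start`.
                end = len(word)
                piece = None
                while start < end:
                    candidate = word[start:end]
                    # Non-initial pieces carry the '##' continuation
                    # marker, following WordPiece conventions.
                    if start > 0:
                        candidate = '##' + candidate
                    if candidate in self.vocab:
                        piece = candidate
                        break
                    end -= 1
                if piece is None:
                    # No match: emit the unknown token for the whole word.
                    pieces = [self.unk_token]
                    break
                pieces.append(piece)
                start = end
            tokens.extend(pieces)
        return tokens


# Toy usage; the vocabulary and input are illustrative only.
vocab = ['token', '##izer', '##izers', 'match', '##ing', 'string']
tok = SubwordTokenizer(vocab)
print(tok.tokenize('string matching tokenizers'))
# -> ['string', 'match', '##ing', 'token', '##izers']
```

Keeping the same interface would let such a tokenizer plug directly into the existing set-based similarity measures (e.g., Jaccard), which consume the token lists that `tokenize` returns.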

A pull request is always welcome and is a good way to pitch a proof of concept that could be developed further, even if the request isn't accepted outright. We appreciate the interest!

P.S. Right now the discussion on Google Groups is dormant; GitHub issues like this one are a good place to raise such questions.