Currently, TC's text capabilities are limited to logistic regression on top of bag-of-words (BOW) encoded text.
While this is suitable for some cases, many use cases call for more sophisticated, modern NLP methods. It would be nice for TC to have more support for modeling, classifying, and generating text with such methods, and for serializing the results into CoreML models.
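For context, a minimal sketch of today's baseline using the `text_classifier` toolkit (the dataset and column names are placeholders):

```python
import turicreate as tc

# Placeholder dataset: one text column, one label column.
data = tc.SFrame({
    "text": ["great product", "terrible service", "works fine", "awful quality"],
    "label": ["pos", "neg", "pos", "neg"],
})

# text_classifier builds a bag-of-words representation and fits a
# logistic regression classifier on top of it. word_count_threshold=1
# and validation_set=None only because the placeholder dataset is tiny.
model = tc.text_classifier.create(
    data,
    target="label",
    features=["text"],
    word_count_threshold=1,
    validation_set=None,
)

# Serialize the trained pipeline to a CoreML model.
model.export_coreml("TextClassifier.mlmodel")
```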
Some potential pathways this could take:
- more preprocessing support (stemming, lemmatization, tokenization, ...)
- leveraging pretrained word embeddings (Word2Vec, GloVe, FastText, ...); see the sketch after this list
- using representations from large pretrained neural networks (BERT, GPT, ...)
- support for more common tasks (tagging, sentiment analysis, labeling, sentence generation, ...)
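As one illustration of the embeddings pathway, a rough sketch rather than existing TC functionality: featurize text with pretrained GloVe vectors (loaded here through gensim's downloader, an assumed dependency) and train one of TC's existing classifiers on the dense features. The dataset and column names are placeholders.

```python
from array import array

import gensim.downloader  # assumed dependency for pretrained vectors
import numpy as np
import turicreate as tc

# Pretrained 50-dimensional GloVe vectors (Wikipedia + Gigaword).
glove = gensim.downloader.load("glove-wiki-gigaword-50")

def embed(sentence):
    """Average the GloVe vectors of in-vocabulary tokens (zeros if none)."""
    vectors = [glove[w] for w in sentence.lower().split() if w in glove]
    mean = np.mean(vectors, axis=0) if vectors else np.zeros(50)
    return array("d", mean)

data = tc.SFrame({
    "text": ["great product", "terrible service", "works fine", "awful quality"],
    "label": ["pos", "neg", "pos", "neg"],
})

# Swap the BOW features for dense embedding features, then reuse TC's
# existing classifier toolkit on top of them.
data["embedding"] = data["text"].apply(embed)
model = tc.classifier.create(
    data,
    target="label",
    features=["embedding"],
    validation_set=None,  # tiny placeholder dataset; skip the validation split
)
```

The same featurized SFrame could feed any downstream TC model; a production version would presumably ship the embedding lookup inside the exported CoreML pipeline rather than leaving it in Python.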
cf. https://github.com/huggingface/swift-coreml-transformers for prior art on running transformer models with CoreML.