dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Roadmap can be read as saying we don't support text #382

Closed: justinormont closed this issue 5 years ago

justinormont commented 6 years ago

In a recent StackOverflow answer, our Roadmap appears to have been interpreted as saying that text/NLP support is purely a future work item.

The roadmap currently says:

Featurization Improvements

  • Text (*)
    • Natural language text preprocessing such as tokenization, part-of-speech tagging, and sentence breaking
    • Pre-trained text models that can be used for extracting semantic or sentiment features from text
  • Image (*)
    • Image preprocessing such as loading, resizing, and normalization of images
    • Image featurization, including industry-standard pre-trained ImageNet neural models, such as ResNet and AlexNet

We should update the roadmap to make clear that basic text handling, in the form of n-gram featurization, already exists, and that only further text/NLP techniques, such as pre-trained WordEmbedding models and improvements to tokenization, are future work.

jwood803 commented 6 years ago

Apologies, I believe that's my fault for mentioning that in the StackOverflow answer. Thanks for clarifying, though!

I can play around with it and see if I can come up with a sample for the samples repository, to show folks that these features already exist.