dotnet / machinelearning-modelbuilder

Simple UI tool to build custom machine learning models.
Creative Commons Attribution 4.0 International

Next Word Predictions Model #955

Open osheajohn opened 4 years ago

osheajohn commented 4 years ago

I would like to create a machine learning model that predicts the next word in a sentence given the preceding words and context. I haven't found any features similar to this in your sample library. Do you have any suggestions for how to accomplish this using ML.NET Model Builder?

Thanks, Jack

JakeRadMSFT commented 4 years ago

@justinormont thoughts on how they could do this w/ ML.NET? This came up in our community standup too.

justinormont commented 4 years ago

Transformer models like BERT/GPT2 are the common way people do this currently. There is no BERT model in ML.NET, though there's an external project which integrates it: https://github.com/GerjanVlot/BERT-ML.NET/tree/master/Microsoft.ML.Models.BERT

I haven't tried it. I'm uncertain if it can be refitted to your dataset, or if it's just a static BERT model.

You should be able to use it to predict the next N words by placing mask tokens at the end of your phrase.
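To illustrate the mask-token idea, here is a minimal sketch in Python (not ML.NET, and not the API of the linked project). It assumes BERT's conventional `[MASK]` token. The names `append_masks`, `predict_next_words`, and `fill_first_mask` are hypothetical, and `fill_first_mask` stands in for whatever masked language model you end up calling.

```python
from typing import Callable

MASK = "[MASK]"  # BERT's conventional mask token; other models may use a different string


def append_masks(phrase: str, n: int, mask_token: str = MASK) -> str:
    """Return the phrase followed by n mask tokens, separated by spaces."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return " ".join([phrase.strip()] + [mask_token] * n)


def predict_next_words(
    phrase: str,
    n: int,
    fill_first_mask: Callable[[str], str],
    mask_token: str = MASK,
) -> str:
    """Fill the trailing masks left to right, one word per model call.

    fill_first_mask is a hypothetical stand-in for a real masked language
    model: it receives the masked text and returns a word for the first mask.
    """
    text = append_masks(phrase, n, mask_token)
    for _ in range(n):
        word = fill_first_mask(text)
        # Replace only the first remaining mask so later masks stay in place.
        text = text.replace(mask_token, word, 1)
    return text


if __name__ == "__main__":
    # Toy predictor used only to show the control flow; it is not a real model.
    canned = iter(["fox", "jumps"])
    print(append_masks("The quick brown", 2))
    print(predict_next_words("The quick brown", 2, lambda _text: next(canned)))
```

Filling one mask at a time lets each prediction condition on the words already chosen. Whether that beats predicting all masks in a single pass depends on the model, so treat it as one option rather than the recommended approach.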

You could also modify @GerjanVlot's code to use a GPT2 model, which is larger and will give better predictions for the next word. Though I don't see a license file.

You can play with various phrase completion models on Hugging Face's site: https://transformer.huggingface.co/

Gigabyte0x1337 commented 4 years ago

I have added a license; you can use the code however you like.

justinormont commented 4 years ago

@GerjanVlot: Thanks for adding a license. I might recommend an MIT license to match the ML.NET license.

justinormont commented 3 years ago

@GerjanVlot: Thanks for switching to an MIT license!