Closed by mairaw 5 years ago.
Check out the Sentiment example here: https://github.com/dotnet/machinelearning/blob/master/test/Microsoft.ML.Tests/Scenarios/SentimentPredictionTests.cs
Your use should be pretty similar.
I've read that and all of the tutorials and docs. Not really a lot of content.
I was hoping to get a bit more guidance.
Would you go about it this way or differently?
You can also have a look at the GitHubIssueClassification demo in the following video at 18:00: https://www.youtube.com/watch?v=OhCysVU5RDA
@atotalnoob your approach seems like a good starting point. You would want a delimited text file with one column being the text input and the other column being the label (intent).
If you have other information that might be valuable in predicting the intent, and that information is available at prediction time, add it in extra columns. Your data might then look like: SentenceToBeScored | WhatUserWasDoingBeforeStartingChat | UserType | ... | Intent.
For your LearningPipeline, apply Dictionarizer to Intent and call it "Label". Apply TextFeaturizer to things like SentenceToBeScored and other text features. For categorical features like UserType, try CategoricalOneHotVectorizer. After transforming your features to numeric vectors, apply ColumnConcatenator so all the features are in a column called "Features". You can then add a learner (e.g. SDCA). This sample might also be useful as it shows how to work with text labels in multiclass problems.
You can try to improve your model's accuracy by modifying the hyperparameters of the transforms and learner (such as the NgramLength of the TextFeaturizer, as in this sample). Your results also depend on things like the scenario (does the user's sentence actually contain information about the intent?) and how much data you have available to train the model.
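The pipeline described above could be sketched roughly as follows. This is only an illustration, assuming the legacy LearningPipeline API of the Microsoft.ML 0.1.0 package used elsewhere in this thread. The ChatRow class, its three-column layout, the IntentSketch namespace, and the "TextFeatures" column name are hypothetical. The WordFeatureExtractor setting follows the sentiment test linked at the top of the thread as best recalled, so check it against your package version.

```csharp
using Microsoft.ML;
using Microsoft.ML.Runtime;      // assumed home of NGramNgramExtractor
using Microsoft.ML.Runtime.Api;  // Column attribute
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;

namespace IntentSketch
{
    // Hypothetical row layout (tab separated): sentence, user type, intent.
    public class ChatRow
    {
        [Column(ordinal: "0")]
        public string SentenceToBeScored;

        [Column(ordinal: "1")]
        public string UserType;

        // The intent column is exposed under the name "Label".
        [Column(ordinal: "2", name: "Label")]
        public string Label;
    }

    public static class PipelineSketch
    {
        public static LearningPipeline Build(string dataPath)
        {
            var pipeline = new LearningPipeline();
            pipeline.Add(new TextLoader<ChatRow>(dataPath, useHeader: false, separator: "tab"));

            // Map the text labels to keys so a multiclass learner can consume them.
            pipeline.Add(new Dictionarizer("Label"));

            // Turn the sentence into a numeric vector; the n-gram length is one
            // hyperparameter worth experimenting with (assumed property names).
            pipeline.Add(new TextFeaturizer("TextFeatures", "SentenceToBeScored")
            {
                WordFeatureExtractor = new NGramNgramExtractor { NgramLength = 2, AllLengths = true }
            });

            // One-hot encode the categorical column in place.
            pipeline.Add(new CategoricalOneHotVectorizer("UserType"));

            // Gather every numeric feature into the single "Features" column.
            pipeline.Add(new ColumnConcatenator("Features", "TextFeatures", "UserType"));

            pipeline.Add(new StochasticDualCoordinateAscentClassifier());
            return pipeline;
        }
    }
}
```

Calling Train on the returned pipeline with your own input and prediction types would then fit the model. Whether the extra UserType column helps depends on your data.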
@GalOshri Thanks for this, it definitely helps.
What would be the best ML algorithm to use (not just limited to ml.net)? SDCA?
I've been following the guide and your comments, but I can't seem to get it to train. Do you mind pointing out what I am doing incorrectly?
I keep getting an inner exception of InvalidOperationException: Source column 'Label' is required but not found
using System;
using Microsoft.ML.Models;
using Microsoft.ML.Runtime;
using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
namespace nlpTest
{
    class Program
    {
        const string dataPath = "intents.txt";
        const string testPath = "testData.txt";
        static void Main(string[] args)
        {
            var model = TrainAndPredict();
            Evaluate(model);
        }
        public static PredictionModel<IntentData, IntentPrediction> TrainAndPredict()
        {
            if (!System.IO.File.Exists(dataPath))
            {
                Console.WriteLine("File not found " + dataPath);
            }
            var pipeline = new LearningPipeline();
            pipeline.Add(new TextLoader<IntentData>(dataPath, useHeader: false, separator: "tab"));
            pipeline.Add(new TextFeaturizer(outputColumn: "Features", inputColumns: "text"));
            pipeline.Add(new Dictionarizer("Label"));
            pipeline.Add(new StochasticDualCoordinateAscentClassifier());
            PredictionModel<IntentData, IntentPrediction> model =
                pipeline.Train<IntentData, IntentPrediction>();
            IEnumerable<IntentData> intents = new[]
            {
                new IntentData
                {
                    text = "I like pie",
                    intent = "food"
                },
                new IntentData
                {
                    text = "I like pizza",
                    intent = "food"
                },
                new IntentData
                {
                    text = "my favorite color is blue",
                    intent = "color"
                },
                new IntentData
                {
                    text = "my favorite color is black",
                    intent = "color"
                }
            };
            IEnumerable<IntentPrediction> predictions = model.Predict(intents);
            var intentsAndPredictions = intents.Zip(predictions, (intent, prediction) => (intent, prediction));
            foreach (var item in intentsAndPredictions)
            {
                Console.WriteLine($"Intent: {item.intent.intent} | Prediction: {item.prediction.intent}");
            }
            Console.WriteLine();
            return model;
        }
        public static void Evaluate(PredictionModel<IntentData, IntentPrediction> model)
        {
            var testData = new TextLoader<IntentData>(testPath, useHeader: false, separator: "tab");
            var evaluator = new ClassificationEvaluator();
            ClassificationMetrics metrics = evaluator.Evaluate(model, testData);
            Console.WriteLine();
            Console.WriteLine("PredictionModel quality metrics evaluation");
            Console.WriteLine("------------------------------------------");
            Console.WriteLine($"confusion matrix: {metrics.ConfusionMatrix}");
        }
    }
    public class IntentData
    {
        [Column(ordinal: "0")]
        public string text;
        [Column(ordinal: "1", name: "Label")]
        public string intent;
    }
    public class IntentPrediction
    {
        [ColumnName("PredictedLabel")]
        public string intent;
    }
}
Contents of intents.txt (tab separated):
I like pie	food
My favorite color is blue	color
I like pizza	food
my favorite color is black	color
I like cheese	food
my favorite color is purple	color
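Because the tab between the sentence and its label is easy to lose when the file is pasted or re-saved, it can help to generate intents.txt from code. The following is a minimal sketch using only the .NET base class library. The WriteIntents class name is hypothetical, and the rows are copied from the listing above.

```csharp
using System;
using System.IO;
using System.Linq;

public static class WriteIntents
{
    public static void Main()
    {
        // (text, intent) pairs copied from the listing above.
        var rows = new (string Text, string Intent)[]
        {
            ("I like pie", "food"),
            ("My favorite color is blue", "color"),
            ("I like pizza", "food"),
            ("my favorite color is black", "color"),
            ("I like cheese", "food"),
            ("my favorite color is purple", "color"),
        };

        // Join each pair with a single tab: column 0 is the text, column 1 the label.
        File.WriteAllLines("intents.txt", rows.Select(r => r.Text + "\t" + r.Intent));

        // Sanity check: every line must split into exactly two tab-separated fields.
        foreach (var line in File.ReadLines("intents.txt"))
        {
            if (line.Split('\t').Length != 2)
            {
                throw new InvalidDataException("Expected two columns: " + line);
            }
        }

        Console.WriteLine("Wrote " + rows.Length + " rows to intents.txt");
    }
}
```

The same check can be pointed at an existing file to confirm that it really is tab separated before handing it to TextLoader.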
@atotalnoob, can you please update your nuget (or code if you are using source). This problem was fixed in issue #121.
@zeahmed There is no newer Nuget package (see screenshot). I checked prerelease, as well. I'll try with the source and report back.
Building from source isn't working either.
I built and added references to these assemblies:
Microsoft.ML
Microsoft.ML.Api
Microsoft.ML.Core
Microsoft.ML.CpuMath
Microsoft.ML.Data
Microsoft.ML.Maml
Microsoft.ML.Transforms
Microsoft.ML.UniversalModelFormat
Different Error, same line, which is:
An unhandled exception of type 'System.InvalidOperationException' occurred in Microsoft.ML.Data.dll
Entry point 'Trainers.StochasticDualCoordinateAscentClassifier' not found
Stack trace:
   at Microsoft.ML.Runtime.EntryPoints.EntryPointNode..ctor(IHostEnvironment env, ModuleCatalog moduleCatalog, RunContext context, String id, String entryPointName, JObject inputs, JObject outputs, Boolean checkpoint, String stageId, Single cost) in C:\machinelearning\src\Microsoft.ML.Data\EntryPoints\EntryPointNode.cs:line 509
   at Microsoft.ML.Runtime.EntryPoints.EntryPointNode.ValidateNodes(IHostEnvironment env, RunContext context, JArray nodes, ModuleCatalog moduleCatalog) in C:\machinelearning\src\Microsoft.ML.Data\EntryPoints\EntryPointNode.cs:line 893
   at Microsoft.ML.Runtime.EntryPoints.EntryPointGraph..ctor(IHostEnvironment env, ModuleCatalog moduleCatalog, JArray nodes) in C:\machinelearning\src\Microsoft.ML.Data\EntryPoints\EntryPointNode.cs:line 968
   at Microsoft.ML.Runtime.Experiment.Compile() in C:\machinelearning\src\Microsoft.ML\Runtime\Experiment\Experiment.cs:line 56
   at Microsoft.ML.LearningPipeline.Train[TInput,TOutput]() in C:\machinelearning\src\Microsoft.ML\LearningPipeline.cs:line 204
   at nlpTool.Program.TrainAndPredict() in C:\Users\UserProfile\source\repos\nlpTool\nlpTool\Program.cs:line 34
   at nlpTool.Program.Main(String[] args) in C:\Users\UserProfile\source\repos\nlpTool\nlpTool\Program.cs:line 19
@atotalnoob, If you are using nuget v0.1.0, please update your type as follows. The issue #121 was solved later on. I tested your code. It's working with this change.
public class IntentData
{
    [Column(ordinal: "0")]
    public string text;
    [Column(ordinal: "1", name: "Label")]
    public string Label;
}
Hey,
Still not working. New exception, so progress... It occurs on the same line:
System.InvalidOperationException: 'Can't bind the IDataView column 'PredictedLabel' of type 'Key<U4, 0-1>' to field 'intent' of type 'System.String'.'
Can we make these exceptions more wordy? Like idk what the hell the exception is even complaining about.
using System;
using Microsoft.ML.Models;
using Microsoft.ML.Runtime;
using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
namespace nlpTool
{
    class Program
    {
        const string dataPath = "intents.txt";
        const string testPath = "testData.txt";
        static void Main(string[] args)
        {
            var model = TrainAndPredict();
            Evaluate(model);
        }
        public static PredictionModel<IntentData, IntentPrediction> TrainAndPredict()
        {
            if (!System.IO.File.Exists(dataPath))
            {
                Console.WriteLine("File not found " + dataPath);
            }
            var pipeline = new LearningPipeline();
            pipeline.Add(new TextLoader<IntentData>(dataPath, useHeader: false, separator: "tab"));
            pipeline.Add(new TextFeaturizer(outputColumn: "Features", inputColumns: "text"));
            pipeline.Add(new Dictionarizer("Label"));
            pipeline.Add(new StochasticDualCoordinateAscentClassifier());
            PredictionModel<IntentData, IntentPrediction> model =
                pipeline.Train<IntentData, IntentPrediction>();
            IEnumerable<IntentData> intents = new[]
            {
                new IntentData
                {
                    text = "I like pie",
                    Label = "food"
                },
                new IntentData
                {
                    text = "I like pizza",
                    Label = "food"
                },
                new IntentData
                {
                    text = "my favorite color is blue",
                    Label = "color"
                },
                new IntentData
                {
                    text = "my favorite color is black",
                    Label = "color"
                }
            };
            IEnumerable<IntentPrediction> predictions = model.Predict(intents);
            var intentsAndPredictions = intents.Zip(predictions, (intent, prediction) => (intent, prediction));
            foreach (var item in intentsAndPredictions)
            {
                Console.WriteLine($"Intent: {item.intent.Label} | Prediction: {item.prediction.intent}");
            }
            Console.WriteLine();
            return model;
        }
        public static void Evaluate(PredictionModel<IntentData, IntentPrediction> model)
        {
            var testData = new TextLoader<IntentData>(testPath, useHeader: false, separator: "tab");
            var evaluator = new ClassificationEvaluator();
            ClassificationMetrics metrics = evaluator.Evaluate(model, testData);
            Console.WriteLine();
            Console.WriteLine("PredictionModel quality metrics evaluation");
            Console.WriteLine("------------------------------------------");
            Console.WriteLine($"confusion matrix: {metrics.ConfusionMatrix}");
        }
    }
    public class IntentData
    {
        [Column(ordinal: "0")]
        public string text;
        [Column(ordinal: "1", name: "Label")]
        public string Label;
    }
    public class IntentPrediction
    {
        [ColumnName("PredictedLabel")]
        public string intent;
    }
}
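The "Can't bind the IDataView column 'PredictedLabel' of type 'Key<U4, 0-1>'" exception reported above appears to come from Dictionarizer: it turns the text labels into keys, so the learner's PredictedLabel column is a key rather than a string. One possible way around it, not confirmed anywhere in this thread, is to convert the key back to the original text before binding. The sketch below assumes a later package than 0.1.0 (as far as recalled, roughly 0.2 to 0.5 of the legacy LearningPipeline API), where the loader is created with TextLoader.CreateFrom and a PredictedLabelColumnOriginalValueConverter transform is available. The LabelConverterSketch class name is hypothetical.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;         // TextLoader (assumed 0.2+ location)
using Microsoft.ML.Runtime.Api;  // Column and ColumnName attributes
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;

namespace nlpTool
{
    public class IntentData
    {
        [Column(ordinal: "0")]
        public string text;

        [Column(ordinal: "1", name: "Label")]
        public string Label;
    }

    public class IntentPrediction
    {
        // Binds to a string only after the key has been converted back to text.
        [ColumnName("PredictedLabel")]
        public string intent;
    }

    public static class LabelConverterSketch
    {
        public static PredictionModel<IntentData, IntentPrediction> Train(string dataPath)
        {
            var pipeline = new LearningPipeline();

            // Assumed 0.2+ loader syntax; 0.1.0 uses new TextLoader<IntentData>(...) instead.
            pipeline.Add(new TextLoader(dataPath).CreateFrom<IntentData>(useHeader: false, separator: '\t'));
            pipeline.Add(new TextFeaturizer("Features", "text"));
            pipeline.Add(new Dictionarizer("Label"));
            pipeline.Add(new StochasticDualCoordinateAscentClassifier());

            // Map the predicted key back to the original label string ("food", "color", ...).
            pipeline.Add(new PredictedLabelColumnOriginalValueConverter
            {
                PredictedLabelColumn = "PredictedLabel"
            });

            return pipeline.Train<IntentData, IntentPrediction>();
        }
    }
}
```

If the converter is not available in your package version, the prediction field would presumably need a type that can hold a key rather than a string.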
I used this as a starting point for intent analysis, and did my best to make a cleaned-up version of your example: Intent Analysis Example. This works for me with the latest version on NuGet. @atotalnoob
@Sorrien, yes, your code runs, but it always returns the food label. Any idea why?
The example data probably isn't enough to get a properly trained model. Try adding more examples.
(In reply to Sébastien BIAUDET's comment of Tue, May 29, 2018, 4:47 PM: https://github.com/dotnet/machinelearning/issues/161#issuecomment-392938464)
I found it: in Program.cs, the input is being put in the Label property instead of the text property.
Now it's working.
Whoops, I'll update that in the repo (updated now)
(In reply to Sébastien BIAUDET's comment of Wed, May 30, 2018, 4:23 AM: https://github.com/dotnet/machinelearning/issues/161#issuecomment-393074388)
Hi, I've made a project to help build a chatbot platform in C#. You're welcome to try the repo.
Please use NimbusML to train ML.NET models in Python; you can then use ML.NET for scoring in your .NET apps.
Interesting and helpful!
I took your dataset and added to it:
Text, Label
"I like pie", "food"
"I like pizza", "food"
"my favorite color is blue", "color"
"my favorite color is black", "color"
"green is a cool color", "color"
"I am hungry, I want sausage's", "food"
Running this on ML.NET, it worked the first time. I saved the data as Input.csv.
Thanks for this!
@atotalnoob commented on Tue May 15 2018
Hey all,
What would need to be done to make ML.NET do NLP/NLU? We use a Python back-end for our current chatbot platform and are looking to explore ML.NET because we use a .NET front-end.
My understanding of what needs to be done is:
1. Load in a dataset with 2 columns using TextLoader: SentenceToBeScored | Intent
2. Then use a TextFeaturizer to change intents into numeric vectors.
3. Then train and predict.
Is it that simple? Or am I missing something?