Polak149 opened 7 months ago
The problem originates in the `LoadIDataViewFromFile` function in *.training.cs, which loads dataset.txt without its tags. I was able to work around it by writing my own training function:
```csharp
// Members of the class Model Builder generated in *.training.cs;
// ModelInput and SaveModel come from the generated files.
private class Label(string key)
{
    public readonly string Key = key;
}

public static void TrainNER(string outputModelPath, string inputLabelsFilePath, string inputDataFilePath)
{
    // Reads the label set (one label per line), used as key data for MapValueToKey.
    IEnumerable<Label> GetLabels(string inputLabelsFilePath)
    {
        var lines = File.ReadLines(inputLabelsFilePath);
        return lines.Select(x => new Label(x));
    }

    // Reads each data line as: sentence, then its tab-separated labels.
    IEnumerable<ModelInput> GetLine(string fileName)
    {
        using StreamReader sr = File.OpenText(fileName);
        string? line;
        while ((line = sr.ReadLine()) != null)
        {
            var split = line.Split('\t');
            yield return new ModelInput()
            {
                Sentence = split[0],
                Label = split[1..]
            };
        }
    }

    var mlContext = new MLContext();
    var labels = mlContext.Data.LoadFromEnumerable(GetLabels(inputLabelsFilePath));
    var dataView = mlContext.Data.LoadFromEnumerable(GetLine(inputDataFilePath));

    var chain = new EstimatorChain<ITransformer>();
    var estimator = chain.Append(mlContext.Transforms.Conversion.MapValueToKey("Label", keyData: labels))
        .Append(mlContext.MulticlassClassification.Trainers.NamedEntityRecognition(outputColumnName: "predicted_label", batchSize: 32, maxEpochs: 10))
        .Append(mlContext.Transforms.Conversion.MapKeyToValue("predicted_label"));
    using var transformer = estimator.Fit(dataView);

    // SaveModel is generated automatically in *.training.cs
    SaveModel(mlContext, transformer, dataView, outputModelPath);
}
```
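Since every dataset tried so far has failed, it may help to check the data file against the format that `GetLine` assumes before training. The sketch below is hypothetical: `NerLineValidator` is not part of ML.NET or the Model Builder output, and it assumes each line is a sentence followed by tab-separated labels with exactly one label per whitespace-separated word. Adjust that rule if the trainer expects a different alignment.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of ML.NET or Model Builder): checks that a line has
// the shape GetLine expects, i.e. "sentence<TAB>label<TAB>label...", and ASSUMES
// one label per whitespace-separated word of the sentence.
public static class NerLineValidator
{
    // Returns null if the line looks valid, otherwise a short description of the problem.
    public static string? Validate(string line)
    {
        var split = line.Split('\t');
        if (split.Length < 2)
            return "no tab-separated labels after the sentence";

        var words = split[0].Split(' ', StringSplitOptions.RemoveEmptyEntries);
        int labelCount = split.Length - 1;
        if (words.Length != labelCount)
            return $"{words.Length} words but {labelCount} labels";

        return null;
    }

    // Yields (lineNumber, problem) for every invalid line; line numbers are 1-based.
    public static IEnumerable<(int LineNumber, string Problem)> ValidateAll(IEnumerable<string> lines)
    {
        int lineNumber = 0;
        foreach (var line in lines)
        {
            lineNumber++;
            var problem = Validate(line);
            if (problem != null)
                yield return (lineNumber, problem);
        }
    }
}
```

For example, `NerLineValidator.ValidateAll(File.ReadLines(inputDataFilePath))` lists the offending line numbers before `TrainNER` is called (that call also needs `using System.IO;`).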
@zewditu, can you take a look?
System Information:
The `Train` function generated by Model Builder does not work for Named Entity Recognition and causes an exception:
Using Model Builder, I was able to generate a Named Entity Recognition ML.NET model. Model Builder generated a *.training.cs file with a `Train` function:
Calling this function causes an exception on:
Here is an example dataset I made for this post, but every dataset I have tried fails: test data example.txt