dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Problem with IMonitor with Microsoft.ML.AutoML Version 0.21.0 and 0.21.1 #6979

Closed: boneatjp closed this issue 4 months ago

boneatjp commented 5 months ago


Describe the bug: I'm writing a Windows Forms application using ML.NET 3.0.1 with the following NuGet packages:

Microsoft.ML Version 3.0.1
Microsoft.ML.AutoML Version 0.20.1
Microsoft.ML.CpuMath Version 3.0.1
Microsoft.ML.DataView Version 3.0.1
Microsoft.ML.FastTree Version 3.0.1
Microsoft.ML.LightGbm Version 3.0.1
Microsoft.ML.Mkl.Components Version 3.0.1
Microsoft.ML.Mkl.Redist Version 3.0.1

To Reproduce: After installing Microsoft.ML.AutoML version 0.21.0 or 0.21.1, my monitor class no longer behaves the way it does with Microsoft.ML.AutoML version 0.20.1.

using System;
using Microsoft.ML.AutoML;

// This class displays trial information.
public class AutoMLMonitor : IMonitor
{
    public AutoMLMonitor()
    {
        // ...
    }

    public void ReportBestTrial(TrialResult result)
    {
        // ...
    }

    public void ReportCompletedTrial(TrialResult result)
    {
        // ...
    }

    public void ReportFailTrial(TrialSettings settings, Exception exception = null)
    {
        // ...
    }

    public void ReportRunningTrial(TrialSettings settings)
    {
        // ...
    }
}

// To cancel the experiment
private CancellationTokenSource cts;

private void btnCancel_Click(object sender, EventArgs e)
{
    // cts is null until ExecAutoML has started an experiment.
    cts?.Cancel();
}

// When running
private async void ExecAutoML()
{
    // Set the necessary options
    // ...

    AutoMLExperiment experiment = mlContext.Auto().CreateExperiment();
    var monitor = new AutoMLMonitor();
    experiment.SetMonitor(monitor);
    cts = new CancellationTokenSource();
    TrialResult experimentResults = await experiment.RunAsync(cts.Token);
}

Expected behavior: Clicking btnCancel stops the experiment. This works fine with version 0.20.1, but not with version 0.21.0 or 0.21.1.

michaelgsharp commented 5 months ago

@LittleLittleCloud can you take a look at this?

boneatjp commented 5 months ago

Do you mean I should look at the "@LittleLittleCloud" page? When I click the link, it shows "LittleLittleCloud (Xiaoyun Zhang) / January 2024", but what should I look at? Sorry, I don't understand what you mean.

LittleLittleCloud commented 5 months ago

@boneatjp He's asking me to take a look.

LittleLittleCloud commented 5 months ago

@boneatjp Can you share the log from MLContext? You can get the log from the context by attaching an event listener:

MLContext context = new MLContext();

context.Log += (o, e) =>
{
    Console.WriteLine(e);
};
boneatjp commented 5 months ago

Since it's a Windows Forms application, I changed it as follows:

string logContext = "";
context.Log += (o, e) =>
{
    logContext += e + Environment.NewLine;
};

And after the RunAsync method finishes:

File.AppendAllText("logContext.txt", logContext);

The file "logContextWv0201.txt" is from running Microsoft.ML.AutoML version 0.20.1, and the file "logContextWv0211.txt" is from running Microsoft.ML.AutoML version 0.21.1.

I'm not sure if I'm doing it the way you wanted. The log shows "Microsoft.ML.LoggingEventArgs" many times.

Attachments: logContextWv0201.txt, logContextWv0211.txt

LittleLittleCloud commented 5 months ago

@boneatjp Can you print the message instead?

string logContext = "";
context.Log += (o, e) =>
{
    logContext += e.Message + Environment.NewLine;
};
boneatjp commented 5 months ago

Attachments: logContextWv0201.txt, logContextWv0211.txt

These are the modified logs.

LittleLittleCloud commented 5 months ago

@boneatjp

Thanks.

It seems that in both log files, the cancellation token is invoked. In the first log file, the trainer for the current running trial is SDCA, while in the second log file, the trainer for the running trial is LightGBM.

In that situation, it's somewhat expected that cancellation appears not to work in the second case. The SDCA trainer is implemented in managed code, and during training it checks the cancellation token periodically and stops training when the token is cancelled. The LightGBM trainer, however, is implemented in native code, so the cancellation token can only be checked once the native code execution has completed. If that native call takes a while, the cancel button can look as if it doesn't react.

So maybe you could disable the LightGBM trainer, or update the UI to show a "cancelling" status after the cancel button is clicked, until the current experiment has actually been cancelled?
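The managed-versus-native difference described above is the usual cooperative-cancellation pattern: code can only react to a CancellationToken at the points where it checks it. A minimal console sketch, assuming two hypothetical stand-ins, `ManagedTrain` and `NativeTrain` (they are not ML.NET code, and `Thread.Sleep` stands in for real training work):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical stand-in for a managed trainer (the SDCA case): the token is
// polled between iterations, so cancellation takes effect within one step.
static int ManagedTrain(CancellationToken token)
{
    int iterations = 0;
    for (int i = 0; i < 1_000; i++)
    {
        if (token.IsCancellationRequested)
            break;                        // cooperative check between steps
        Thread.Sleep(10);                 // one short unit of managed work
        iterations++;
    }
    return iterations;
}

// Hypothetical stand-in for a native trainer (the LightGBM case): the long call
// cannot observe the token, so it is only checked after the call returns.
static void NativeTrain(CancellationToken token)
{
    Thread.Sleep(1_000);                  // one opaque, uninterruptible call
    token.ThrowIfCancellationRequested(); // first chance to notice cancellation
}

// Request cancellation 100 ms into each run and time how long each takes to stop.
using var managedCts = new CancellationTokenSource(TimeSpan.FromMilliseconds(100));
var managedWatch = Stopwatch.StartNew();
int managedSteps = ManagedTrain(managedCts.Token);
long managedStopMs = managedWatch.ElapsedMilliseconds;

using var nativeCts = new CancellationTokenSource(TimeSpan.FromMilliseconds(100));
var nativeWatch = Stopwatch.StartNew();
bool nativeCancelled = false;
try
{
    NativeTrain(nativeCts.Token);
}
catch (OperationCanceledException)
{
    nativeCancelled = true;
}
long nativeStopMs = nativeWatch.ElapsedMilliseconds;

Console.WriteLine($"managed: stopped after {managedSteps} steps, {managedStopMs} ms");
Console.WriteLine($"native:  cancelled={nativeCancelled} after {nativeStopMs} ms");
```

Both runs are cancelled at the same moment, but the managed loop stops roughly when the token fires, while the native stand-in only notices after its full call has finished, which is what makes a cancel button look unresponsive.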

boneatjp commented 5 months ago

@LittleLittleCloud

Thank you for checking the logs. However, I don't think I explained my problem clearly the first time. With version 0.20.1, the monitor works fine. With version 0.21.1, it doesn't: while trials are running, the Windows controls cannot handle events such as btnCancel_Click. It's like running a loop without calling Application.DoEvents(). I hope you see what I mean: with version 0.20.1 the monitor lets the UI process events, as if Application.DoEvents() were being called, but with version 0.21.1 it doesn't.

LittleLittleCloud commented 5 months ago

@boneatjp I don't really understand why the Windows controls can't handle btnCancel_Click, because in both logs you shared, the cancellation token is invoked. Are you saying that after updating to 0.21.1...

boneatjp commented 4 months ago

@LittleLittleCloud I think the logs show the experiment terminating because it ran out of the time limit I set (5 minutes), not because of btnCancel_Click. What I'm saying is that after updating to 0.21.1 or 0.21.0, I can't even click the button, resize the application window, or do anything else with the application.

LittleLittleCloud commented 4 months ago

@boneatjp Are you saying your app deadlocks when you click the training button after updating AutoML?

boneatjp commented 4 months ago

@LittleLittleCloud I'm not sure if "it's deadlocked" is the right way to explain it, but that's how it is, so I just leave it until it finishes training.

LittleLittleCloud commented 4 months ago

@boneatjp That sounds weird. Could you provide a minimal reproducible example, or a link to the code?

boneatjp commented 4 months ago

@LittleLittleCloud I've made this project with Microsoft.ML.AutoML version 0.20.1.

AutoMLSample.zip

It should work fine as is, I think. But if you upgrade to Microsoft.ML.AutoML version 0.21.1, you should see what I mean.

LittleLittleCloud commented 4 months ago

Solution

In Form1.cs, change the last few lines of button1_Click from

await experiment.RunAsync(cts.Token);
button1.Enabled = true;
button2.Enabled = false;
richTextBox1.AppendText(Environment.NewLine + "Training Finished!!" + Environment.NewLine);

to

_ = Task.Run(async () => {
    await experiment.RunAsync(cts.Token);
    button1.Enabled = true;
    button2.Enabled = false;
    richTextBox1.AppendText(Environment.NewLine +"Training Finished!!" + Environment.NewLine);
});

Explanation

Since #6560, RunAsync in SweepablePipelineRunner no longer starts the trial in a new task. It simply wraps the trial result in a task object using Task.FromResult.

In your code, this change means that the AutoML experiment blocks and freezes the UI thread. The root cause is not actually in the monitor's code.

The fix is simply to run the AutoML experiment in a new task so it doesn't block the UI.
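Why wrapping a synchronously computed result in Task.FromResult still blocks the caller, while Task.Run does not, can be seen in a small console sketch. `RunTrialWrapped` and `RunTrialOffloaded` are hypothetical stand-ins, not ML.NET's SweepablePipelineRunner; they only illustrate the behavior described above, with `Thread.Sleep` standing in for a trial:

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical runner that does its work synchronously and then wraps the
// result with Task.FromResult, the pattern described above.
static Task<int> RunTrialWrapped()
{
    Thread.Sleep(500);                  // the "trial" runs on the caller's thread
    return Task.FromResult(42);         // returns an already-completed task
}

// The same work offloaded to the thread pool with Task.Run.
static Task<int> RunTrialOffloaded() => Task.Run(() =>
{
    Thread.Sleep(500);                  // runs on a thread-pool thread
    return 42;
});

// Time how long each call keeps the calling thread (the UI thread, in a
// WinForms app) busy before it even hands back a Task.
var watch = Stopwatch.StartNew();
Task<int> wrapped = RunTrialWrapped();
long wrappedReturnMs = watch.ElapsedMilliseconds;    // includes the full 500 ms

watch.Restart();
Task<int> offloaded = RunTrialOffloaded();
long offloadedReturnMs = watch.ElapsedMilliseconds;  // returns almost immediately

int offloadedResult = await offloaded;
Console.WriteLine($"wrapped call returned after {wrappedReturnMs} ms (completed: {wrapped.IsCompleted})");
Console.WriteLine($"offloaded call returned after {offloadedReturnMs} ms, result {offloadedResult}");
```

Awaiting the wrapped task does not help, because all the work already happened before the Task existed. In a WinForms handler, a variant of the fix above is `await Task.Run(() => experiment.RunAsync(cts.Token));`: the experiment still runs off the UI thread, and because `await` resumes on the captured WinForms synchronization context by default, button and RichTextBox updates placed after it run on the UI thread again. IMonitor callbacks raised during the run would still be invoked off the UI thread.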

boneatjp commented 4 months ago

@LittleLittleCloud OK, I've modified my source code as you suggested. When I run it, though, I get a 'System.InvalidOperationException' in the IMonitor, where it outputs logs to the RichTextBox.

Since it worked fine with version 0.20.1, something must have changed in version 0.21.1 in how IMonitor is used. Are you saying that I have to change my source code to use version 0.21.1?

I really appreciate your support in showing how I could work with version 0.21.1. I guess I have to learn more about writing C# code. I've read something about accessing controls from other tasks, but I haven't understood how to do it without getting errors.

LittleLittleCloud commented 4 months ago

@boneatjp When you talk about source code, do you mean the code you shared in the zip file above, or the actual code in your project?

> Since it worked fine with version 0.20.1, something must have changed in version 0.21.1 in how IMonitor is used. Are you saying that I have to change my source code to use version 0.21.1?

Yes. After the change above, I can run the project you shared with 0.21.1. So if by source code you mean the zip file you shared above, that confuses me.

boneatjp commented 4 months ago

@LittleLittleCloud Well, I mean the project I uploaded here, changed as you mentioned, and I'm getting an error. Are you saying you don't get any errors at all after making the change you described?

LittleLittleCloud commented 4 months ago

@boneatjp OK, maybe I forgot to mention some other changes I made. I've pushed the entire project with the changes to GitHub; maybe that will help.

https://github.com/LittleLittleCloud/AutoMLWinformSample

boneatjp commented 4 months ago

@LittleLittleCloud Thank you so much! I've fixed my app to run with version 0.21.1.

The project you pushed to GitHub still had the same error I was getting, which was caused by accessing controls from another task, but I managed to fix it by using Control.Invoke in the IMonitor class.

There's quite a difference between version 0.20.1 and version 0.21.1, I think. But I'm so glad I can use version 0.21.1 with your help.
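Control.Invoke fixes the cross-thread InvalidOperationException because it runs the delegate on the thread that created the control, instead of on the background thread that raised the IMonitor callback. The same marshalling idea can be sketched without WinForms: below, a hypothetical dedicated thread draining a `BlockingCollection<Action>` plays the role of the UI thread and its message loop, and the hypothetical `PostToUi` plays the role of Control.BeginInvoke. This illustrates the pattern only; it is not how WinForms implements it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// A dedicated thread that owns `log` and drains a work queue, loosely playing
// the role of the WinForms UI thread and its message loop (hypothetical).
var uiQueue = new BlockingCollection<Action>();
var log = new List<string>();           // like a RichTextBox: touch only on its owner thread
int uiThreadId = 0;
var uiThread = new Thread(() =>
{
    uiThreadId = Environment.CurrentManagedThreadId;
    foreach (Action work in uiQueue.GetConsumingEnumerable())
        work();                         // run queued delegates on the owner thread
});
uiThread.Start();

// Analogue of Control.BeginInvoke: hand the delegate to the owner thread
// instead of touching its state from the calling thread.
void PostToUi(Action action) => uiQueue.Add(action);

// A background task reporting progress, like IMonitor callbacks raised while
// the experiment runs inside Task.Run.
var callbackThreadIds = new ConcurrentBag<int>();
await Task.Run(() =>
{
    for (int trial = 1; trial <= 3; trial++)
    {
        int id = trial;                 // copy: the for-loop variable is shared by closures
        PostToUi(() =>
        {
            callbackThreadIds.Add(Environment.CurrentManagedThreadId);
            log.Add($"Trial {id} completed");
        });
    }
});

uiQueue.CompleteAdding();               // let the loop exit once the queue is drained
uiThread.Join();
foreach (string line in log)
    Console.WriteLine(line);
```

In the WinForms monitor itself, the corresponding call is along the lines of `richTextBox1.Invoke((MethodInvoker)(() => richTextBox1.AppendText(text)));`, where `richTextBox1` and `text` are placeholders. Invoke waits until the UI thread has run the delegate; BeginInvoke, like `PostToUi` here, returns immediately.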

LittleLittleCloud commented 4 months ago

Cool, glad you figured it out! Please let us know if you need any further help.