Closed NiPersson closed 9 months ago
Have you tried adding more time or using SetMaxModelToExplore?
Yes. The only difference is that with a longer maximum training time or more models to explore, it takes longer for the exception to occur, usually close to the maximum allowed time. The exception to this is when using only LightGBM: then the exception occurs within a few seconds.
Nicklas
On Tue, 23 May 2023 at 22:41, Xiaoyun Zhang @.***> wrote:
Have you tried adding more time or using SetMaxModelToExplore?
Your code runs well on my end. I'm targeting the latest AutoML package, though 0.20.1 should also work. Can you share your stack trace, if available?
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<RootNamespace>ConsoleApp1</RootNamespace>
<TargetFramework>net6.0</TargetFramework>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="Microsoft.ML.AutoML" Version="0.21.0-preview.23266.6" />
</ItemGroup>
</Project>
The stack trace looks fine. It's the expected stack trace when time's up.
So the issue is that no trial runs to completion, no matter how much time you set for SetTrainingTime?
Hi, sorry for the late answer. This persists no matter how much time one sets for training. What we have found is that it may be thread related. The sample I supplied above works in a clean, new project but not when it is used in our project. In the C# code examples I have seen, one usually uses:
await experiment.RunAsync();
which I guess translates to:
Dim t = experiment.RunAsync()
t.Wait()
in Visual Basic, but this still gives the same exception.
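One difference between the two forms is worth keeping in mind when reading stack traces: `Await` rethrows the original exception, while a blocking `Wait()` wraps it in an `AggregateException`. The following is a minimal sketch of that behavior only. `RunStandInAsync` is a hypothetical stand-in for `experiment.RunAsync()`, not the ML.NET implementation, and it simply throws the `TimeoutException` message quoted later in this thread.

```vbnet
Imports System
Imports System.Threading.Tasks

Module AwaitVersusWait
    ' Hypothetical stand-in for experiment.RunAsync(): it fails with the
    ' message the RunAsync summary gives for "no trial completed".
    Async Function RunStandInAsync() As Task(Of String)
        Await Task.Delay(100)
        Throw New TimeoutException("Training time finished without completing a trial run")
    End Function

    ' Awaiting the task rethrows the original exception type.
    Async Function AwaitItAsync() As Task
        Try
            Dim result As String = Await RunStandInAsync()
            Console.WriteLine(result)
        Catch ex As TimeoutException
            Console.WriteLine("Await surfaced: " & ex.GetType().Name)
        End Try
    End Function

    Sub Main()
        ' Blocking on the task wraps the failure in an AggregateException.
        Try
            RunStandInAsync().Wait()
        Catch ex As AggregateException
            Console.WriteLine("Wait surfaced: " & ex.InnerException.GetType().Name)
        End Try

        ' VB has no Async Main, so block on the awaiting wrapper here.
        AwaitItAsync().GetAwaiter().GetResult()
    End Sub
End Module
```

Either way the underlying failure is the same; only the wrapping differs, which matters when deciding which exception type to catch.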
@NiPersson How many training threads does your project kick off when starting?
We use only one thread for training, but the project is large and complex.
@NiPersson How about the dataset? Is the dataset still \taxi-fare-train.tsv?
@NiPersson And did you call Dispose or MLContext.Cancel anywhere in your project?
@NiPersson How about the dataset? Is the dataset still \taxi-fare-train.tsv?
I have tested with this dataset alongside what we normally use.
MLContext.Cancel
Does not RunAsync call it?
//
// Summary:
// Run experiment and return the best trial result asynchronizely. The experiment
// returns the current best trial result if there's any trial completed when ct
// get cancelled, and throws System.TimeoutException with message "Training time
// finished without completing a trial run" when no trial has completed. Another
// thing needs to notice is that this function won't immediately return after ct
// get cancelled. Instead, it will call Microsoft.ML.MLContext.CancelExecution to
// cancel all training process and wait all running trials get cancelled or completed.
public async Task
Other than that, it does not look like I do.
@NiPersson can you try disabling LightGBM and training again? It looks like the exception was thrown from the LightGBM trainer. In AutoML 2.0, we added a CheckAlive checkpoint in the LightGBM trainer, which might be the cause.
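To make the two outcomes described in the quoted summary easier to tell apart, the run could be wrapped as in the sketch below. It assumes only what the summary states (a `TimeoutException` when no trial completes) plus the `OperationCanceledException` reported in this issue. The helper name `TryRunAsync` is hypothetical, and the code requires the Microsoft.ML.AutoML package referenced in the project file above.

```vbnet
Imports System
Imports System.Threading.Tasks
Imports Microsoft.ML.AutoML

Module RunGuard
    ' Hypothetical helper: returns Nothing when the run ends without a usable trial.
    Async Function TryRunAsync(experiment As AutoMLExperiment) As Task(Of TrialResult)
        Try
            ' Await rethrows the original exception rather than an AggregateException.
            Return Await experiment.RunAsync()
        Catch ex As TimeoutException
            ' Per the quoted summary: the time budget ran out before any trial completed.
            Console.WriteLine("No trial completed: " & ex.Message)
        Catch ex As OperationCanceledException
            ' The exception type reported in this issue.
            Console.WriteLine("Run was canceled: " & ex.Message)
        End Try
        Return Nothing
    End Function
End Module
```

Logging which branch is hit would show whether the failure is the documented time-out or a cancellation coming from elsewhere.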
Already tested that... All trainers produce this exception
Then my best bet, without access to your project or a minimal reproducible example, is that the exception is caused by a time-out. Is your project compute-intensive? Have you tried setting a very long time budget?
One thing I recall now: even with a "clean project", if you run it from a thread, you also get the exception:
Task.Run(Sub()
    ' Call example code
End Sub)
Then my best bet, without access to your project or a minimal reproducible example, is that the exception is caused by a time-out. Is your project compute-intensive? Have you tried setting a very long time budget?
I would say it is not compute-intensive. Before the update I used a maximum of 60 seconds for training. If I recall correctly, the longest I tested was 30 minutes.
One thing I recall now: even with a "clean project", if you run it from a thread, you also get the exception:
Task.Run(Sub()
    ' Call example code
End Sub)
Can you provide a minimal example for it? I'm not very familiar with VB.NET.
I can do that. Hopefully I have time in a few hours.
Hi, I had trouble recreating it as I described above, but what I did find was that this code:
Imports System
Imports System.Data
Imports System.IO
Imports Microsoft.ML
Imports Microsoft.ML.AutoML
Imports Microsoft.ML.Data
Imports Microsoft.ML.DataOperationsCatalog

Module Module1
    Sub Main(args As String())
        Dim MLObject As MLObjects = New MLObjects
        MLObject.TrainModel()
    End Sub

    Public Class MLObjects
        Public Sub TrainModel()
            Dim dataPath As String = "C:\Aiolos\@TEST2\Data\MLNet\taxi-fare-train.csv"
            Dim ctx As MLContext = New MLContext()
            ' Infer column information
            Dim columnInference As ColumnInferenceResults = ctx.Auto().InferColumns(dataPath, labelColumnName:="fare_amount", groupColumns:=False)
            ' Create text loader
            Dim loader As TextLoader = ctx.Data.CreateTextLoader(columnInference.TextLoaderOptions)
            ' Load data into IDataView
            Dim data As IDataView = loader.Load(dataPath)
            ' Split into train (80%), validation (20%) sets
            Dim trainValidationData As TrainTestData = ctx.Data.TrainTestSplit(data, testFraction:=0.2)
            ' Define pipeline
            Dim Pipeline As SweepablePipeline = ctx.Auto().Featurizer(data, columnInformation:=columnInference.ColumnInformation).Append(ctx.Auto().Regression(labelColumnName:=columnInference.ColumnInformation.LabelColumnName))
            ' Create AutoML experiment
            Dim experiment As AutoMLExperiment = ctx.Auto().CreateExperiment()
            ' Configure experiment
            experiment.SetPipeline(Pipeline).SetRegressionMetric(RegressionMetric.RSquared, labelColumn:=columnInference.ColumnInformation.LabelColumnName).SetTrainingTimeInSeconds(60).SetDataset(trainValidationData)
            ' Run experiment
            Dim experimentResults As TrialResult = experiment.Run
            Dim model = experimentResults.Model
            '' Run experiment
            'Dim t = experiment.RunAsync
            't.Wait()
            'Dim experimentResult As TrialResult = t.Result
            'Dim model = experimentResult.Model
        End Sub
    End Class
End Module
worked in a project created with .NET Framework 4.8, but I could not make it work in a new project with .NET 7 (I got the above-mentioned exception again).
That's an interesting find! @ericstj @JakeRadMSFT @michaelgsharp Do you know who I can reach out to for this issue?
@LittleLittleCloud can you repro as well with the latest example? If folks are suspecting some threading difference perhaps examining the process at the time the exception is thrown might reveal a stuck thread, or a blocked task? Those sort of differences can happen machine/machine (or framework/framework) if there is a race condition involved. I gave the example a try and it ran to completion for me 🤷♂️
That's an interesting find! @ericstj @JakeRadMSFT @michaelgsharp Do you know who I can reach out to for this issue?
@LittleLittleCloud Did you manage to reproduce the issue?
@NiPersson -- I just noticed from the above exchange that the stack trace you captured was in the debugger. Just double checking -- are you able to reproduce this outside the debugger? Running async/multi-threaded code that depends on timeouts can often have different behavior under the debugger since it can freeze threads. Just double checking that we're all on the same page here WRT to repro steps.
@NiPersson Unfortunately, I still can't reproduce the exception with your latest code.
@NiPersson Unfortunately, I still can't reproduce the exception with your latest code.
@LittleLittleCloud A colleague and I tested this yesterday. For him the exception happened, but rarely (2/23), with the .NET 7 project; for me it happened every time. The .NET Framework 4.8 project also misbehaved: I once got a missing LightGBM DLL exception, and my colleague got that exception every time. If you use Visual Studio, here is a Visual Studio solution with the .NET 7 project:
https://www.dropbox.com/scl/fi/ivt1ehfrdgmtl8kxuiz5p/MLTest.zip?rlkey=mt9k37qdmaomhhd9aqlgjs24c&dl=0
@NiPersson -- I just noticed from the above exchange that the stack trace you captured was in the debugger. Just double checking -- are you able to reproduce this outside the debugger? Running async/multi-threaded code that depends on timeouts can often have different behavior under the debugger since it can freeze threads. Just double checking that we're all on the same page here WRT to repro steps.
@ericstj I now tried it in release and the exception still occurs
I tested "my production code" (not the example code) with a real build on a production server and there it goes better than what I see locally. The first 13 models created had only one exception.
@NiPersson I'm using .NET 6; let me try on .NET 7.
@LittleLittleCloud Did you manage to reproduce it?
@NiPersson I tried, and it still works. No luck reproducing it on .NET 7.
@LittleLittleCloud: If I have the Visual Studio "Enable Just My Code" setting selected, then the exception is masked in most cases for me. Could that be the explanation? Also, it does not seem to happen all the time for all users; my colleague, as stated previously, only hit the exception about 1 time in 10 on average.
I ran the example quite a few times and didn’t hit an exception. It works every time.
Could you join the ML.NET Discord channel and ping me there? It would be easier than debugging on a GitHub thread. You should be able to find the Discord link on the README page.
From: NiPersson @.> Sent: Tuesday, August 15, 2023 5:09:47 AM To: dotnet/machinelearning @.> Cc: Mention @.>; Comment @.>; Subscribed @.***> Subject: Re: [dotnet/machinelearning] AutoML: 'System.OperationCanceledException' after upgrade to AutoML 0.20.1 (Issue #6706)
@LittleLittleCloud: If I have the Visual Studio "Enable Just My Code" setting selected, then the exception is masked in most cases for me. Could that be the explanation? Also, it does not seem to happen all the time for all users; my colleague, as stated previously, only hit the exception about 1 time in 10 on average.
@LittleLittleCloud Hi again, was "DotNetEvolution" the name of the channel? If so, I did not find you there. But that could just be my lack of experience with Discord :)
@NiPersson Sorry, it should be this one: https://discord.com/invite/Atpktwt8
Once you are in, you can find my id #BigMiao
@luisquintanilla maybe we need to update the Discord link?
Closing due to stale issue. If it's still a problem, please re-open or create a new issue.
After upgrading to 0.20.1 I get an "Operation was canceled" exception after a while during training. I tried tweaking the code but could not get rid of it, and it persisted when I tried a simple example as shown below:
The exception occurs at "Dim experimentResults As TrialResult = experiment.Run".