dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Microsoft.ML.Tests crashing after `[LightGBM] [Warning] bad allocation` #6961

Open ericstj opened 5 months ago

ericstj commented 5 months ago

Build Information

Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=530980
Build error leg or test failing: Microsoft.ML.Tests.WorkItemExecution
Rolling build

Error Message

{
  "ErrorMessage": "[LightGBM] [Warning] bad allocation",
  "ErrorPattern": "",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}
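
For reference, a rough sketch of how a known-issue entry like the one above could be checked against a console log. It assumes `ErrorMessage` is a plain substring match and `ErrorPattern` is a regular expression, both applied line by line; the real matching logic may differ. `matches_known_issue` is a hypothetical helper, and the `BuildRetry` and `ExcludeConsoleLog` fields are ignored here.

```python
import json
import re


def matches_known_issue(issue_json: str, log_text: str) -> bool:
    """Return True if any log line matches the known-issue entry.

    Assumption: ErrorMessage is a substring match and ErrorPattern is a
    regular expression. Empty fields are skipped.
    """
    issue = json.loads(issue_json)
    message = issue.get("ErrorMessage", "")
    pattern = issue.get("ErrorPattern", "")
    for line in log_text.splitlines():
        if message and message in line:
            return True
        if pattern and re.search(pattern, line):
            return True
    return False


if __name__ == "__main__":
    issue = json.dumps({
        "ErrorMessage": "[LightGBM] [Warning] bad allocation",
        "ErrorPattern": "",
        "BuildRetry": False,
        "ExcludeConsoleLog": False,
    })
    log = "Starting test: SomeTest\n[LightGBM] [Warning] bad allocation\n"
    print(matches_known_issue(issue, log))
```

With an empty `ErrorPattern`, only the literal `ErrorMessage` text is considered, so any log containing that warning line would match.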

Failing log: https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-machinelearning-refs-heads-main-ceec46ab558849c4a3/Microsoft.ML.Tests/1/console.b4e5afef.log?helixlogtype=result

Dump: https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-machinelearning-refs-heads-main-ceec46ab558849c4a3/Microsoft.ML.Tests/1/xunit.console.exe.5364.dmp?helixlogtype=result

Sample log spew before the crash:

Starting test: Microsoft.ML.Tests.OnnxConversionTest.NonDefaultColNamesBinaryClassificationOnnxConversionTest
[LightGBM] [Warning] bad allocation
[LightGBM] [Warning] bad allocation
...

C:\h\w\B7EB099A\w\9FF809A0\e>set _commandExitCode=-1073740940

Will have a look at the dump to see if there's more information about the test that was running when this happened.
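
The exit code in the log is easier to read as an unsigned NTSTATUS value. A minimal sketch of the conversion follows; the helper names `to_ntstatus` and `describe_exit_code` are made up for illustration, and the lookup table only lists a few well-known codes.

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex NTSTATUS."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


# A few well-known NTSTATUS values; this table is not exhaustive.
KNOWN_STATUS = {
    "0xC0000005": "STATUS_ACCESS_VIOLATION",
    "0xC0000374": "STATUS_HEAP_CORRUPTION",
    "0xC0000409": "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exit_code(exit_code: int) -> str:
    """Format the exit code as hex plus a symbolic name when known."""
    status = to_ntstatus(exit_code)
    return f"{status} ({KNOWN_STATUS.get(status, 'unknown')})"


if __name__ == "__main__":
    # Exit code taken from the console log above.
    print(describe_exit_code(-1073740940))  # 0xC0000374 (STATUS_HEAP_CORRUPTION)
```

So `-1073740940` is `0xC0000374`, which suggests the process was terminated for heap corruption.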

Known issue validation

Build: https://dev.azure.com/dnceng-public/public/_build/results?buildId=530980
Error message validated: [LightGBM] [Warning] bad allocation
Result validation: Known issue matched with the provided build.
Validation performed at: 1/18/2024 12:06:13 AM UTC

Report

Summary

24-Hour Hit Count: 0
7-Day Hit Count: 0
1-Month Count: 0

ericstj commented 5 months ago

Stack of the crash is here:

 # Child-SP          RetAddr               Call Site
00 00000045`0d5bf920 00007ffc`3e0ceffd     lib_lightgbm!LightGBM::CreatePredictionEarlyStopInstance+0x2c9dd
01 00000045`0d5bf950 00007ffc`3e0cf1d1     lib_lightgbm!LightGBM::ObjectiveFunction::CreateObjectiveFunction+0x9ea7d
02 00000045`0d5bf980 00007ffc`3e0dafbe     lib_lightgbm!LightGBM::ObjectiveFunction::CreateObjectiveFunction+0x9ec51
03 00000045`0d5bfa10 00007ffc`64909f2a     lib_lightgbm!LGBM_DatasetPushRowsByCSR+0x36e
04 00000045`0d5bfa90 00007ffc`649017fd     vcomp140!_vcomp_fork_helper+0x6a
05 00000045`0d5bfad0 00007ffc`649017bc     vcomp140!_vcomp::fork_helper_wrapper+0x9
06 00000045`0d5bfb00 00007ffc`649088fa     vcomp140!_vcomp::ParallelRegion::HandlerThreadFunc+0x6c
07 00000045`0d5bfb70 00007ffc`69ff84d4     vcomp140!_vcomp::PersistentThreadFunc+0x5a
08 00000045`0d5bfbb0 00007ffc`6caa1791     kernel32!BaseThreadInitThunk+0x14
09 00000045`0d5bfbe0 00000000`00000000     ntdll!RtlUserThreadStart+0x21

Dump has 68 total threads.

Looking through the managed threads it seems these 4 tests are running concurrently:
Microsoft.ML.Tests.OnnxConversionTest.NonDefaultColNamesBinaryClassificationOnnxConversionTest()
Microsoft.ML.Tests.TrainerEstimators.TrainerEstimators.LightGBMBinaryEstimatorUnbalanced()
Microsoft.ML.Tests.Scenarios.Api.CookbookSamples.CookbookSamplesDynamicApi.TextFeaturization()
Microsoft.ML.Tests.Transformers.WordEmbeddingsTests.TestWordEmbeddings()

Looking at all threads with lib_lightgbm on the stack I see:

0:010> !findstack lib_lightgbm
Thread 010, 7 frame(s) match
        * 16 0000000000000001 00007ffc3df8ee31 lib_lightgbm!LightGBM::Boosting::LoadFileToBoosting+0x29d9
        * 17 0000000000000000 00007ffc3dfa2f4d lib_lightgbm!LightGBM::CreatePredictionEarlyStopInstance+0x188e1
        * 18 0000000000000000 00007ffc3e0ceffd lib_lightgbm!LightGBM::CreatePredictionEarlyStopInstance+0x2c9fd
        * 19 0000000000000000 00007ffc3e0cf1d1 lib_lightgbm!LightGBM::ObjectiveFunction::CreateObjectiveFunction+0x9ea7d
        * 20 0000000000000540 00007ffc3e0dafbe lib_lightgbm!LightGBM::ObjectiveFunction::CreateObjectiveFunction+0x9ec51
        * 21 000000457803b150 00007ffc64909f2a lib_lightgbm!LGBM_DatasetPushRowsByCSR+0x36e
        * 27 000000457803b570 00007ffc0139e433 lib_lightgbm!LGBM_DatasetPushRowsByCSR+0x19d

Thread 065, 4 frame(s) match
        * 11 0000000000000002 00007ffc3e0ceffd lib_lightgbm!LightGBM::CreatePredictionEarlyStopInstance+0x2c9dd
        * 12 0000000000000002 00007ffc3e0cf1d1 lib_lightgbm!LightGBM::ObjectiveFunction::CreateObjectiveFunction+0x9ea7d
        * 13 000000000000036d 00007ffc3e0dafbe lib_lightgbm!LightGBM::ObjectiveFunction::CreateObjectiveFunction+0x9ec51
        * 14 000000450d5bfac0 00007ffc64909f2a lib_lightgbm!LGBM_DatasetPushRowsByCSR+0x36e

Thread 066, 4 frame(s) match
        * 09 0000000000000003 00007ffc3e0ceffd lib_lightgbm!LightGBM::CreatePredictionEarlyStopInstance+0x2c9dd
        * 10 0000000000000003 00007ffc3e0cf1d1 lib_lightgbm!LightGBM::ObjectiveFunction::CreateObjectiveFunction+0x9ea7d
        * 11 000000000000082e 00007ffc3e0dafbe lib_lightgbm!LightGBM::ObjectiveFunction::CreateObjectiveFunction+0x9ec51
        * 12 000000450d6bfa80 00007ffc64909f2a lib_lightgbm!LGBM_DatasetPushRowsByCSR+0x36e
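
To compare the threads in the `!findstack` output above, it can help to strip the addresses and offsets and keep only the symbol names per thread. A small sketch, assuming the output format shown above (a `Thread NNN, N frame(s) match` header followed by `*` frame lines); `summarize_findstack` is a hypothetical helper. The large offsets (for example `+0x9ea7d`) suggest the names may be nearest-export matches rather than the real functions, so treat the symbols as approximate.

```python
import re
from typing import Dict, List

THREAD_RE = re.compile(r"^Thread (\d+), \d+ frame\(s\) match")


def summarize_findstack(output: str) -> Dict[str, List[str]]:
    """Map each thread id to its matched symbols, with offsets removed."""
    threads: Dict[str, List[str]] = {}
    current = None
    for line in output.splitlines():
        header = THREAD_RE.match(line.strip())
        if header:
            current = header.group(1)
            threads[current] = []
            continue
        parts = line.split()
        # Frame lines look like: * <frame#> <child-sp> <ret-addr> <symbol+offset>
        if current is not None and len(parts) >= 5 and parts[0] == "*":
            threads[current].append(parts[4].split("+0x")[0])
    return threads


if __name__ == "__main__":
    sample = """Thread 065, 4 frame(s) match
        * 11 0000000000000002 00007ffc3e0ceffd lib_lightgbm!LightGBM::CreatePredictionEarlyStopInstance+0x2c9dd
        * 14 000000450d5bfac0 00007ffc64909f2a lib_lightgbm!LGBM_DatasetPushRowsByCSR+0x36e
"""
    for thread_id, symbols in summarize_findstack(sample).items():
        print(thread_id, symbols)
```

Run over the full output, this makes it easier to see that threads 065 and 066 share the same four frames, while thread 010 has additional frames, including the `LoadFileToBoosting` entry.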

0:010> ~10s;kc;~65s;kc;~66s;kc   
ntdll!ZwAlpcSendWaitReceivePort+0x14:
00007ffc`6caf6f14 c3              ret
 # Call Site
00 ntdll!ZwAlpcSendWaitReceivePort
01 ntdll!SendMessageToWERService
02 ntdll!ReportExceptionInternal
03 ntdll!RtlReportExceptionHelper
04 ntdll!RtlReportException
05 ntdll!RtlpTerminateFailureFilter
06 ntdll!RtlReportCriticalFailure$filt$0
07 ntdll!__C_specific_handler
08 ntdll!__GSHandlerCheck_SEH
09 ntdll!RtlpExecuteHandlerForException
0a ntdll!RtlDispatchException
0b ntdll!RtlRaiseException
0c ntdll!RtlReportCriticalFailure
0d ntdll!RtlpHeapHandleError
0e ntdll!RtlpLogHeapFailure
0f ntdll!RtlpProbeUserBufferUnsafe
10 ntdll!RtlpProbeUserBuffer
11 ntdll!RtlpFreeHeapInternal
12 ntdll!RtlpHpHeapFreeRedirectLayer
13 ntdll!RtlFreeHeap
14 ucrtbase!_free_base
15 lib_lightgbm!LightGBM::Boosting::LoadFileToBoosting
16 lib_lightgbm!LightGBM::CreatePredictionEarlyStopInstance
17 lib_lightgbm!LightGBM::CreatePredictionEarlyStopInstance
18 lib_lightgbm!LightGBM::ObjectiveFunction::CreateObjectiveFunction
19 lib_lightgbm!LightGBM::ObjectiveFunction::CreateObjectiveFunction
1a lib_lightgbm!LGBM_DatasetPushRowsByCSR
1b vcomp140!_vcomp_fork_helper
1c vcomp140!_vcomp::fork_helper_wrapper
1d vcomp140!_vcomp::ParallelRegion::HandlerThreadFunc
1e vcomp140!InvokeThreadTeam
1f vcomp140!vcomp_fork
20 lib_lightgbm!LGBM_DatasetPushRowsByCSR
21 xunit_runner_utility_net452!DomainBoundILStubClass.IL_STUB_PInvoke(SafeDataSetHandle, Int32[], CApiDType, Int32[], Single[], CApiDType, Int64, Int64, Int64, Int64)
22 Microsoft_ML_LightGbm!Microsoft.ML.Trainers.LightGbm.WrappedLightGbmInterface.DatasetPushRowsByCsr
23 Microsoft_ML_LightGbm!Microsoft.ML.Trainers.LightGbm.Dataset.PushRows
24 Microsoft_ML_LightGbm!Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase<Options,float,Microsoft.ML.Data.BinaryPredictionTransformer<Microsoft.ML.Calibrators.CalibratedModelParametersBase<Microsoft.ML.Trainers.LightGbm.LightGbmBinaryModelParameters,Microsoft.ML.Calibrators.PlattCalibrator>>,Microsoft.ML.Calibrators.CalibratedModelParametersBase<Microsoft.ML.Trainers.LightGbm.LightGbmBinaryModelParameters,Microsoft.ML.Calibrators.PlattCalibrator>>.LoadDataset
25 Microsoft_ML_LightGbm!Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase<Options,float,Microsoft.ML.Data.BinaryPredictionTransformer<Microsoft.ML.Calibrators.CalibratedModelParametersBase<Microsoft.ML.Trainers.LightGbm.LightGbmBinaryModelParameters,Microsoft.ML.Calibrators.PlattCalibrator>>,Microsoft.ML.Calibrators.CalibratedModelParametersBase<Microsoft.ML.Trainers.LightGbm.LightGbmBinaryModelParameters,Microsoft.ML.Calibrators.PlattCalibrator>>.LoadTrainingData
26 Microsoft_ML_LightGbm!Microsoft.ML.Trainers.LightGbm.LightGbmTrainerBase<Options,float,Microsoft.ML.Data.BinaryPredictionTransformer<Microsoft.ML.Calibrators.CalibratedModelParametersBase<Microsoft.ML.Trainers.LightGbm.LightGbmBinaryModelParameters,Microsoft.ML.Calibrators.PlattCalibrator>>,Microsoft.ML.Calibrators.CalibratedModelParametersBase<Microsoft.ML.Trainers.LightGbm.LightGbmBinaryModelParameters,Microsoft.ML.Calibrators.PlattCalibrator>>.TrainModelCore
27 Microsoft_ML_Data!Microsoft.ML.Trainers.TrainerEstimatorBase<Microsoft.ML.Data.BinaryPredictionTransformer<Microsoft.ML.Calibrators.CalibratedModelParametersBase<Microsoft.ML.Trainers.LightGbm.LightGbmBinaryModelParameters,Microsoft.ML.Calibrators.PlattCalibrator>>,Microsoft.ML.Calibrators.CalibratedModelParametersBase<Microsoft.ML.Trainers.LightGbm.LightGbmBinaryModelParameters,Microsoft.ML.Calibrators.PlattCalibrator>>.TrainTransformer
28 Microsoft_ML_LightGbm!Microsoft.ML.Trainers.LightGbm.LightGbmBinaryTrainer.Fit
29 Microsoft_ML_Tests!Microsoft.ML.Tests.TrainerEstimators.TrainerEstimators.LightGBMBinaryEstimatorUnbalanced
2a clr!CallDescrWorkerInternal
2b clr!CallDescrWorkerWithHandler
2c clr!CallDescrWorkerReflectionWrapper
2d clr!RuntimeMethodHandle::InvokeMethod
2e mscorlib_ni!System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal
2f mscorlib_ni!System.Reflection.RuntimeMethodInfo.Invoke
30 xunit_execution_desktop!Xunit.Sdk.TestInvoker<System.__Canon>.CallTestMethod
31 xunit_execution_desktop!Xunit.Sdk.TestInvoker<Xunit.Sdk.IXunitTestCase>.<>c__DisplayClass48_0.<<InvokeTestMethodAsync>b__1>d.MoveNext
32 mscorlib_ni!System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start<<<InvokeTestMethodAsync>b__1>d>
33 xunit_execution_desktop!Xunit.Sdk.TestInvoker<Xunit.Sdk.IXunitTestCase>.<>c__DisplayClass48_0.<InvokeTestMethodAsync>b__1
34 xunit_execution_desktop!Xunit.Sdk.ExecutionTimer.<AggregateAsync>d__4.MoveNext
35 mscorlib_ni!System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[[System.Security.Cryptography.CryptoStream+<WriteAsyncInternal>d__39, mscorlib]](<WriteAsyncInternal>d__39 ByRef)
36 xunit_execution_desktop!Xunit.Sdk.ExecutionTimer.AggregateAsync
37 xunit_core!Xunit.Sdk.ExceptionAggregator.<RunAsync>d__9.MoveNext
38 mscorlib_ni!System.Runtime.CompilerServices.AsyncTaskMethodBuilder.Start[[System.Security.Cryptography.CryptoStream+<WriteAsyncInternal>d__39, mscorlib]](<WriteAsyncInternal>d__39 ByRef)
39 xunit_core!Xunit.Sdk.ExceptionAggregator.RunAsync
3a xunit_execution_desktop!Xunit.Sdk.TestInvoker<Xunit.Sdk.IXunitTestCase>.<InvokeTestMethodAsync>d__48.MoveNext
3b mscorlib_ni!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Decimal>.Start<<InvokeTestMethodAsync>d__48>
3c xunit_execution_desktop!Xunit.Sdk.TestInvoker<Xunit.Sdk.IXunitTestCase>.InvokeTestMethodAsync
3d xunit_execution_desktop!Xunit.Sdk.TestInvoker<Xunit.Sdk.IXunitTestCase>.<<RunAsync>b__47_0>d.MoveNext
3e mscorlib_ni!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.Boolean, mscorlib]].Start[[System.Threading.SemaphoreSlim+<WaitUntilCountOrTimeoutAsync>d__31, mscorlib]](<WaitUntilCountOrTimeoutAsync>d__31 ByRef)
3f xunit_execution_desktop!Xunit.Sdk.TestInvoker<Xunit.Sdk.IXunitTestCase>.<RunAsync>b__47_0
40 xunit_core!Xunit.Sdk.ExceptionAggregator.<RunAsync>d__10<System.Decimal>.MoveNext
41 mscorlib_ni!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.Boolean, mscorlib]].Start[[System.Threading.SemaphoreSlim+<WaitUntilCountOrTimeoutAsync>d__31, mscorlib]](<WaitUntilCountOrTimeoutAsync>d__31 ByRef)
42 xunit_core!Xunit.Sdk.ExceptionAggregator.RunAsync<System.Decimal>
43 xunit_execution_desktop!Xunit.Sdk.XunitTestRunner.<InvokeTestAsync>d__4.MoveNext
44 mscorlib_ni!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.Boolean, mscorlib]].Start[[System.Threading.SemaphoreSlim+<WaitUntilCountOrTimeoutAsync>d__31, mscorlib]](<WaitUntilCountOrTimeoutAsync>d__31 ByRef)
45 xunit_execution_desktop!Xunit.Sdk.XunitTestRunner.InvokeTestAsync
46 xunit_core!Xunit.Sdk.ExceptionAggregator+<RunAsync>d__10`1[[System.Decimal, mscorlib]].MoveNext()
47 mscorlib_ni!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.Boolean, mscorlib]].Start[[System.Threading.SemaphoreSlim+<WaitUntilCountOrTimeoutAsync>d__31, mscorlib]](<WaitUntilCountOrTimeoutAsync>d__31 ByRef)
48 xunit_core!Xunit.Sdk.ExceptionAggregator.RunAsync[[System.Decimal, mscorlib]](System.Func`1<System.Threading.Tasks.Task`1<System.Decimal>>)
49 xunit_execution_desktop!Xunit.Sdk.TestRunner<Xunit.Sdk.IXunitTestCase>.<RunAsync>d__43.MoveNext
4a mscorlib_ni!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.Boolean, mscorlib]].Start[[System.Threading.SemaphoreSlim+<WaitUntilCountOrTimeoutAsync>d__31, mscorlib]](<WaitUntilCountOrTimeoutAsync>d__31 ByRef)
4b xunit_execution_desktop!Xunit.Sdk.TestRunner<Xunit.Sdk.IXunitTestCase>.RunAsync
4c xunit_execution_desktop!Xunit.Sdk.TestCaseRunner<Xunit.Sdk.IXunitTestCase>.<RunAsync>d__19.MoveNext
4d mscorlib_ni!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.Boolean, mscorlib]].Start[[System.Threading.SemaphoreSlim+<WaitUntilCountOrTimeoutAsync>d__31, mscorlib]](<WaitUntilCountOrTimeoutAsync>d__31 ByRef)
4e xunit_execution_desktop!Xunit.Sdk.TestCaseRunner<Xunit.Sdk.IXunitTestCase>.RunAsync
4f xunit_execution_desktop!Xunit.Sdk.XunitTestMethodRunner.RunTestCaseAsync
50 xunit_execution_desktop!Xunit.Sdk.TestMethodRunner<Xunit.Sdk.IXunitTestCase>.<RunTestCasesAsync>d__32.MoveNext
51 mscorlib_ni!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.Boolean, mscorlib]].Start[[System.Threading.SemaphoreSlim+<WaitUntilCountOrTimeoutAsync>d__31, mscorlib]](<WaitUntilCountOrTimeoutAsync>d__31 ByRef)
52 xunit_execution_desktop!Xunit.Sdk.TestMethodRunner<Xunit.Sdk.IXunitTestCase>.RunTestCasesAsync
53 xunit_execution_desktop!Xunit.Sdk.TestMethodRunner<Xunit.Sdk.IXunitTestCase>.<RunAsync>d__31.MoveNext
54 mscorlib_ni!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.Boolean, mscorlib]].Start[[System.Threading.SemaphoreSlim+<WaitUntilCountOrTimeoutAsync>d__31, mscorlib]](<WaitUntilCountOrTimeoutAsync>d__31 ByRef)
55 xunit_execution_desktop!Xunit.Sdk.TestMethodRunner<Xunit.Sdk.IXunitTestCase>.RunAsync
56 xunit_execution_desktop!Xunit.Sdk.TestClassRunner<Xunit.Sdk.IXunitTestCase>.<RunTestMethodsAsync>d__38.MoveNext
57 mscorlib_ni!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.Boolean, mscorlib]].Start[[System.Threading.SemaphoreSlim+<WaitUntilCountOrTimeoutAsync>d__31, mscorlib]](<WaitUntilCountOrTimeoutAsync>d__31 ByRef)
58 xunit_execution_desktop!Xunit.Sdk.TestClassRunner<Xunit.Sdk.IXunitTestCase>.RunTestMethodsAsync
59 xunit_execution_desktop!Xunit.Sdk.TestClassRunner<Xunit.Sdk.IXunitTestCase>.<RunAsync>d__37.MoveNext
5a mscorlib_ni!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.Boolean, mscorlib]].Start[[System.Threading.SemaphoreSlim+<WaitUntilCountOrTimeoutAsync>d__31, mscorlib]](<WaitUntilCountOrTimeoutAsync>d__31 ByRef)
5b xunit_execution_desktop!Xunit.Sdk.TestClassRunner<Xunit.Sdk.IXunitTestCase>.RunAsync
5c xunit_execution_desktop!Xunit.Sdk.TestCollectionRunner<Xunit.Sdk.IXunitTestCase>.<RunTestClassesAsync>d__28.MoveNext
5d mscorlib_ni!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.Boolean, mscorlib]].Start[[System.Threading.SemaphoreSlim+<WaitUntilCountOrTimeoutAsync>d__31, mscorlib]](<WaitUntilCountOrTimeoutAsync>d__31 ByRef)
5e xunit_execution_desktop!Xunit.Sdk.TestCollectionRunner<Xunit.Sdk.IXunitTestCase>.RunTestClassesAsync
5f xunit_execution_desktop!Xunit.Sdk.TestCollectionRunner<Xunit.Sdk.IXunitTestCase>.<RunAsync>d__27.MoveNext
60 mscorlib_ni!System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.Boolean, mscorlib]].Start[[System.Threading.SemaphoreSlim+<WaitUntilCountOrTimeoutAsync>d__31, mscorlib]](<WaitUntilCountOrTimeoutAsync>d__31 ByRef)
61 xunit_execution_desktop!Xunit.Sdk.TestCollectionRunner<Xunit.Sdk.IXunitTestCase>.RunAsync
62 xunit_execution_desktop!Xunit.Sdk.XunitTestAssemblyRunner.<>c__DisplayClass14_2.<RunTestCollectionsAsync>b__2
63 mscorlib_ni!System.Threading.Tasks.Task<System.Threading.Tasks.Task<Xunit.Sdk.RunSummary>>.InnerInvoke
64 mscorlib_ni!System.Threading.Tasks.Task.Execute
65 mscorlib_ni!System.Threading.ExecutionContext.RunInternal
66 mscorlib_ni!System.Threading.ExecutionContext.Run
67 mscorlib_ni!System.Threading.Tasks.Task.ExecuteWithThreadLocal
68 mscorlib_ni!System.Threading.Tasks.Task.ExecuteEntry
69 xunit_execution_desktop!Xunit.Sdk.MaxConcurrencySyncContext.RunOnSyncContext
6a mscorlib_ni!System.Threading.ExecutionContext.RunInternal
6b mscorlib_ni!System.Threading.ExecutionContext.Run
6c mscorlib_ni!System.Threading.ExecutionContext.Run
6d xunit_execution_desktop!Xunit.Sdk.ExecutionContextHelper.Run
6e xunit_execution_desktop!Xunit.Sdk.MaxConcurrencySyncContext.WorkerThreadProc
6f mscorlib_ni!System.Threading.ExecutionContext.RunInternal
70 mscorlib_ni!System.Threading.ExecutionContext.Run
71 mscorlib_ni!System.Threading.ExecutionContext.Run
72 mscorlib_ni!System.Threading.ThreadHelper.ThreadStart
73 clr!CallDescrWorkerInternal
74 clr!CallDescrWorkerWithHandler
75 clr!MethodDescCallSite::CallTargetWorker
76 clr!MethodDescCallSite::Call
77 clr!ThreadNative::KickOffThread_Worker
78 clr!ManagedThreadBase_DispatchInner
79 clr!ManagedThreadBase_DispatchMiddle
7a clr!ManagedThreadBase_DispatchOuter
7b clr!ManagedThreadBase_DispatchInCorrectAD
7c clr!Thread::DoADCallBack
7d clr!ManagedThreadBase_DispatchInner
7e clr!ManagedThreadBase_DispatchMiddle
7f clr!ManagedThreadBase_DispatchOuter
80 clr!ManagedThreadBase_FullTransitionWithAD
81 clr!ManagedThreadBase::KickOff
82 clr!ThreadNative::KickOffThread
83 clr!Thread::intermediateThreadProc
84 kernel32!BaseThreadInitThunk
85 ntdll!RtlUserThreadStart

ntdll!ZwWaitForMultipleObjects+0x14:
00007ffc`6caf6974 c3              ret
 # Call Site
00 ntdll!ZwWaitForMultipleObjects
01 KERNELBASE!WaitForMultipleObjectsEx
02 KERNELBASE!WaitForMultipleObjects
03 kernel32!WerpReportFaultInternal
04 kernel32!WerpReportFault
05 KERNELBASE!UnhandledExceptionFilter
06 vcomp140!`_vcomp::fork_helper_wrapper'::`1'::filt$0
07 vcomp140!_C_specific_handler
08 ntdll!RtlpExecuteHandlerForException
09 ntdll!RtlDispatchException
0a ntdll!KiUserExceptionDispatch
0b lib_lightgbm!LightGBM::CreatePredictionEarlyStopInstance
0c lib_lightgbm!LightGBM::ObjectiveFunction::CreateObjectiveFunction
0d lib_lightgbm!LightGBM::ObjectiveFunction::CreateObjectiveFunction
0e lib_lightgbm!LGBM_DatasetPushRowsByCSR
0f vcomp140!_vcomp_fork_helper
10 vcomp140!_vcomp::fork_helper_wrapper
11 vcomp140!_vcomp::ParallelRegion::HandlerThreadFunc
12 vcomp140!_vcomp::PersistentThreadFunc
13 kernel32!BaseThreadInitThunk
14 ntdll!RtlUserThreadStart

ntdll!ZwDelayExecution+0x14:
00007ffc`6caf64a4 c3              ret
 # Call Site
00 ntdll!ZwDelayExecution
01 KERNELBASE!SleepEx
02 kernel32!WerpCheckForParallelExceptions
03 kernel32!WerpReportFault
04 KERNELBASE!UnhandledExceptionFilter
05 vcomp140!`_vcomp::fork_helper_wrapper'::`1'::filt$0
06 vcomp140!_C_specific_handler
07 ntdll!RtlpExecuteHandlerForException
08 ntdll!RtlDispatchException
09 ntdll!KiUserExceptionDispatch
0a lib_lightgbm!LightGBM::CreatePredictionEarlyStopInstance
0b lib_lightgbm!LightGBM::ObjectiveFunction::CreateObjectiveFunction
0c lib_lightgbm!LightGBM::ObjectiveFunction::CreateObjectiveFunction
0d lib_lightgbm!LGBM_DatasetPushRowsByCSR
0e vcomp140!_vcomp_fork_helper
0f vcomp140!_vcomp::fork_helper_wrapper
10 vcomp140!_vcomp::ParallelRegion::HandlerThreadFunc
11 vcomp140!_vcomp::PersistentThreadFunc
12 kernel32!BaseThreadInitThunk
13 ntdll!RtlUserThreadStart

I haven't looked too deeply into the LightGBM code to understand what's happening here, but maybe this will help someone more familiar with what it's doing.

ericstj commented 5 months ago

We have a few customer reports of similar errors too:
https://github.com/dotnet/machinelearning/issues/6817
https://github.com/dotnet/machinelearning/issues/3615
https://github.com/dotnet/machinelearning/issues/6426
https://github.com/dotnet/machinelearning/issues/3872
https://github.com/dotnet/machinelearning/issues/3340