dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

ResourceManagerUtils.DownloadResource acquires mutex on one thread and releases from another #6980

Open ericstj opened 5 months ago

ericstj commented 5 months ago


Describe the bug

I noticed an exception during local testing. The test failed with the error "DownloadFailed with exception One or more errors occurred. (Object synchronization method was called from an unsynchronized block of code.)" This happens because we are using a Mutex within an async method.

Mutexes have thread affinity. A mutex must be released from the same thread that acquired it: https://learn.microsoft.com/en-us/dotnet/api/system.threading.mutex.releasemutex?view=net-8.0
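To make the thread-affinity rule concrete, here is a minimal standalone sketch, not taken from the ML.NET sources. The class and method names (`MutexAffinityDemo`, `ReleaseFromOtherThread`) are hypothetical and exist only for illustration. It acquires a `Mutex` on one thread, attempts to release it from a second thread, and captures the exception that `ReleaseMutex` throws.

```csharp
using System;
using System.Threading;

// Hypothetical demo class; not part of ML.NET.
public static class MutexAffinityDemo
{
    // Acquires a mutex on the calling thread, then tries to release it
    // from a different thread. Returns the exception raised by that
    // release attempt (or null if none was raised).
    public static Exception ReleaseFromOtherThread()
    {
        using (var mutex = new Mutex(false))
        {
            mutex.WaitOne(); // the calling thread now owns the mutex

            Exception caught = null;
            var other = new Thread(() =>
            {
                try
                {
                    // This thread does not own the mutex, so the call fails.
                    mutex.ReleaseMutex();
                }
                catch (Exception ex)
                {
                    caught = ex;
                }
            });
            other.Start();
            other.Join();

            // Release from the owning thread so the mutex is not abandoned.
            mutex.ReleaseMutex();
            return caught;
        }
    }
}
```

Calling `MutexAffinityDemo.ReleaseFromOtherThread()` should return an `ApplicationException`, which is the exception type `ReleaseMutex` is documented to throw when the calling thread does not own the mutex.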

However, an async method that uses ConfigureAwait(false) will not necessarily resume on the same thread after an await, so the mutex can end up being released from a thread that does not own it.
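One possible way to avoid the problem, sketched here purely as an illustration and not as the fix adopted by ML.NET, is to guard the async section with a primitive that has no thread affinity, such as `SemaphoreSlim`. The helper name `RunExclusiveAsync` and the class `AsyncSafeLock` are hypothetical. Note the assumption that in-process exclusion is sufficient: `SemaphoreSlim` does not synchronize across processes, so it is not a drop-in replacement if the original mutex is meant to coordinate multiple processes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; not part of ML.NET.
public static class AsyncSafeLock
{
    // A semaphore with a single slot acts as an in-process lock.
    // Unlike Mutex, it may be released from any thread.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(1, 1);

    // Number of free slots; 1 means the lock is currently not held.
    public static int AvailableCount
    {
        get { return Gate.CurrentCount; }
    }

    // Runs the given async work while holding the lock. The continuation
    // after each await may run on a different thread, which is fine here
    // because SemaphoreSlim.Release has no thread-ownership requirement.
    public static async Task<T> RunExclusiveAsync<T>(Func<Task<T>> work)
    {
        await Gate.WaitAsync().ConfigureAwait(false);
        try
        {
            return await work().ConfigureAwait(false);
        }
        finally
        {
            Gate.Release();
        }
    }
}
```

A caller would wrap the download in the helper, for example `await AsyncSafeLock.RunExclusiveAsync(() => DownloadAsync(url))`, where `DownloadAsync` stands in for whatever async operation needs exclusive access.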

To Reproduce

Steps to reproduce the behavior:

  1. Delete local copies of ML.NET resources (e.g., from %TEMP%\mlnet).
  2. Run dotnet test on Microsoft.ML.TorchSharp.Tests.
  3. Observe the failure. If it does not occur, repeat from step 1.

Expected behavior

Tests run to completion.

Screenshots, Code, Sample Projects

System.InvalidOperationException : Error downloading resource from 'https://aka.ms/mlnet-resources/models/pretrained_Roberta_encoder.tsm': DownloadFailed with exception One or more errors occurred. (Object synchronization method was called from an unsynchronized block of code.)
DownloadFailed with exception One or more errors occurred. (A task was canceled.)
DownloadFailed with exception One or more errors occurred. (The wait completed due to an abandoned mutex.)
DownloadFailed with exception One or more errors occurred. (A task was canceled.)
DownloadFailed with exception One or more errors occurred. (The wait completed due to an abandoned mutex.)

model file could not be downloaded!
   at Microsoft.ML.TorchSharp.Roberta.QATrainer.Trainer.GetModelPath() in C:\src\dotnet\machinelearning\src\Microsoft.ML.TorchSharp\Roberta\QATrainer.cs:line 260