Open ericstj opened 5 months ago
@michaelgsharp made a good observation offline - we're seeing memory usage go up quite a bit as the tests progress.
Finished test: Microsoft.ML.TorchSharp.Tests.TextClassificationTests.TestSentenceSimilarity with memory usage 2,077,020,160.00 and max memory usage 2,370,473,984.00
That's about 2 GB of memory still in use after the previous test completed.
Wow - the memory usage of this test is very high. Here's what I see from a local passing run on Windows.
Discovering: Microsoft.ML.TorchSharp.Tests (method display = ClassAndMethod, method display options = None)
Discovered: Microsoft.ML.TorchSharp.Tests (found 12 test cases)
Starting: Microsoft.ML.TorchSharp.Tests (parallel test collections = on [20 threads], stop on fail = off)
Starting test: Microsoft.ML.TorchSharp.Tests.NerTests.TestSimpleNer
Finished test: Microsoft.ML.TorchSharp.Tests.NerTests.TestSimpleNer with memory usage 751,607,808.00 and max memory usage 751,607,808.00
Starting test: Microsoft.ML.TorchSharp.Tests.NerTests.TestSimpleNerOptions
Microsoft.ML.TorchSharp.Tests.NerTests.TestNERLargeFileGpu [SKIP]
Needs to be on a comp with GPU or will take a LONG time.
Finished test: Microsoft.ML.TorchSharp.Tests.NerTests.TestSimpleNerOptions with memory usage 895,778,816.00 and max memory usage 895,778,816.00
Starting test: Microsoft.ML.TorchSharp.Tests.ObjectDetectionTests.SimpleObjDetectionTest
total : 171, filtered: 0, filter ratio: 0.00%
Finished test: Microsoft.ML.TorchSharp.Tests.ObjectDetectionTests.SimpleObjDetectionTest with memory usage 1,142,628,352.00 and max memory usage 1,155,977,216.00
Starting test: Microsoft.ML.TorchSharp.Tests.TextClassificationTests.TestSingleSentence3Classes
Finished test: Microsoft.ML.TorchSharp.Tests.TextClassificationTests.TestSingleSentence3Classes with memory usage 1,111,171,072.00 and max memory usage 1,155,977,216.00
Starting test: Microsoft.ML.TorchSharp.Tests.TextClassificationTests.TestDoubleSentence2Classes
Finished test: Microsoft.ML.TorchSharp.Tests.TextClassificationTests.TestDoubleSentence2Classes with memory usage 1,352,704,000.00 and max memory usage 1,352,818,688.00
Starting test: Microsoft.ML.TorchSharp.Tests.TextClassificationTests.TestSingleSentence2Classes
Finished test: Microsoft.ML.TorchSharp.Tests.TextClassificationTests.TestSingleSentence2Classes with memory usage 1,365,450,752.00 and max memory usage 1,366,872,064.00
Starting test: Microsoft.ML.TorchSharp.Tests.TextClassificationTests.TestSentenceSimilarity
Finished test: Microsoft.ML.TorchSharp.Tests.TextClassificationTests.TestSentenceSimilarity with memory usage 1,362,817,024.00 and max memory usage 1,368,600,576.00
Microsoft.ML.TorchSharp.Tests.TextClassificationTests.TestSentenceSimilarityLargeFileGpu [SKIP]
Needs to be on a comp with GPU or will take a LONG time.
Microsoft.ML.TorchSharp.Tests.TextClassificationTests.TestTextClassificationWithBigDataOnGpu [SKIP]
Condition(s) not met: "EnableRunningGpuTest"
Starting test: Microsoft.ML.TorchSharp.Tests.QATests.TestSimpleQA
Finished test: Microsoft.ML.TorchSharp.Tests.QATests.TestSimpleQA with memory usage 4,675,801,088.00 and max memory usage 5,540,958,208.00
Microsoft.ML.TorchSharp.Tests.QATests.TestQALargeFileGpu [SKIP]
Needs to be on a comp with GPU or will take a LONG time.
Finished: Microsoft.ML.TorchSharp.Tests
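So we may have a leak (memory still grows from test to test, from about 0.75 GB after the first test to about 1.36 GB just before TestSimpleQA), but we are also using a ton of memory when running this test: TestSimpleQA finishes at about 4.7 GB with a peak of about 5.5 GB.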
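To make the growth between tests easier to see, the "Finished test:" lines can be parsed and diffed. This is a minimal sketch, assuming the log format shown above. The helper name `memory_deltas` and the regular expression are illustrative and not part of the test infrastructure. The sample input is two lines copied from the local run above.

```python
import re

# Matches the per-test summary lines in the format shown in the log above.
FINISHED_RE = re.compile(
    r"Finished test: (?P<name>\S+) with memory usage (?P<mem>[\d,]+)\.\d+ "
    r"and max memory usage (?P<peak>[\d,]+)\.\d+"
)


def memory_deltas(log_text):
    """Return (test name, memory in bytes, change since previous test) tuples.

    Hypothetical helper: the first test gets a delta of 0 because there is
    no earlier reading to compare against.
    """
    rows = []
    previous = None
    for match in FINISHED_RE.finditer(log_text):
        mem = int(match.group("mem").replace(",", ""))
        delta = 0 if previous is None else mem - previous
        rows.append((match.group("name"), mem, delta))
        previous = mem
    return rows


# Two lines taken from the local run above.
sample = """\
Finished test: Microsoft.ML.TorchSharp.Tests.TextClassificationTests.TestSentenceSimilarity with memory usage 1,362,817,024.00 and max memory usage 1,368,600,576.00
Finished test: Microsoft.ML.TorchSharp.Tests.QATests.TestSimpleQA with memory usage 4,675,801,088.00 and max memory usage 5,540,958,208.00
"""

for name, mem, delta in memory_deltas(sample):
    short_name = name.rsplit(".", 1)[-1]
    print(f"{short_name:<28} {mem / 2**20:>8.0f} MiB  {delta / 2**20:>+8.0f} MiB")
```

A large positive delta on a single test points at that test's own footprint, while a steady climb across many tests is more suggestive of memory not being released between tests.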
Build Information
Build: https://dev.azure.com/dnceng-public/public/_build/results?buildId=530980&view=results
Build error leg or test failing: Microsoft.ML.TorchSharp.Tests Work Item
Pull Request: https://github.com/dotnet/machinelearning/pull/6976
Error Message
Fill the error message using step by step known issues guidance.
System Information (please complete the following information):
Describe the bug
This test is failing in CI somewhat regularly. The error pattern looks like the following:
Here are a few instances:
https://helixre107v0xd1eu3ibi6ka.blob.core.windows.net/dotnet-machinelearning-refs-pull-6974-merge-f61a125156aa4af1bd/Microsoft.ML.TorchSharp.Tests/1/console.83a6fa6c.log?helixlogtype=result
https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-machinelearning-refs-pull-6976-merge-0a13c2cd41724c3483/Microsoft.ML.TorchSharp.Tests/1/console.ff57f777.log?helixlogtype=result
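For console logs like these, one way to confirm which test was in flight when the run died is to look for a "Starting test:" line with no matching "Finished test:" line. This is a minimal sketch, assuming the console logs use the same "Starting test:" / "Finished test:" lines as the local run above. The helper name `unfinished_tests` is hypothetical, and the sample text is a made-up truncated excerpt, not copied from the linked logs.

```python
import re

# Per-test markers in the format seen in the local run above.
# Note: "Starting test: " does not match the assembly-level "Starting:" line.
STARTED_RE = re.compile(r"Starting test: (\S+)")
DONE_RE = re.compile(r"Finished test: (\S+)")


def unfinished_tests(log_text):
    """Return tests that logged 'Starting test:' but never 'Finished test:'.

    Hypothetical helper. Because test collections can run in parallel,
    more than one test may be unfinished when the process stops.
    """
    started = STARTED_RE.findall(log_text)
    finished = set(DONE_RE.findall(log_text))
    return [name for name in started if name not in finished]


# Made-up excerpt of a log that ends abruptly (illustration only).
sample = """\
Starting test: Microsoft.ML.TorchSharp.Tests.TextClassificationTests.TestSentenceSimilarity
Finished test: Microsoft.ML.TorchSharp.Tests.TextClassificationTests.TestSentenceSimilarity with memory usage 2,077,020,160.00 and max memory usage 2,370,473,984.00
Starting test: Microsoft.ML.TorchSharp.Tests.QATests.TestSimpleQA
"""

for name in unfinished_tests(sample):
    print("still running when the log ended:", name)
```

This does not produce a unique error line for known-issue matching, but it gives a repeatable way to check that the same test was running each time.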
I can't currently capture this failure in a known issue because there is no unique line logged. I've seen this failure numerous times, always when TestSimpleQA is running.
Report
Summary
Known issue validation
Build: :mag_right:
Result validation: :warning: Build internal information not found. This may happen if your build is too old. Please use a build that is no older than two weeks. If the problem persists, contact .NET Engineering Services Team and share this issue.
Validation performed at: 2/14/2024 10:25:46 PM UTC