dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

File already exists error when loading LightGbmBinaryTrainer from MemoryStream #5210

Closed dasokolo closed 4 years ago

dasokolo commented 4 years ago

System information

OS version: Windows (not sure)
.NET version: 3.1.202
ML.NET version: 1.3.1

Issue

I am loading a LightGbmBinaryTrainer model from a MemoryStream:

            ITransformer model = null;
            using (MemoryStream ms = new MemoryStream(kv.Value))
            {
                model = _mLContext.Model.Load(ms, out _inputSchema);
            }

Note, this code is executed in several threads in parallel over the same model.

And I occasionally get the following exception:

System.IO.IOException: The file 'D:\SvcFab_App\AIBuilder.Platform.Host_App157\temp\TLC_1CBA6C2E\0' already exists.

This happens only occasionally, and I have not been able to reproduce it reliably.

2) But if file operations cannot be avoided, I expect to have no naming conflicts.

System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
 ---> System.IO.IOException: The file 'D:\SvcFab_App\AIBuilder.Platform.Host_App157\temp\TLC_1CBA6C2E\0' already exists.
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share)
   at System.IO.Compression.ZipFileExtensions.ExtractToFile(ZipArchiveEntry source, String destinationFileName, Boolean overwrite)
   at Microsoft.ML.RepositoryReader.OpenEntryOrNull(String dir, String name)
   at Microsoft.ML.ModelOperationsCatalog.Load(Stream stream, DataViewSchema& inputSchema)
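The innermost frames of the trace are plain .NET calls: `ZipFileExtensions.ExtractToFile` opening a `FileStream` on a path that already exists. The sketch below is not ML.NET code; it only reproduces that base-class-library behavior in isolation, assuming the extraction is done without overwrite. The class name, method name, and scratch directory are hypothetical and exist only for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ExtractCollisionDemo
{
    // Returns true when extracting the same zip entry twice to one path,
    // without overwrite, throws IOException on the second attempt.
    public static bool SecondExtractThrows()
    {
        // Hypothetical scratch directory standing in for the TLC_* temp folder.
        string dir = Path.Combine(Path.GetTempPath(), "demo_" + Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(dir);
        string target = Path.Combine(dir, "0");

        try
        {
            // Build a one-entry zip archive in memory.
            using var buffer = new MemoryStream();
            using (var writer = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
            {
                using var entryStream = writer.CreateEntry("0").Open();
                entryStream.WriteByte(42);
            }

            buffer.Position = 0;
            using var reader = new ZipArchive(buffer, ZipArchiveMode.Read);
            ZipArchiveEntry entry = reader.GetEntry("0");

            // First extraction creates the file.
            entry.ExtractToFile(target, overwrite: false);
            try
            {
                // Second extraction targets the same path and must fail.
                entry.ExtractToFile(target, overwrite: false);
                return false;
            }
            catch (IOException)
            {
                return true; // the "already exists" case from the trace
            }
        }
        finally
        {
            Directory.Delete(dir, recursive: true);
        }
    }
}
```

If two concurrent loads ever end up extracting to the same temp path, the second one would hit this same `IOException`, which is consistent with the intermittent failure described above.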

frank-dong-ms-zz commented 4 years ago

@dasokolo could you please provide a repro project and necessary dataset so we can investigate further? BTW, are you working on latest version of ML.NET (1.5.0)?

dasokolo commented 4 years ago

Unfortunately I don’t have a reproducer. This issue happened in a production environment and the only thing I have is logs. I tried to reproduce this issue in a controlled environment, but have not succeeded so far.

I am using ML.NET 1.3.1



dasokolo commented 4 years ago

Is there a way to make File IO functionality configurable and only use memory?

frank-dong-ms-zz commented 4 years ago

@dasokolo I can't repro the issue with the provided code snippet either. Regarding your question, I believe the answer is no, but let me check the code and get back to you later. Thanks.

frank-dong-ms-zz commented 4 years ago

@dasokolo I have checked the code: currently there is no option that a user can set to avoid file operations when loading a model. I have already opened a PR to fix the issue you hit.
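Since file operations cannot be switched off, one way to lower the number of concurrent loads of the same model is to load it once and share the result between threads. The sketch below is a hypothetical helper, not part of ML.NET, and it assumes the loaded model object can safely be shared across threads. It does not remove the temp-file extraction; it only makes each model go through it once.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper, not part of ML.NET: runs the loader at most once per key.
public sealed class LoadOnceCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, Lazy<TValue>> _entries =
        new ConcurrentDictionary<TKey, Lazy<TValue>>();

    public TValue GetOrLoad(TKey key, Func<TKey, TValue> loader)
    {
        // GetOrAdd may build several Lazy wrappers under contention, but only one
        // is stored, and ExecutionAndPublication runs its loader exactly once.
        Lazy<TValue> entry = _entries.GetOrAdd(
            key,
            k => new Lazy<TValue>(() => loader(k), LazyThreadSafetyMode.ExecutionAndPublication));

        // Other threads block here until the single load has finished.
        return entry.Value;
    }
}
```

For the code in the report, the loader passed to `GetOrLoad` would wrap the `new MemoryStream(kv.Value)` and `_mLContext.Model.Load(ms, out _inputSchema)` calls. One caveat of this design: if the loader throws, `Lazy<T>` in this mode caches the exception, so a failed load would need the entry removed before retrying.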

frank-dong-ms-zz commented 4 years ago

This issue should already be fixed by the PR below: https://github.com/dotnet/machinelearning/pull/4645

Please upgrade to the latest version of ML.NET.

frank-dong-ms-zz commented 4 years ago

Closing this issue now as it is already fixed. Feel free to reopen if necessary, thanks.