dotnet / machinelearning-modelbuilder

Simple UI tool to build custom machine learning models.
Creative Commons Attribution 4.0 International

After one round of training, resnet_v2_50_299 is automatically deleted by something #1174

Closed. Zhoujiangcat closed this issue 3 years ago.

Zhoujiangcat commented 3 years ago

Describe the bug: After one round of training, the resnet_v2_50_299 model file is deleted by something. The trainer then reports that resnet_v2_50_299 is missing and starts downloading it again.

To Reproduce: Steps to reproduce the behavior:

[Source=ImageClassificationTrainer; ImageClassificationTrainer, Kind=Trace] Phase: Training, Dataset used: Train, Batch Processed Count: 92, Epoch: 25, Accuracy: 0.9923914, Cross-Entropy: 0.03539219, Learning Rate: 0.004473651
[Source=ImageClassificationTrainer; ImageClassificationTrainer, Kind=Trace] Phase: Training, Dataset used: Validation, Batch Processed Count: 11, Epoch: 25, Accuracy: 0.8636363, Cross-Entropy: 0.7466159
[Source=ImageClassificationTrainer; ImageClassificationTrainer, Kind=Trace] Phase: Training, Dataset used: Train, Batch Processed Count: 92, Epoch: 26, Accuracy: 0.9934784, Cross-Entropy: 0.03474673, Learning Rate: 0.004473651
[Source=ImageClassificationTrainer; ImageClassificationTrainer, Kind=Trace] Phase: Training, Dataset used: Validation, Batch Processed Count: 11, Epoch: 26, Accuracy: 0.8636363, Cross-Entropy: 0.7511082
[Source=ImageClassificationTrainer; ImageClassificationTrainer, Kind=Trace] Phase: Training, Dataset used: Train, Batch Processed Count: 92, Epoch: 27, Accuracy: 0.9934784, Cross-Entropy: 0.03417851, Learning Rate: 0.004205232
[Source=ImageClassificationTrainer; ImageClassificationTrainer, Kind=Trace] Phase: Training, Dataset used: Validation, Batch Processed Count: 11, Epoch: 27, Accuracy: 0.8636363, Cross-Entropy: 0.7538242
[Source=AutoML, Kind=Error] Pipeline crashed: xf=ValueToKeyMapping{ col=Label:Label} xf=RawByteImageLoading{ col=ImageSource_featurized:ImageSource imageFolder=} xf=ColumnCopying{ col=Features:ImageSource_featurized} tr=ImageClassification{} xf=KeyToValueMapping{ col=PredictedLabel:PredictedLabel} cache=- .
Exception: System.IO.FileNotFoundException: Could not find file 'C:\Users***\AppData\Local\Temp\MLNET\resnet_v2_50_299.meta'. File name: 'C:\Users***\AppData\Local\Temp\MLNET\resnet_v2_50_299.meta'
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
   at System.IO.File.InternalReadAllBytes(String path, Boolean checkHost)
   at Tensorflow.meta_graph.read_meta_graph_file(String filename)
   at Tensorflow.saver._import_meta_graph_with_return_elements(String meta_graph_or_file, Boolean clear_devices, String import_scope, String[] return_elements)
   at Microsoft.ML.TensorFlow.TensorFlowUtils.LoadMetaGraph(String path)
   at Microsoft.ML.Vision.ImageClassificationTrainer.BuildEvaluationSession(Int32 classCount)
   at Microsoft.ML.Vision.ImageClassificationTrainer.UpdateTransferLearningModelOnDisk(Int32 classCount)
   at Microsoft.ML.Vision.ImageClassificationTrainer.TrainAndEvaluateClassificationLayer(String trainBottleneckFilePath, String validationSetBottleneckFilePath)
   at Microsoft.ML.Vision.ImageClassificationTrainer.TrainModelCore(TrainContext trainContext)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String groupId, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, IChannel logger)
[Source=ImageClassificationTrainer; ImageClassificationTrainer, Kind=Trace] Channel started
[Source=ImageClassificationTrainer; Ensuring meta files are present., Kind=Trace] Channel started
[Source=ImageClassificationTrainer; Ensuring meta files are present., Kind=Info] Downloading resnet_v2_50_299.meta from https://aka.ms/mlnet-resources/meta/resnet_v2_50_299.meta to C:\Users***\AppData\Local\Temp\MLNET\resnet_v2_50_299.meta

Expected behavior: Another round of training starts.


JakeRadMSFT commented 3 years ago

@michaelgsharp @LittleLittleCloud thoughts?

michaelgsharp commented 3 years ago

I don't have any thoughts yet, but I'll look into it and see if I can figure out what is deleting the model. @JakeRadMSFT, does Model Builder do anything special between training rounds?

JakeRadMSFT commented 3 years ago

We don't, but I wonder whether this is caused by a setting for the Temp folder. I wasn't able to reproduce it, but I'll try with the latest version.

@Zhoujiangcat does this still reproduce for you?

f-quintero commented 3 years ago

To reproduce it, make sure you have not already downloaded resnet_v2_50_299.meta. If you have it, rename or delete it before trying.
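The rename-before-retrying tip can be scripted. The following is a minimal, hypothetical helper that is not part of ML.NET or Model Builder. It moves the cached meta file aside so that the next training run has to download it again. It assumes the cache lives in an `MLNET` folder directly under the directory returned by Python's `tempfile.gettempdir()`, which on Windows normally resolves to the same `AppData\Local\Temp` folder shown in the log above. The `.bak` suffix is an arbitrary choice.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Cached file name taken from the log in this issue.
META_NAME = "resnet_v2_50_299.meta"


def cached_meta_path(temp_dir: Optional[str] = None) -> Path:
    """Return <temp>/MLNET/resnet_v2_50_299.meta.

    Assumption: the MLNET cache folder sits directly under the temp directory
    reported by tempfile.gettempdir(), matching the path shown in the log.
    """
    base = Path(temp_dir) if temp_dir is not None else Path(tempfile.gettempdir())
    return base / "MLNET" / META_NAME


def move_aside_cached_meta(temp_dir: Optional[str] = None) -> Optional[Path]:
    """Rename the cached meta file so the next run has to download it again.

    Returns the backup path, or None if no cached file was found.
    The ".bak" suffix is a hypothetical naming choice.
    """
    meta = cached_meta_path(temp_dir)
    if not meta.is_file():
        return None
    backup = meta.with_name(meta.name + ".bak")
    # Path.replace overwrites an older backup instead of failing.
    meta.replace(backup)
    return backup
```

Calling `move_aside_cached_meta()` before clicking "Start training" should leave no cached copy in place. Renaming instead of deleting lets you restore the original file afterwards.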

vzhuqin commented 3 years ago

Could not reproduce the issue on the main branch: https://privategallery.blob.core.windows.net/gallery/refs/heads/main/atom.xml

Azure steps:
1) Download the dataset from: https://testpass.blob.core.windows.net/test-pass-data/weather.zip
2) Create a new C# console app with .NET 5.0.
3) Add Model Builder by right-clicking the project.
4) Click "Image classification" on the Scenario page.
5) Select Azure and set up a workspace on the Environment page.
6) Click "..." to select a folder on the Data page.
7) Click "Start training" on the Train page and wait for training to complete.

Local steps:
1) Download the dataset from: https://testpass.blob.core.windows.net/test-pass-data/weather.zip
2) Create a new C# console app with .NET 5.0.
3) Add Model Builder by right-clicking the project.
4) Click "Image classification" on the Scenario page.
5) Select Local (CPU) on the Environment page.
6) Click "..." to select a folder on the Data page.
7) Click "Start training" on the Train page and wait for training to complete.