Closed by wes-baldwin 3 years ago
Hey @wes-baldwin!
Sorry for the delayed response!
Would you be up for trying our private-preview build to see if it resolves your issue? If so, you can sign up here: https://aka.ms/mb-private-preview
If not, I think the "old" (released) version might just need updated versions of CUDA and cuDNN.
We have new instructions here: https://github.com/dotnet/machinelearning/blob/main/docs/api-reference/tensorflow-usage.md
You'll have to uninstall CUDA 10.0 and replace it with CUDA 10.1.
Sorry again for the delayed response. Let me know if this resolves the issue!
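For anyone trying the CUDA suggestion above, one way to check which toolkit version is on the PATH is to read the output of `nvcc --version`. The sketch below is only an illustration and is not part of ML.NET or Model Builder. The helper names `parse_cuda_release` and `installed_cuda_release` are hypothetical, and it assumes the usual `release X.Y` wording in the nvcc banner. It does not check cuDNN.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_cuda_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text such as 'release 10.1, V10.1.243'."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release() -> Optional[Tuple[int, int]]:
    """Run `nvcc --version` if nvcc is on PATH; return None otherwise."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_cuda_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc not found on PATH, or its output was not recognized")
    elif release == (10, 1):
        print("CUDA 10.1 toolkit detected")
    else:
        # 10.1 is the version suggested in the comment above.
        print(f"CUDA {release[0]}.{release[1]} detected; 10.1 was suggested")
```

If this reports 10.0, the uninstall-and-replace step has probably not taken effect yet, or an older toolkit directory still comes first on the PATH.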
@wes-baldwin I just finished running this and I don't think the above will resolve the issue for this dataset.
We're looking into why it's canceling early.
Thanks for the response. I’ve since moved on to other projects. I don’t require a resolution. I really posted this more for the benefit of others.
System Information (please complete the following information):
Describe the bug
`2 metricsAggregator)
   at Microsoft.ML.Vision.ImageClassificationTrainer.TrainAndEvaluateClassificationLayer(String trainBottleneckFilePath, String validationSetBottleneckFilePath)
   at Microsoft.ML.Vision.ImageClassificationTrainer.TrainModelCore(TrainContext trainContext)
   at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Microsoft.ML.AutoML.RunnerUtil.TrainAndScorePipeline[TMetrics](MLContext context, SuggestedPipeline pipeline, IDataView trainData, IDataView validData, String groupId, String labelColumn, IMetricsAgent`1 metricsAgent, ITransformer preprocessorTransform, FileInfo modelFileInfo, DataViewSchema modelInputSchema, IChannel logger)
[Source=ImageClassificationTrainer; ImageClassificationTrainer, Kind=Trace] Channel started
[Source=ImageClassificationTrainer; Ensuring meta files are present., Kind=Trace] Channel started
[Source=ImageClassificationTrainer; Ensuring meta files are present., Kind=Trace] Channel disposed
[Source=ImageClassificationTrainer; Ensuring meta files are present., Kind=Trace] Channel finished. Elapsed 00:00:00.0001597.
Training cancelled
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I expected the training to complete.
Screenshots
Additional context
I installed all software as stated on https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/install-gpu-model-builder. My machine info: