dotnet / machinelearning-modelbuilder

Simple UI tool to build custom machine learning models.
Creative Commons Attribution 4.0 International

Error "ONNX Runtime only *guarantees* support for models stamped with official released onnx opset versions" when training Model in Azure #2978

Open marxxxx opened 1 week ago

marxxxx commented 1 week ago

Describe the bug
Starting today, I get an exception when I train an image classification model on Azure. I follow the usual Model Builder wizard, and training fails with:

Error initializing model: Microsoft.ML.OnnxRuntime.OnnxRuntimeException: [ErrorCode:Fail] Load model from D:\Dev\SWA\Solution\Project\xxx\xxx.onnx failed: D:\a\_work\1\s\onnxruntime\core/graph/model_load_utils.h:57 onnxruntime::model_load_utils::ValidateOpsetForDomain ONNX Runtime only *guarantees* support for models stamped with official released onnx opset versions. Opset 17 is under development and support for this is limited. The operator schemas and or other functionality may change before next ONNX release and in this case ONNX Runtime will not guarantee backward compatibility. Current official support for domain ai.onnx is till opset 15.

at Microsoft.ML.OnnxRuntime.NativeApiStatus.VerifySuccess(IntPtr nativeStatus)
at Microsoft.ML.OnnxRuntime.InferenceSession.Init(String modelPath, SessionOptions options, PrePackedWeightsContainer prepackedWeightsContainer)
at Microsoft.ML.OnnxRuntime.InferenceSession..ctor(String modelPath, SessionOptions options)
at Microsoft.ML.Transforms.Onnx.OnnxModel..ctor(String modelFile, Nullable`1 gpuDeviceId, Boolean fallbackToCpu, Boolean ownModelFile, IDictionary`2 shapeDictionary, Int32 recursionLimit, Nullable`1 interOpNumThreads, Nullable`1 intraOpNumThreads)
at Microsoft.ML.Transforms.Onnx.OnnxTransformer..ctor(IHostEnvironment env, Options options, Byte[] modelBytes)

at Microsoft.ML.Transforms.Onnx.OnnxTransformer..ctor(IHostEnvironment env, Options options, Byte[] modelBytes)
at Microsoft.ML.Transforms.Onnx.OnnxTransformer..ctor(IHostEnvironment env, String[] outputColumnNames, String[] inputColumnNames, String modelFile, Nullable`1 gpuDeviceId, Boolean fallbackToCpu, IDictionary`2 shapeDictionary, Int32 recursionLimit, Nullable`1 interOpNumThreads, Nullable`1 intraOpNumThreads)
at Microsoft.ML.Transforms.Onnx.OnnxScoringEstimator..ctor(IHostEnvironment env, String modelFile, Nullable`1 gpuDeviceId, Boolean fallbackToCpu, IDictionary`2 shapeDictionary, Int32 recursionLimit)
at Microsoft.ML.OnnxCatalog.ApplyOnnxModel(TransformsCatalog catalog, String modelFile, Nullable`1 gpuDeviceId, Boolean fallbackToCpu)
at AzureML.AutoMLRunnerImages.GetBestModelAndPipelineImageClassification(MLContext mlContext, String modelFile) in //src/Microsoft.ML.ModelBuilder.AutoMLService/RemoteAutoML/AutoMLRunnerImages.cs:line 276
at AzureML.AutoMLRunnerImages.RunAutoMLAsync(MLContext mLContext, CancellationToken cancellationToken) in //src/Microsoft.ML.ModelBuilder.AutoMLService/RemoteAutoML/AutoMLRunnerImages.cs:line 202
at Microsoft.ML.ModelBuilder.AutoMLService.Experiments.AzureImageClassificationExperiment.ExecuteAsync(IDataView trainData, IDataView validateData, CancellationToken ct) in //src/Microsoft.ML.ModelBuilder.AutoMLService/Experiments/AzureImageClassificationExperiment.cs:line 60
at Microsoft.ML.ModelBuilder.AutoMLEngine.StartTrainingAsync(ITrainingConfiguration config, PathConfiguration pathConfig, CancellationToken userCancellationToken) in //src/Microsoft.ML.ModelBuilder.AutoMLService/AutoMLEngineService/AutoMLEngine.cs:line 178

Additional context
Azure Machine Learning Run Id: https://ml.azure.com/experiments/id/4635c1c0-3935-446b-a646-0036551de5cf/runs/zeihheakczitmky?wsid=/subscriptions/c7c553dd-b8f1-451b-96c6-b5111f449abb/resourceGroups/wireterminal-ml-dev/providers/Microsoft.MachineLearningServices/workspaces/wireterminal-ml-studio-dev

LittleLittleCloud commented 2 days ago

This looks like a check we could disable as a temporary workaround? The root cause might be that the Azure ML service upgraded its ONNX version. Needs further investigation.

@michaelgsharp Is there an option to disable that check on the ML.NET onnxruntime transformer side?