Azure / MachineLearningNotebooks

Python notebooks with ML and deep learning examples with Azure Machine Learning Python SDK | Microsoft
https://docs.microsoft.com/azure/machine-learning/service/
MIT License

Unable to load model in score script - calls to init() are failing #1877

Closed: abhijelly closed this issue 1 year ago

abhijelly commented 1 year ago

I have modified the demand forecasting template for my use case, but I'm unable to load my model in the score script. The model is a CatBoost model, which has its own load_model() method. I've tried the following approaches, all of which fail:

Error '2. usage: main.py [-h] [--model_path MODEL_PATH] main.py: error: unrecognized arguments: --client_sdk_version 1.47.0 --scoring_module_name [REDACTED]-forecast.py --mini_batch_size 1048576 --error_threshold -1 --output_action append_row --logging_level DEBUG --run_invocation_timeout 60 --run_max_try 3 --create_snapshot_at_runtime True --allowed_failed_count 0 --output /mnt/azureml/cr/j/e676d0761e0048e9b6c39bda36794d61/cap/data-capability/wd/parallelRunOutput --input_ds_0 raw_data --aml_core_version 1.48.0 --dataprep_version 4.8.4 --bf89b5b0_523f_4782_aedc_61bd625ee81a {"working_dir": "/mnt/azureml/cr/j/e676d0761e0048e9b6c39bda36794d61/exe/wd/0ae00b1b-8c86-4f38-ba77-b0538d66ee0b", "snapshot_dir": "/mnt/azureml/cr/j/e676d0761e0048e9b6c39bda36794d61/exe/wd", "port": 42085, "input_format": "TabularDataset", "agent_name": "process000", "inputs": ["raw_data"], "gpu_index": -1, "mini_batch_size": 1048576}.' occurred 2 times.
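The usage error above comes from argparse. The parallel run launches the entry script with its own extra arguments (--client_sdk_version, --mini_batch_size, --run_max_try and so on, as listed in the error), and ArgumentParser.parse_args() exits on any flag it does not recognize. A minimal sketch, assuming the script only needs the --model_path argument from the usage line: parse_known_args() returns the unrecognized arguments instead of failing. The helper name parse_model_path is hypothetical.

```python
import argparse


def parse_model_path(argv=None):
    """Read --model_path and ignore any extra arguments added by the runner."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_path", type=str, default=None)
    # parse_known_args() returns (namespace, leftover_args) instead of
    # exiting on unknown flags the way parse_args() does.
    args, _unknown = parser.parse_known_args(argv)
    return args.model_path


if __name__ == "__main__":
    # Simulated command line: one known flag plus flags like those in the error.
    argv = ["--model_path", "outputs/model.cb", "--mini_batch_size", "1048576", "--error_threshold", "-1"]
    print(parse_model_path(argv))
```

Note that this only stops the argument error; whatever path is passed still has to exist on the worker.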

  • Approach 3: Using the AZUREML_MODEL_DIR environment variable
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "metals_1_month_model/model.cb")
    model = CatBoostRegressor()
    model.load_model(model_path)  # load_model() is called on an instance, not the class

    File "/mnt/azureml/cr/j/f7ea2e8b202a4c6ca67b2c7bd8777fda/exe/wd/[REDACTED]-forecast.py", line 17, in init
      model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "metals_1_month_model/model.cb")
    File "/opt/miniconda/lib/python3.8/posixpath.py", line 76, in join
      a = os.fspath(a)
    TypeError: expected str, bytes or os.PathLike object, not NoneType

For some reason, os.getenv("AZUREML_MODEL_DIR") returns None, so os.path.join() fails.
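os.getenv() returns None when the variable is not defined in the worker's environment, and os.path.join(None, ...) then raises the opaque TypeError in the traceback. A minimal sketch that fails fast with a clearer message instead; resolve_model_file is a hypothetical helper name, and it does not explain why the variable is unset.

```python
import os


def resolve_model_file(relative_path, env_var="AZUREML_MODEL_DIR"):
    """Join relative_path onto the directory named by env_var, failing loudly if unset."""
    model_dir = os.getenv(env_var)
    if model_dir is None:
        # Without this check, os.path.join(None, ...) raises a TypeError
        # that does not mention the missing environment variable.
        raise RuntimeError(
            f"{env_var} is not set in this process; cannot locate {relative_path!r}"
        )
    return os.path.join(model_dir, relative_path)
```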

Entry script error. All tries to load the entry script or calling init() failed. Please check logs/user/error/ and logs/sys/error/ to see if some errors have occurred. No mini batch has been completed. Consider a succeeded mini batch or failed mini batch reached the max tries as completed. The init() function in the entry script had raised exception for 39 times. Please check logs at logs/user/error/ for details. Error 'catboost/libs/model/model_import_interface.h:19: Model file doesn't exist: model.cb' occurred 78 times.

Any thoughts on what I might be doing wrong in these approaches? Any help would be highly appreciated. Thank you!

Hung20736 commented 1 year ago

In approach 1, you don't have a config.json file specifying the subscription, resource group, and workspace you are using. The config.json should look something like this:

    {
      "subscription_id": "",
      "resource_group": "",
      "workspace_name": ""
    }

You can add the config.json file to the source_directory which is configured in ParallelRunConfig.
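A minimal sketch of generating that file inside the source directory before submitting the pipeline. The function names and the "scripts" folder in the usage line are hypothetical. Inside the entry script, Workspace.from_config() would then look for config.json starting from the current working directory; the follow-up below reports that the parallel worker did not pick the file up in this case, so this is not a confirmed fix.

```python
import json
import os


def config_json(subscription_id, resource_group, workspace_name):
    """Return config.json contents with the three keys Workspace.from_config() expects."""
    return json.dumps(
        {
            "subscription_id": subscription_id,
            "resource_group": resource_group,
            "workspace_name": workspace_name,
        },
        indent=2,
    )


def write_workspace_config(source_directory, subscription_id, resource_group, workspace_name):
    """Write config.json into the folder later passed as source_directory to ParallelRunConfig."""
    os.makedirs(source_directory, exist_ok=True)
    path = os.path.join(source_directory, "config.json")
    with open(path, "w") as f:
        f.write(config_json(subscription_id, resource_group, workspace_name))
    return path
```

Usage: `write_workspace_config("scripts", "<subscription-id>", "<resource-group>", "<workspace-name>")`, with the placeholders replaced by your own values.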

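Or you can hard-code the workspace details:

    from azureml.core import Workspace
    from azureml.core.model import Model
    ws = Workspace(subscription_id="<subscription-id>", resource_group="<resource-group>", workspace_name="<workspace-name>")
    model_path = Model.get_model_path("forecast_model/model.cb", version=1, _workspace=ws)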

abhijelly commented 1 year ago

Thank you for answering!

The hard-coding approach worked for me, because during batch processing the parallel worker did not pick up the workspace config file. One correction to your answer: Model.get_model_path() should be given only the model folder name, not the full path to the model file. After getting the model folder path, append "model.cb" to it so the model can be loaded.
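Combining the two comments above, a minimal init() sketch for the entry script. It assumes the model is registered as forecast_model (version 1) and that get_model_path() returns a folder containing model.cb, as reported above. The workspace values are placeholders, model_file_path is a hypothetical helper, and only init() is shown; the entry script still needs its run(mini_batch) function. The SDK imports sit inside init() so the helper can be imported without the SDKs installed.

```python
import os

# Placeholder workspace details: replace with your own values.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
WORKSPACE_NAME = "<workspace-name>"
MODEL_NAME = "forecast_model"  # registered model name, not a file path
MODEL_FILE = "model.cb"        # file inside the registered model folder

model = None


def model_file_path(model_dir, model_file=MODEL_FILE):
    """Append the model file name to the folder returned by get_model_path()."""
    return os.path.join(model_dir, model_file)


def init():
    global model
    # Imported here so model_file_path() works without the SDKs installed.
    from azureml.core import Workspace
    from azureml.core.model import Model
    from catboost import CatBoostRegressor

    ws = Workspace(
        subscription_id=SUBSCRIPTION_ID,
        resource_group=RESOURCE_GROUP,
        workspace_name=WORKSPACE_NAME,
    )
    # Pass only the registered model name; get_model_path() returns its folder.
    model_dir = Model.get_model_path(MODEL_NAME, version=1, _workspace=ws)
    model = CatBoostRegressor()
    model.load_model(model_file_path(model_dir))
```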