I followed the quickstart instructions in 3_snowpark_ml_model_training_deployment.ipynb but encountered errors when running the cells for mape and relplot (see attached output files, xgb_output_1.txt and xgb_output_2.txt).
Both errors seem to be caused by pandas.io.json.json_normalize not being found in the UDF:
AttributeError: module 'pandas.io.json' has no attribute 'json_normalize'
in function SNOWML_BATCH_INFERENCE_E5AAA8B3_191D_44C4_9823_3E205F9A81CD_PREDICT with handler udf_py_579079450.compute
The attached error files are related to XGBRegressor, but I encountered similar errors with GridSearchCV.
The root cause appears to be that the UDFs are missing the pandas package. Looking in the generated code for the Snowpark ML objects, the constructors list numpy, xgboost and cloudpickle as dependencies, but not pandas. For example, from xgb_regressor.py:
Patching pandas dependencies into the regressor and grid search after they are created seems to fix the problem:
Looking at the Snowpark ML repo, version 1.0.7 appears to address this issue:
Model Development & Model Registry: Fix an error related to pandas.io.json.json_normalize.
The most recent version of Snowpark ML (snowflake-ml-python) supported by the Anaconda Snowflake Channel is 1.0.5, so I used pip to upgrade to the current version, 1.0.8:
Now, I am able to complete the notebook successfully.
I'm guessing that the Snowflake environment has changed since this quickstart was written, causing the error. Despite extensive investigation, I wasn't able to complete the third notebook without either patching the Snowpark ML objects' dependencies or upgrading Snowpark ML.
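For anyone hitting the same error, the two workarounds described above can be sketched roughly as follows. This is an illustration only and not the code from the quickstart or from Snowpark ML itself. FakeEstimator is a hypothetical stand-in for a Snowpark ML object, the attribute name _deps is an assumption about where the generated constructors keep their dependency list, and the pinned versions are placeholders. Check the generated source (for example xgb_regressor.py) for the real attribute before relying on this.

```python
from itertools import takewhile

# Release whose notes mention the json_normalize fix (from the report above).
FIXED_IN = (1, 0, 7)


def parse_version(text):
    """Turn a version string such as '1.0.8' into a tuple like (1, 0, 8)."""
    numbers = []
    for piece in text.split("."):
        # Keep only the leading digits, so '7rc1' is read as 7.
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers)


def needs_pandas_workaround(installed_version):
    """Return True when the installed Snowpark ML predates the 1.0.7 fix."""
    return parse_version(installed_version) < FIXED_IN


def add_pandas_dependency(model, pandas_version, attr="_deps"):
    """Append a pinned pandas requirement to the model's dependency list.

    `attr` is an ASSUMED attribute name; adjust it to match the generated code.
    Nothing is added if a pandas requirement is already listed.
    """
    deps = getattr(model, attr)
    if not any(dep.split("==")[0] == "pandas" for dep in deps):
        deps.append(f"pandas=={pandas_version}")
    return deps


class FakeEstimator:
    """Hypothetical stand-in for a Snowpark ML estimator (illustration only)."""

    def __init__(self):
        # Placeholder pins mirroring the three packages named in the report.
        self._deps = ["numpy==1.24.3", "xgboost==1.7.3", "cloudpickle==2.0.0"]
```

With a real object, the call would be something like add_pandas_dependency(regressor, pd.__version__) right after the regressor or grid search is constructed, and only when needs_pandas_workaround reports an older release. The installed version can be read with importlib.metadata.version("snowflake-ml-python"). The pinned pandas version also has to be one that the Snowflake Anaconda channel actually provides, which I have not verified here.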