Snowflake-Labs / sfguide-intro-to-machine-learning-with-snowflake-ml-for-python

Apache License 2.0

AttributeError: module 'pandas.io.json' has no attribute 'json_normalize' #7

Closed: metadaddy closed this issue 11 months ago

metadaddy commented 1 year ago

I followed the quickstart instructions in 3_snowpark_ml_model_training_deployment.ipynb but encountered errors when running the cells for mape and relplot (see attached output files, xgb_output_1.txt and xgb_output_2.txt).

Both errors seem to be caused by pandas.io.json.json_normalize not being found in the UDF:

AttributeError: module 'pandas.io.json' has no attribute 'json_normalize'
 in function SNOWML_BATCH_INFERENCE_E5AAA8B3_191D_44C4_9823_3E205F9A81CD_PREDICT with handler udf_py_579079450.compute

The attached error files are related to XGBRegressor, but I encountered similar errors with GridSearchCV.
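A quick way to see whether an environment still exposes the import path named in the traceback is a small probe like the one below. This is only a sketch: `has_attr` is a hypothetical helper name, and it inspects the local Python environment, which may differ from the packages the UDF resolves inside Snowflake.

```python
import importlib


def has_attr(module_name, attr_name):
    """Return True or False if the module imports, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr_name)


# Check the path named in the traceback, then the top-level pandas namespace.
for module_name in ("pandas.io.json", "pandas"):
    print(module_name, has_attr(module_name, "json_normalize"))
```

A result of `None` means the module could not be imported at all, which is a different failure from the `AttributeError` reported here.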

The root cause appears to be that the UDFs do not declare pandas as a dependency. In the generated code for the Snowpark ML objects, the constructors list numpy, xgboost and cloudpickle as dependencies, but not pandas. For example, from xgb_regressor.py:

deps: Set[str] = set([f'numpy=={np.__version__}', f'xgboost=={xgboost.__version__}', f'cloudpickle=={cp.__version__}'])
self._deps = list(deps)

Patching pandas dependencies into the regressor and grid search after they are created seems to fix the problem:

# Immediately after regressor is created
regressor._deps.append('pandas==1.5.3')
regressor._deps
# Immediately after grid search is created
grid_search._deps.append('pandas==1.5.3')
grid_search._deps
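Because notebook cells are often re-run, a plain `append` can add the same pin more than once. The sketch below shows one way to make the patch idempotent. `ensure_pinned` is a hypothetical helper, not part of Snowpark ML, and the version numbers in the sample list are illustrative only.

```python
def ensure_pinned(deps, package, version):
    """Append 'package==version' unless deps already names that package."""
    for spec in deps:
        name = spec.split("==")[0].strip().lower()
        if name == package.lower():
            return deps  # already listed, leave the existing entry alone
    deps.append(f"{package}=={version}")
    return deps


# Stand-in for regressor._deps; these versions are made up for the example.
deps = ["numpy==1.24.3", "xgboost==1.7.3", "cloudpickle==2.2.1"]
ensure_pinned(deps, "pandas", "1.5.3")
ensure_pinned(deps, "pandas", "1.5.3")  # second call changes nothing
print(deps)
```

In the notebook this would be called as `ensure_pinned(regressor._deps, "pandas", "1.5.3")`. Note that `_deps` is a private attribute, so this remains a workaround that could break in a later release.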

The Snowpark ML repo indicates that version 1.0.7 addresses this issue:

Bug Fixes

  • Model Development & Model Registry: Fix an error related to pandas.io.json.json_normalize.

The most recent version of Snowpark ML (snowflake-ml-python) supported by the Anaconda Snowflake Channel is 1.0.5, so I used pip to upgrade to the current version, 1.0.8:

pip install snowflake-ml-python==1.0.8

Now, I am able to complete the notebook successfully.
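To confirm that the upgrade took effect before re-running the notebook, a check along these lines may help. It is a sketch under assumptions: `parse_version` and `has_fix` are hypothetical helpers, the comparison only looks at the first three numeric components of the version string, and it reports on the local environment rather than on what Snowflake resolves server-side.

```python
import re
from importlib import metadata

FIXED_IN = (1, 0, 7)  # release whose bug-fix notes mention json_normalize


def parse_version(text):
    """Extract up to three leading numeric components, e.g. '1.0.8' -> (1, 0, 8)."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def has_fix(version_text, minimum=FIXED_IN):
    """Return True if the version is at or above the release containing the fix."""
    return parse_version(version_text) >= minimum


try:
    installed = metadata.version("snowflake-ml-python")
except metadata.PackageNotFoundError:
    print("snowflake-ml-python is not installed in this environment")
else:
    status = "includes" if has_fix(installed) else "predates"
    print(f"snowflake-ml-python {installed} {status} the 1.0.7 fix")
```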

I'm guessing that the Snowflake environment might have changed since this quickstart was written, causing the error. Despite quite extensive investigation, I wasn't able to complete the third notebook without either patching the Snowflake ML objects' dependencies or upgrading Snowpark ML.

sfc-gh-sidas commented 11 months ago

@metadaddy Yes, you're right. This was resolved in version 1.0.7, which is now available in Snowflake.

metadaddy commented 11 months ago

Cool - I see the commit that fixes this: 9ed0260d5e8abeefbda6530f08a2e0b1d920bbc8

Thanks! 👍🏻