Snowflake-Labs / sfguide-getting-started-with-snowpark-for-machine-learning-on-azureml

Apache License 2.0

Missed Column/Table #7

Open sfc-gh-imehaddi opened 4 months ago

sfc-gh-imehaddi commented 4 months ago

In the notebook 1_mfr_mlflow.ipynb:

  1. There is no table named HUMIDITY_UDI in Snowflake, so I changed it to HUMIDITY.
  2. The HUMIDITY table has no "UDI" ID column.
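A quick way to confirm the second point before joining is to compare the columns a table actually exposes with the ones the notebook expects. The sketch below is plain Python: `missing_columns` is a hypothetical helper, not part of Snowpark, and the column list is an illustrative stand-in for what `session.table('HOL_DB.PUBLIC.HUMIDITY').columns` would return. Matching is case-insensitive, which is a simplification, since quoted Snowflake identifiers are case-sensitive.

```python
def missing_columns(available, required):
    """Return the names in `required` that are absent from `available`.

    Hypothetical helper: compares case-insensitively and ignores
    surrounding double quotes (a simplification of Snowflake's rules).
    """
    have = {name.strip('"').upper() for name in available}
    return [name for name in required if name.strip('"').upper() not in have]


# Illustrative stand-in for session.table('HOL_DB.PUBLIC.HUMIDITY').columns;
# the real column list may differ.
humidity_columns = ["CITY_NAME", "HUMIDITY_RELATIVE_AVG"]

print(missing_columns(humidity_columns, ["UDI", "CITY_NAME"]))  # ['UDI']
```

With a live session, passing the table's `columns` list instead of the stand-in would show whether "UDI" is available as a join key.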

Remark: the notebook makes only limited use of the Snowpark ML API.

sfc-gh-imehaddi commented 4 months ago

Correction:

# access data from snowflake

import pandas as pd
from snowflake.snowpark.session import Session
from snowflake.snowpark.functions import *
from snowflake.snowpark.types import *

connection_parameters = {
    "account": "",
    "user": "",
    "host": "",  # e.g. "sn00111.snowflakecomputing.com"
    "password": "",
    "role": "ACCOUNTADMIN",
    "warehouse": "SMALL_WH",
    "database": "MFR",
    "schema": "PUBLIC",
}
session = Session.builder.configs(connection_parameters).create()

maintenance_df = session.table('HOL_DB.PUBLIC.maintenance')
humidity_df = session.table('HOL_DB.PUBLIC.Humidity')

hum_udi_df = session.table('HOL_DB.PUBLIC.HUMIDITY') # HUMIDITY_UDI

city_df = session.table('HOL_DB.PUBLIC.CITY_UDF') #session.table('HOL_DB.PUBLIC.HUMIDITY')

# join together the dataframes and prepare training dataset

maintenance_city = maintenance_df.join(city_df, ["UDI"])
maintenance_hum = maintenance_city.join(
    humidity_df,
    maintenance_city.col("CITY") == humidity_df.col("CITY_NAME"),
).select(
    col("TYPE"),
    col("AIR_TEMPERATURE_K"),
    col("PROCESS_TEMPERATURE"),
    col("ROTATIONAL_SPEED_RPM"),
    col("TORQUE_NM"),
    col("TOOL_WEAR_MIN"),
    col("HUMIDITY_RELATIVE_AVG"),
    col("MACHINE_FAILURE"),
)