Teradata / jupyter-demos


Product Engineering Review : Green Manufacturing #538

Open shilpa-nalkande opened 6 months ago

shilpa-nalkande commented 5 months ago

Reviewer 1 comments:
• SQL queries are used in almost all places.
• Python xgboost, fit, and transform are used.
• In sections 8 and 9, plotly is used.

Reviewer 1 suggestions:
• teradataml supports all the functions that are used in the SQL queries. --- Removed SQLs
• Some Python functions are used where teradataml also supports XGBoost and the VAL.fit() and VAL.transform() methods. --- Will replace once available
• In sections 8 and 9, we can use the teradataml plot function. --- Used teradataml plots
• Sections 7.2 and 7.3 use teradataml functions throughout, which is good to see.

Reviewer 2 comments:
• A lot of warnings are ignored.
• In section 4, to plot y with respect to the independent variables X0 to X8, data is pulled to the client using to_pandas() and 9 different types of plots are drawn using seaborn.
• In section 5, xgboost DMatrix and plot_importance are used, along with sklearn's LabelEncoder.
  o teradataml's OpensourceML exposes sklearn (upcoming feature).
• OrdinalEncoding of SQLE is used, but through queries.
• teradataml.from_query() is used to create DataFrames in section 6.
• copy_to_sql() is used to persist the "final_data" table.
• The TD_TrainTestSplit SQLE query is used to split train and test data, and teradataml DataFrames are created from the result.
• TD_DecisionForest, TD_DecisionForestPredict, TD_XGBoost, TD_XGBoostPredict and TD_RegressionEvaluator SQLE queries are used.
• For plotting, predicted values are pulled to the client using to_pandas().

Reviewer 2 suggestions:
• Keep warnings visible to see whether we need to update the notebook. --- Standard followed in CSAE
• In section 4, see if we can use teradataml's plot for these 9 plots, without pulling data to the client. --- Cannot replace as VARCHAR columns are not supported in tdml
• In section 5, xgboost is an upcoming feature of OpensourceML (exposing xgboost through teradataml's OpensourceML). teradataml's OpensourceML also exposes sklearn (upcoming feature). --- Will replace once available in CSAE
• For OrdinalEncoding:
  o Use teradataml's SQLE without queries. --- Removed SQL version
  o Another way is to use OpensourceML's sklearn OrdinalEncoding (upcoming feature).
• Persist only if it is absolutely needed, because TrainTestSplit can be done even without persisting if DataFrames are used, unlike SQL.
• Use teradataml's TrainTestSplit. --- Removed SQL version
• Use the corresponding teradataml functions for the SQLE functions. --- Removed SQL version
• Use teradataml's plot functions. --- Used wherever possible
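
For reference, the suggestion to split without persisting could look roughly like the sketch below. It is not taken from the notebook: the helper name `split_without_persisting`, the default `seed`, and the `id_column` argument are illustrative, and it assumes an active Vantage context, an existing teradataml DataFrame, and that the split output marks training rows in a `TD_IsTrainRow` column.

```python
def split_without_persisting(df, id_column, train_size=0.8, seed=42):
    """Split a teradataml DataFrame into train and test DataFrames in-database.

    Hypothetical helper: `df` is assumed to be a teradataml DataFrame built
    from earlier DataFrame operations, so no copy_to_sql() call is needed.
    """
    # Imported lazily so this module can be loaded without teradataml installed.
    from teradataml import TrainTestSplit

    out = TrainTestSplit(
        data=df,
        id_column=id_column,
        train_size=train_size,
        test_size=round(1 - train_size, 2),
        seed=seed,
    )
    split = out.result

    # Assumption: training rows are flagged with TD_IsTrainRow == 1.
    train = split[split.TD_IsTrainRow == 1]
    test = split[split.TD_IsTrainRow == 0]
    return train, test
```

Both returned objects stay as DataFrames on the database side, so they can be passed to later teradataml functions without first writing a table.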