Teradata / jupyter-demos

12 stars 13 forks source link

Product Engineering Review: Energy Consumption Forecasting Dataiku #530

Open ForbiddenDevil opened 4 months ago

ForbiddenDevil commented 4 months ago

Reviewer 1 comments:

Reviewer 1 suggestions:

Reviewer 2 comments:

Reviewer 1 suggestions:

ForbiddenDevil commented 1 month ago

Incorporated the suggested changes. PR: https://github.com/Teradata/jupyter-demos/pull/630

Reviewer 1 comments:

  • In section 1.3

    • Warnings should not be ignored. -- This is a standard used across CSAE usecases
  • In section 3

    • After every PMMLPredict results result_lr, resultrf dataframes are converted to pandas df for plotting can be avoided. -- Used tdplot for the same_
  • In section 4

    • Conversion of dataframe for normalizing and plotting can be avoided. -- Used tdplot for the same
    • Sklearn package mean_squarederror can be avoided. -- Removed the use of sklean package_
    • matplotlib.pyplot can be avoided for plotting.-- Used tdplot for the same

Reviewer 1 suggestions:

  • In section 3

    • We can avoid conversion of data and use tdml dataframe for further processing. -- Used tdml wherever possible
  • In section 4

    • TD_RegressionEvaluator method is alternative tdml function to get evaluation score. -- Used this to get the RMSE values
    • Other normalization can be performed using tdml dataframe. -- Used tdml functions
    • teradataml dataframe supports plotting functionality.-- Used tdplot for the same

Reviewer 2 comments:

  • In section 2.4, train and test data from csv is loaded into vantage using 2 step process of loading csv data into pandas df and then pandas df into vantage table. _-- Python doesn't allow variable names starting with numbers. Hence OrderedDict cannot be created for this dataframe to be passed as types in read_csv()_
  • In section 3, just to print head of result dataframes, teradataml df are converted to pandas df. -- Removed the use of this
  • In section 4, matplotlib is used for simple plotting. -- Used tdplot for the same
  • Results cells are missing. Plots are not visible. -- Plots are only visible when we run the cell. ReadOnly versions of the notebooks are on a separate branch

Reviewer 1 suggestions:

  • In section 2.4, train and test data from csv can be loaded into vantage directly using read_csv() of teradataml.-- Python doesn't allow variable names starting with numbers. Hence OrderedDict cannot be created for this dataframe to be passed as types in read_csv()_
  • In section3, to print head of result dataframes(result_lr, result_rf), directly use head() of teradataml. No need to convert it to pandas df. -- Used tdml here
  • In section 4, teradataml plots can be used instead of matplotlib. -- Used tdplot for the same
  • Recompile notebook to generate results and plots. -- Results are not supposed to be saved