Feedback from classmates

alyciakb commented 5 years ago

Use a proportional feature graph to show the differences in the feature comparison scores/rankings; it would tell a clearer story about the importance and predictive quality of each feature.
Breaking up the dplyr code in the data cleaning with comments: does it make the code harder to read?
Use case_when instead of nested if_else statements.
Add a prediction approach/methods section to the final report that lays out the whole analysis plan in a few quick bullet points.
Cross-validation: we shouldn't use the test data in cross-validation, only the training data, because CV is only used to pick a max depth. As it stands, our test data is influencing our fit because we used the depth score from the test data, not the training data. The cross-validation function splits the training data into train and validation groups internally, so we don't need to pass any test data into it. We need to remove the line marked REMOVE, check whether that changes max depth, and pick max depth based on the training-data CV score. The code is in script file 3:

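    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    depth_range = range(1, 10)
    train_cv = []
    test_cv = []
    for d in depth_range:
        model = DecisionTreeClassifier(max_depth=d)
        # 10-fold CV on the training data only
        train_cv.append(np.mean(cross_val_score(model, X_train, y_train, cv=10)))
        # REMOVE: test_cv.append(np.mean(cross_val_score(model, X_test, y_test, cv=10)))
    max_cv = max(train_cv)
    # list index is 0-based while depths start at 1, so map the index back to a depth
    opt_d = depth_range[train_cv.index(max_cv)]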
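
One way the suggested fix could look end to end is sketched below. It is not the code from script 3: the data is synthetic (`make_classification`), and the helper name `pick_max_depth` and the fixed `random_state` values are assumptions made so the example runs on its own. The point is that the depth is chosen from the training-data CV scores alone, and the test set is touched once at the end.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; the project has its own X and y).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)


def pick_max_depth(X_train, y_train, depths=range(1, 10), cv=10):
    """Hypothetical helper: return (best_depth, mean CV scores), training data only."""
    depths = list(depths)
    scores = []
    for d in depths:
        model = DecisionTreeClassifier(max_depth=d, random_state=0)
        # cross_val_score splits the training data into folds internally
        scores.append(np.mean(cross_val_score(model, X_train, y_train, cv=cv)))
    # Map the position of the best score back to the depth it belongs to
    best_depth = depths[int(np.argmax(scores))]
    return best_depth, scores


best_depth, cv_scores = pick_max_depth(X_train, y_train)

# Refit on all the training data, then score once on the held-out test set
final_model = DecisionTreeClassifier(max_depth=best_depth, random_state=0)
final_model.fit(X_train, y_train)
test_accuracy = final_model.score(X_test, y_test)
print(best_depth, test_accuracy)
```

Keeping the test set out of the loop means `test_accuracy` stays an honest estimate of performance on unseen data.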

Feedback from Tony

alyciakb commented 5 years ago

✅ Cross Validation code updated in script 3

alyciakb commented 5 years ago

We did not add a feature graph (point 1); instead, we added Gini scores to our table, along with an interpretation explaining their meaning and importance.
Fixed some readability issues.
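
For readers unfamiliar with the Gini scores mentioned above, here is a minimal sketch of reading them from a fitted scikit-learn decision tree. It is not the project's code: the data is synthetic, and the feature names and `max_depth=4` are placeholders. scikit-learn exposes the normalized Gini importances as `feature_importances_`.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with hypothetical feature names (assumptions).
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

model = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# Pair each feature with its Gini importance and rank from most to least important
importance_table = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in importance_table:
    print(f"{name}: {score:.3f}")
```

Because the importances of a fitted tree with at least one split sum to 1, the same values could also feed a proportional bar chart if a graph is added later.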

UBC-MDS / DSCI_522_Alberta-Oil-Spills

Feedback from classmates #27