Marijkevandesteene / MachineLearning

repo to share progress and to manage versions of exam MachineLearning (M14)

Amelioration to Evaluate model on test set ; actual vs predicted #53

Closed Marijkevandesteene closed 4 months ago

Marijkevandesteene commented 4 months ago

Calculate for a fraction of 200/500 (2/5) of the test set in


# Check how much revenue a model-selected list yields on the test set compared with a random sample of the same size.
# Rank the test set by predicted revenue, keep the top 2/5 (the 200/500 ratio), and compare its actual revenue with that of random samples.

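# Add the prediction to the test set; 'revenue_pred' is also dropped so that re-running this cell does not feed it back into the model
test_V2['revenue_pred'] = revenue_best.predict(test_V2.drop(columns=['Id', 'outcome_revenue', 'revenue_pred'], errors='ignore'))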

# Select the top 2/5 of the test set by predicted revenue, then total the predicted and actual revenue for that selection
representative_nbr = int(len(test_V2)*2/5)
selection = test_V2.sort_values('revenue_pred', ascending=False).head(representative_nbr)
total_revenue_pred_selection = selection['revenue_pred'].sum()
total_revenue_selection = selection['outcome_revenue'].sum()

# For random samples of the same size (repeated n_samples times), accumulate the total predicted and actual revenue
total = 0
total_pred = 0
n_samples = 100
for _ in range(n_samples):
    sample = test_V2.sample(representative_nbr)
    total_pred += sample['revenue_pred'].sum()
    total += sample['outcome_revenue'].sum()

# Average the totals over the random samples
total_revenue_pred_sample = total_pred / n_samples
total_revenue_sample = total / n_samples

# Report the actual and predicted revenue
print('The actual revenue of the selection of the test set is %.3f.' % total_revenue_selection)
print('The predicted revenue of the selection of the test set is %.3f.' % total_revenue_pred_selection)
print('\nThe mean actual revenue of %d random samples of the test set is %.3f.' % (n_samples, total_revenue_sample))
print('The mean predicted revenue of %d random samples of the test set is %.3f.' % (n_samples, total_revenue_pred_sample))
print('\nApplying the selection identified by the model yields an actual gain of %.3f over a random sample.' % (total_revenue_selection - total_revenue_sample))
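
A rough, self-contained sketch of the same comparison, in case it helps to try the idea outside the notebook. This is not the notebook's code: the function name `compare_selection_to_random`, its parameters, and the synthetic `demo` frame are hypothetical, and only the column names `revenue_pred` and `outcome_revenue` are taken from the snippet above. Each random draw is seeded so the reported gain is reproducible between runs.

```python
import numpy as np
import pandas as pd


def compare_selection_to_random(df, pred_col="revenue_pred", actual_col="outcome_revenue",
                                fraction=2 / 5, n_samples=100, seed=0):
    """Compare the actual revenue of the top-`fraction` rows (ranked by prediction)
    with the mean actual revenue of random samples of the same size."""
    k = int(len(df) * fraction)

    # Model-based selection: the k rows with the highest predicted revenue
    selection = df.sort_values(pred_col, ascending=False).head(k)
    selection_actual = float(selection[actual_col].sum())
    selection_pred = float(selection[pred_col].sum())

    # Baseline: total actual revenue of n_samples random draws of k rows each,
    # with one integer seed per draw for reproducibility
    random_totals = [
        df.sample(k, random_state=seed + i)[actual_col].sum()
        for i in range(n_samples)
    ]
    random_actual_mean = float(np.mean(random_totals))

    return {
        "k": k,
        "selection_actual": selection_actual,
        "selection_pred": selection_pred,
        "random_actual_mean": random_actual_mean,
        "gain": selection_actual - random_actual_mean,
    }


# Synthetic stand-in for a test set with predictions already attached (assumption)
rng = np.random.default_rng(42)
demo = pd.DataFrame({"outcome_revenue": rng.gamma(2.0, 50.0, size=500)})
demo["revenue_pred"] = demo["outcome_revenue"] + rng.normal(0.0, 20.0, size=500)

result = compare_selection_to_random(demo)
print("Selected %d rows; gain over a random sample: %.3f" % (result["k"], result["gain"]))
```

With a real test set, the frame passed in would need the prediction column added first, as in the snippet above.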

dluts commented 4 months ago

It's not clear to me where this should fit in the notebook as 'test_V2' does not exist as a variable in the notebook.

dluts commented 4 months ago

After reading the code, I am going to leave it out because of the limited added value compared to the similar calculation already done for the score dataset.