Marijkevandesteene / MachineLearning

Repository to share progress and manage versions of the MachineLearning exam (M14)

Export the predictions for all instances in score to csv #51

Closed: Marijkevandesteene closed this issue 2 days ago

Marijkevandesteene commented 3 days ago

To export all instances, add the following under '# Selection of final list of guests':


# delete existing prediction columns if they exist (needed when the cell is rerun)
pred_columns = ['revenue_pred', 'profit_pred', 'damage_pred', 'damage_inc_pred']
score = score.drop(columns=pred_columns, errors='ignore')

# drop 'Id': it is an identifier, not a model feature, and may cause problems at prediction time;
# errors='ignore' keeps this line working if 'Id' was already removed earlier
score_without_targetfeatures = score.drop(columns=['Id'], errors='ignore')

# -- predict revenue, profit, damage and damage_inc with the selected (best) models
score['revenue_pred'] = model_revenue.predict(score_without_targetfeatures)
score['profit_pred'] = model_profit.predict(score_without_targetfeatures)
score['damage_pred'] = model_damage.predict(score_without_targetfeatures)
score['damage_inc_pred'] = model_damage_inc.predict(score_without_targetfeatures)

# -- select the 200 guests with the highest predicted revenue
selection = score.sort_values('revenue_pred', ascending=False).head(200)

# -- compare this selection with the average of n_samples random selections of 200 guests
total = 0
n_samples = 100
for i in range(n_samples):
    sample = score.sample(200, random_state=i)  # seeded so reruns give the same baseline
    total += sample['revenue_pred'].sum()

total_revenue_pred_selection = selection['revenue_pred'].sum()
total_revenue_pred_sample = total / n_samples

print('The estimated revenue of the selection is %.3f' % total_revenue_pred_selection)
print('The estimated revenue of an average random sample is %.3f' % total_revenue_pred_sample)
print('By applying the selection, the model predicts a gain of %.3f' % (total_revenue_pred_selection - total_revenue_pred_sample))

# -- save all predictions of the selected guests to the output folder
import os
os.makedirs('output', exist_ok=True)  # create the folder if it does not exist yet
selection.to_csv(os.path.join('output', 'selected_guests.csv'))
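The snippet above writes only the 200 selected guests, while the issue title asks for the predictions of all instances in `score`. A minimal sketch of that export follows, assuming `score` still holds the `Id` column and the four `*_pred` columns created above. The helper name `export_predictions` and the file name `all_predictions.csv` are hypothetical, and keeping only `Id` plus the prediction columns is a design choice (calling `score.to_csv(...)` directly would write every feature as well).

```python
import os
import tempfile

import pandas as pd

# prediction columns as created in the snippet above
PRED_COLUMNS = ['revenue_pred', 'profit_pred', 'damage_pred', 'damage_inc_pred']


def export_predictions(score: pd.DataFrame, path: str, id_column: str = 'Id') -> pd.DataFrame:
    """Write the id column plus every available prediction column for all rows of `score`."""
    # keep only the columns that actually exist, so a partial rerun does not fail
    columns = [c for c in [id_column, *PRED_COLUMNS] if c in score.columns]
    out = score[columns]
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)  # create the output folder if needed
    out.to_csv(path, index=False)  # index=False: 'Id' already identifies each row
    return out


# usage on a toy frame, written to a temporary directory
toy = pd.DataFrame({'Id': [1, 2], 'feature': [0.5, 0.7],
                    'revenue_pred': [10.0, 20.0], 'profit_pred': [1.0, 2.0]})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'output', 'all_predictions.csv')
    export_predictions(toy, path)
    reloaded = pd.read_csv(path)  # read back before the directory is deleted
```

In the notebook itself, the call would be along the lines of `export_predictions(score, os.path.join('output', 'all_predictions.csv'))`.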
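The selection-versus-random comparison in the snippet can also be wrapped in a reusable function. What follows is a sketch, not the code that was integrated. The name `compare_to_random_baseline` and its parameters are hypothetical, and it uses `DataFrame.nlargest` in place of `sort_values(...).head(k)`, which selects rows with the same total. Each random sample gets its own seed, so the baseline is reproducible across reruns.

```python
import pandas as pd


def compare_to_random_baseline(df: pd.DataFrame, column: str = 'revenue_pred',
                               k: int = 200, n_samples: int = 100, seed: int = 0):
    """Return (top-k total, mean total of n_samples random k-row samples, gain)."""
    # total predicted value of the k rows with the highest prediction
    selection_total = df.nlargest(k, column)[column].sum()
    # each random sample draws k distinct rows; seed + i makes every draw reproducible
    sample_totals = [
        df.sample(k, random_state=seed + i)[column].sum()
        for i in range(n_samples)
    ]
    baseline_mean = sum(sample_totals) / n_samples
    return selection_total, baseline_mean, selection_total - baseline_mean


# usage on a toy frame with predictions 0..9
toy = pd.DataFrame({'revenue_pred': range(10)})
sel, base, gain = compare_to_random_baseline(toy, k=3, n_samples=50)
# sel is 24 (9 + 8 + 7); base varies with the seed, and gain is never negative
print(f'selection={sel:.3f} baseline={base:.3f} gain={gain:.3f}')
```

Because no random sample of k rows can exceed the top-k total, the gain is always zero or positive. It shrinks to zero when k equals the number of rows.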
Marijkevandesteene commented 3 days ago

Everything is in the attached txt file, because the formatting was lost: SaveSelectionList.txt

dluts commented 3 days ago

Integrated in final

dluts commented 3 days ago

Apparently I cannot close the issue myself.