dluts closed 4 months ago
The comment below this part is also confusing to me:

> After dropping these instances with more than 35 missing features, we see that there is data for the following features (other features are NaN):
> - spa_ic is 23x = 1 (2/21: outcome_damage_ic = 1)
> - empl_ic is 52x = 0 (12/40: outcome_damage_ic = 1)
> - married_cd is 53x = false (12/53: outcome_damage_ic = 1)
> - claims_am is 26x = 0 (6/20: outcome_damage_ic = 1)
> - spa_ic & claims_am & empl_ic have data for 6 of these instances

You are right, some explanation is needed to understand this. While I was working on it, it was clear to me what I was trying to do. I will add extra information to explain how I identified these features and what I want to convey with that data. Check the consolidated notebook after the next check-in.
(+ referred to this ticket)
@Marijkevandesteene I see that you have added more information, but unfortunately it did not clarify further how the analysis has been done and how it all fits together. That is in part because the code used to do the analysis also seems to be missing?
As discussed via WhatsApp/Teams: this was investigated during data preparation. It is leftover exploration and will be removed from the final notebook.
Done
It is not clear to me how you came to this list of columns and where the 24 comes from:
instances_missingsData = train_V2[train_V2.loc[:,['company_ic','claims_no','income_am','gold_status','nights_booked','gender','shop_am','retired','fam_adult_size','children_no','divorce','profit_last_am','sport_ic','crd_lim_rec','credit_use_ic','gluten_ic','lactose_ic','insurance_ic','prev_all_in_stay','profit_am','bar_no','age','marketing_permit','urban_ic']].isnull().sum(axis=1) == 24]
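One observation that may explain the 24: the list in the filter contains exactly 24 column names, so `isnull().sum(axis=1) == 24` keeps only the rows in which every one of the listed columns is missing. How the list itself was chosen is not stated in this thread. The sketch below illustrates the mechanics on a small made-up frame; `toy` and its values are hypothetical stand-ins for `train_V2`, and `n_cols` is just a helper name.

```python
import numpy as np
import pandas as pd

# The 24 column names copied from the filter above.
cols = ['company_ic', 'claims_no', 'income_am', 'gold_status', 'nights_booked',
        'gender', 'shop_am', 'retired', 'fam_adult_size', 'children_no',
        'divorce', 'profit_last_am', 'sport_ic', 'crd_lim_rec', 'credit_use_ic',
        'gluten_ic', 'lactose_ic', 'insurance_ic', 'prev_all_in_stay',
        'profit_am', 'bar_no', 'age', 'marketing_permit', 'urban_ic']
n_cols = len(cols)  # 24, the constant used in the original filter

# Hypothetical stand-in for train_V2: three rows, all values NaN at first.
toy = pd.DataFrame(np.nan, index=range(3), columns=cols)
toy.loc[1, 'age'] = 40.0   # row 1: one listed column has a value
toy.loc[2, cols] = 1.0     # row 2: every listed column has a value

# Same logic as the original line, with the count derived from the list.
missing_count = toy[cols].isnull().sum(axis=1)
all_missing_by_count = toy[missing_count == n_cols]

# Equivalent formulation that avoids the hard-coded number.
all_missing_by_all = toy[toy[cols].isnull().all(axis=1)]
```

Only row 0 is selected by both filters. Using `len(cols)` or `.all(axis=1)` instead of a literal 24 would keep the filter correct if the column list changes.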