Marijkevandesteene / MachineLearning

repo to share progress and to manage versions of exam MachineLearning (M14)

Dealing with the missing data, dropping instances with more than 35 missing features. #39

Closed dluts closed 4 months ago

dluts commented 4 months ago

It is not clear to me how you arrived at this list of columns, or where the 24 comes from: `instances_missingsData = train_V2[train_V2.loc[:,['company_ic','claims_no','income_am','gold_status','nights_booked','gender','shop_am','retired','fam_adult_size','children_no','divorce','profit_last_am','sport_ic','crd_lim_rec','credit_use_ic','gluten_ic','lactose_ic','insurance_ic','prev_all_in_stay','profit_am','bar_no','age','marketing_permit','urban_ic']].isnull().sum(axis=1) == 24]`
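For reference, the quoted expression counts, per row, how many of the listed columns are null and keeps the rows where that count equals 24. The list holds exactly 24 column names, so `== 24` selects rows in which every listed column is missing. Whether that was the intent is not confirmed in this thread. Below is a minimal sketch of the same pattern on a toy frame; the `toy` data and the three-column list are invented for illustration and are not taken from `train_V2`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train_V2. A few column names are borrowed from the
# snippet in this issue; the values (and the "score" column) are invented.
toy = pd.DataFrame({
    "age": [34, np.nan, 51, np.nan],
    "gender": ["F", np.nan, "M", np.nan],
    "income_am": [2100.0, np.nan, np.nan, np.nan],
    "score": [0.2, 0.9, 0.4, 0.7],
})

cols = ["age", "gender", "income_am"]

# Number of missing values per row, counted only over the listed columns.
missing_per_row = toy[cols].isnull().sum(axis=1)

# Rows where every listed column is missing: the count equals len(cols).
# Using len(cols) instead of a literal avoids an unexplained magic number.
all_missing = toy[missing_per_row == len(cols)]

print(missing_per_row.tolist())
print(all_missing.index.tolist())
```

Comparing against `len(cols)` rather than a hard-coded `24` keeps the filter correct if the column list changes.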

dluts commented 4 months ago

The comment below this part is also confusing to me: `After dropping these instances with more than 35 missing feature, we see that there is data for the following features (other features are NaN):`
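One possible reading of that notebook comment, sketched under assumptions: rows with more than a threshold number of missing features are dropped, and the dropped rows are then inspected to see which columns still hold data. The toy frame, the column names `a` to `d`, and the threshold of 2 are hypothetical; the notebook uses 35 on `train_V2`, and its actual code is not shown in this thread.

```python
import numpy as np
import pandas as pd

# Invented toy data; the real notebook works on train_V2.
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7.0, np.nan, 9.0],
    "d": [1.0, 2.0, np.nan],
})

max_missing = 2  # hypothetical threshold; the issue title mentions 35

# Count missing values per row across all columns.
missing_per_row = toy.isnull().sum(axis=1)

# Keep rows at or below the threshold; set aside the rest for inspection.
kept = toy[missing_per_row <= max_missing]
dropped = toy[missing_per_row > max_missing]

# Columns that still contain at least one non-null value in the dropped rows.
cols_with_data = dropped.columns[dropped.notnull().any(axis=0)].tolist()

# Equivalent drop using dropna: keep rows with enough non-null values.
kept_alt = toy.dropna(thresh=toy.shape[1] - max_missing)

print(kept.index.tolist())
print(cols_with_data)
```

`dropna(thresh=...)` expresses the same rule in terms of the minimum number of non-null values a row must have.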

Marijkevandesteene commented 4 months ago

You are right, some explanation is needed to understand this. While I was working on it, it was clear to me what I was trying to do. I will add extra information to explain how I identified these features and what I want to show with that data. Check the consolidated notebook after the next check-in.

(+ referred to this ticket)

dluts commented 4 months ago

@Marijkevandesteene I see that you have added more information, but unfortunately it did not clarify further how the analysis was done and how it all fits together. That is in part because the code used to do the analysis also seems to be missing.

Marijkevandesteene commented 4 months ago

As discussed via WhatsApp / Teams: this was investigated during data preparation and is leftover exploration, to be removed from the final notebook.

dluts commented 4 months ago

Done