Marijkevandesteene / MachineLearning

repo to share progress and to manage versions of exam MachineLearning (M14)

determine score5_neg_uniform for score data #22

Closed Marijkevandesteene closed 5 months ago

Marijkevandesteene commented 5 months ago

In the notebook, the following method is applied to score5_neg: train_V2["score5_neg_uniform"] = train_V2["score5_neg"].rank(method='max', pct=True). How can we apply that to the score dataframe? Is it correct to rank it using only the scores in the score dataframe?
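A minimal sketch of the concern raised in the question, using made-up toy values. The dataframe `score_df` and all values are hypothetical; only the `rank(method='max', pct=True)` call mirrors the notebook. Because `rank(pct=True)` is relative to whichever dataframe it is computed on, the same raw `score5_neg` value can map to different "uniform" values in the training and score data.

```python
import pandas as pd

# Hypothetical toy data: the column name mirrors the notebook, the values are made up
train_V2 = pd.DataFrame({"score5_neg": [0.1, 0.4, 0.4, 0.7, 0.9]})
score_df = pd.DataFrame({"score5_neg": [0.4, 0.95]})

# Percentile rank computed separately within each dataframe
train_V2["score5_neg_uniform"] = train_V2["score5_neg"].rank(method="max", pct=True)
score_df["score5_neg_uniform"] = score_df["score5_neg"].rank(method="max", pct=True)

# The raw value 0.4 becomes 0.6 in the training data (3 of 5 values <= 0.4)
# but 0.5 in the score data (1 of 2 values <= 0.4)
print(train_V2)
print(score_df)
```

The transformed feature therefore depends on the composition of the score set, which a model trained on the training-set ranks would not expect.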

binomaiheu commented 5 months ago

That's a very good point actually. Using an empirical cumulative distribution function (ECDF) for rescaling would probably be better: we determine the ECDF on the input data and also apply it to the score data. I originally had this in, but then changed it to this simpler approach; you are right, I think this approach can't be applied to the score data. I'll adjust the Integrated notebook accordingly.
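A minimal sketch of the ECDF approach described above, assuming the ECDF is fitted on the training column only and then reused unchanged for the score data. `fit_ecdf`, `score_df` and the toy values are hypothetical illustrations, not code from the Integrated notebook. With `side="right"`, the fitted ECDF counts training values less than or equal to x, so on the training data itself it reproduces `rank(method='max', pct=True)` (missing values excluded).

```python
import numpy as np
import pandas as pd


def fit_ecdf(reference):
    """Fit an empirical CDF on `reference` (hypothetical helper).

    Returns a function x -> fraction of non-missing reference values <= x.
    """
    ref = np.sort(pd.Series(reference).dropna().to_numpy(dtype=float))
    n = ref.size

    def ecdf(x):
        x = np.asarray(x, dtype=float)
        # side="right" counts reference values <= x, like rank(method="max")
        out = np.searchsorted(ref, x, side="right") / n
        # Keep missing inputs missing, as Series.rank does
        return np.where(np.isnan(x), np.nan, out)

    return ecdf


# Hypothetical toy data: the ECDF is fitted on the training data only
train_V2 = pd.DataFrame({"score5_neg": [0.1, 0.4, 0.4, 0.7, 0.9]})
score_df = pd.DataFrame({"score5_neg": [0.4, 0.95, 0.05]})

ecdf = fit_ecdf(train_V2["score5_neg"])
train_V2["score5_neg_uniform"] = ecdf(train_V2["score5_neg"])
# The same fitted function is applied to the score data: 0.4 -> 0.6 here too
score_df["score5_neg_uniform"] = ecdf(score_df["score5_neg"])
print(score_df)
```

Score values above the training maximum map to 1.0 and values below the training minimum map to 0.0. If statsmodels is already a dependency, `statsmodels.distributions.empirical_distribution.ECDF` provides a comparable ready-made fitted ECDF.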