Closed Marijkevandesteene closed 5 months ago
That's a very good point, actually. Using an empirical cumulative distribution function (ECDF) for rescaling would probably be better: we determine the ECDF on the input data and then apply it to the score data as well. I originally had this in, but then changed it to this simpler approach. You are right, though: I think the simpler approach can't then be applied to the score data. I'll adjust the Integrated notebook accordingly.
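A rough sketch of what that could look like, not the final notebook code: the ECDF is fitted on the training column only and then reused for the score data. The helper name `fit_ecdf`, the dataframe name `score`, and the toy values are assumptions made for illustration. On the training data itself this should give the same result as `rank(method='max', pct=True)`.

```python
import numpy as np
import pandas as pd


def fit_ecdf(train_values):
    """Hypothetical helper: fit an ECDF on the training values.

    Returns a function mapping x to the fraction of training values <= x.
    """
    train = np.asarray(train_values, dtype=float)
    train = np.sort(train[~np.isnan(train)])  # ignore missing values when fitting
    n = train.size

    def ecdf(values):
        values = np.asarray(values, dtype=float)
        # side="right" counts the training values <= x, which corresponds to
        # rank(method="max", pct=True) when applied to the training data itself.
        counts = np.searchsorted(train, values, side="right")
        # Keep missing inputs missing instead of mapping them to 1.0.
        return np.where(np.isnan(values), np.nan, counts / n)

    return ecdf


# Toy data standing in for the real dataframes (values are made up).
train_V2 = pd.DataFrame({"score5_neg": [3.0, 1.0, 2.0, 2.0, 5.0]})
score = pd.DataFrame({"score5_neg": [0.0, 2.0, 4.0, 10.0]})

ecdf_score5_neg = fit_ecdf(train_V2["score5_neg"])
train_V2["score5_neg_uniform"] = ecdf_score5_neg(train_V2["score5_neg"])
score["score5_neg_uniform"] = ecdf_score5_neg(score["score5_neg"])
```

This way the score data is rescaled relative to the training distribution rather than ranked against itself, so a given raw value maps to the same rescaled value in both dataframes. Values below the training minimum map to 0 and values above the training maximum map to 1.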
In the notebook, a method is applied to score5_neg: `train_V2["score5_neg_uniform"] = train_V2["score5_neg"].rank(method='max', pct=True)`. How can we apply that to the score dataframe? Is it correct to rank it using only the scores in the score dataframe?