Closed cfkstat closed 3 months ago
Thanks for this comment, but I do not fully understand it. Do you mean splitting samples across time, like sklearn.model_selection.TimeSeriesSplit?
It's similar, but not exactly the same. For example, to develop a loan application score, I use loans from 202204 to 202208 as the training set and loans from 202209 to 202210 as the validation set. The goal is to optimize the AUC on both the training set and the validation set without overfitting: the gap between the training and validation AUC must be less than or equal to 2%, and the gap between the training and validation KS must be less than or equal to 3%.
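A minimal sketch of the split and gap check described above. It assumes a pandas DataFrame with a hypothetical integer `loan_month` column (YYYYMM), a binary `label` column, and a model output in `score`. The column names and the helpers `split_by_month`, `auc_and_ks`, and `gaps_ok` are illustrative only and not part of any library; the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score, roc_curve


def split_by_month(df, train_range, valid_range, month_col="loan_month"):
    """Split rows into train/validation sets by inclusive YYYYMM ranges."""
    train = df[df[month_col].between(*train_range)]
    valid = df[df[month_col].between(*valid_range)]
    return train, valid


def auc_and_ks(y_true, y_score):
    """Return (AUC, KS), where KS is the maximum of TPR - FPR over thresholds."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    return roc_auc_score(y_true, y_score), float(np.max(tpr - fpr))


def gaps_ok(train_metrics, valid_metrics, max_auc_gap=0.02, max_ks_gap=0.03):
    """Check that the train/validation AUC and KS gaps are both within limits."""
    auc_gap = abs(train_metrics[0] - valid_metrics[0])
    ks_gap = abs(train_metrics[1] - valid_metrics[1])
    return auc_gap <= max_auc_gap and ks_gap <= max_ks_gap


# Synthetic data: 50 loans per month, with scores weakly related to the label.
rng = np.random.default_rng(0)
months = np.repeat([202204, 202205, 202206, 202207, 202208, 202209, 202210], 50)
label = rng.integers(0, 2, size=months.size)
df = pd.DataFrame({
    "loan_month": months,
    "label": label,
    "score": 0.3 * label + rng.random(months.size),
})

train, valid = split_by_month(df, (202204, 202208), (202209, 202210))
train_metrics = auc_and_ks(train["label"], train["score"])
valid_metrics = auc_and_ks(valid["label"], valid["score"])
print("train (AUC, KS):", train_metrics)
print("valid (AUC, KS):", valid_metrics)
print("gaps within limits:", gaps_ok(train_metrics, valid_metrics))
```

Unlike `TimeSeriesSplit`, which produces several expanding-window folds, this uses a single fixed cut between 202208 and 202209.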
Based on my understanding, the difference from sklearn.model_selection.TimeSeriesSplit is that you also want to keep the gap between the validation-set AUC and the training-set AUC within a certain range (e.g., 2%). Is that correct?
The goal is to maximize the AUC of the model on both the training set and the validation set, while ensuring that the difference between the two AUCs is less than 0.02, or that the difference between the two KS statistics is less than 0.03. Note that the training set and validation set are split across time, for example by loan month.
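One way to read this requirement is as constrained model selection: discard candidate models whose train/validation gaps exceed the limits, then pick the best of the rest. The sketch below is only an illustration of that reading. The `Candidate` class, the metric values, and the choice to rank by validation AUC are assumptions. Because the thread states the constraint with both "and" and "or", the hypothetical `require_both` flag covers either interpretation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Metrics for one trained model (values below are made up)."""
    name: str
    train_auc: float
    valid_auc: float
    train_ks: float
    valid_ks: float


def is_feasible(c, max_auc_gap=0.02, max_ks_gap=0.03, require_both=True):
    """Return True if the candidate satisfies the gap constraints."""
    auc_ok = abs(c.train_auc - c.valid_auc) <= max_auc_gap
    ks_ok = abs(c.train_ks - c.valid_ks) <= max_ks_gap
    return (auc_ok and ks_ok) if require_both else (auc_ok or ks_ok)


def select_best(candidates, **limits):
    """Pick the feasible candidate with the highest validation AUC, or None."""
    feasible = [c for c in candidates if is_feasible(c, **limits)]
    return max(feasible, key=lambda c: c.valid_auc, default=None)


candidates = [
    Candidate("deep", 0.90, 0.80, 0.60, 0.48),      # overfits: gaps too large
    Candidate("medium", 0.82, 0.805, 0.50, 0.48),   # within both limits
    Candidate("shallow", 0.78, 0.775, 0.44, 0.43),  # within limits, lower AUC
]

best = select_best(candidates)
print(best.name if best else "no candidate satisfies the constraints")
```

The same feasibility check could be used inside a hyperparameter search to reject trials, though how to integrate it depends on the tuning tool.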