Questions about data set partitioning

sanguiyh commented 1 year ago

In the code, Kflod cross-validation divides the data set. Each time the code is divided into test.pkl and train.pkl, will it cause all the data to be used as the training set? Won't the result be too high? Kfold cross-validation should divide the data set into a training set and a test set. The training set is divided into k parts. During each cross, one part will be used as the validation set, and the rest will be used as the training set. The test set will never be used. That's right for training. The code randomly divides the complete data set into k parts, causing each test data to be trained.

jinmingzhong commented 1 year ago

In 5-fold cross-validation, classifier is initialized for each validation.

sanguiyh commented 1 year ago

In 5-fold cross-validation, classifier is initialized for each validation.

Got it, thank you very much

CGCL-codes / VulCNN

Questions about data set partitioning #17