CGCL-codes / VulCNN

66 stars 14 forks source link

Questions about data set partitioning #17

Open sanguiyh opened 1 year ago

sanguiyh commented 1 year ago

image

In the code, Kflod cross-validation divides the data set. Each time the code is divided into test.pkl and train.pkl, will it cause all the data to be used as the training set? Won't the result be too high? Kfold cross-validation should divide the data set into a training set and a test set. The training set is divided into k parts. During each cross, one part will be used as the validation set, and the rest will be used as the training set. The test set will never be used. That's right for training. The code randomly divides the complete data set into k parts, causing each test data to be trained.

jinmingzhong commented 1 year ago

image In 5-fold cross-validation, classifier is initialized for each validation.

sanguiyh commented 1 year ago

image In 5-fold cross-validation, classifier is initialized for each validation.

Got it, thank you very much