Maybe leakage in your method of splitting dataset

NazarPonochevnyi / Trained-CNN-for-Genre-classification

🎵 Trained CNN model for genre classification on the GTZAN dataset [CNN model: https://github.com/Hguimaraes/gtzan.keras]

MIT License


Maybe leakage in your method of splitting dataset #5

Closed: woshikafei closed this issue 4 years ago

woshikafei commented 4 years ago

You use 50% overlap in your data augmentation, and the fragments cut from a single song were split into both the training and validation datasets. Besides, VGG16 has a large receptive field. So your model may already have seen (parts of) the validation data during training.
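A minimal Python sketch of the leak described above. The clip format (30 s at 22,050 Hz), the 3 s window length, and the helper names `window_starts` and `fragment_level_split` are illustrative assumptions, not the repository's actual preprocessing. It cuts each song into 50%-overlapping windows, then shuffles all fragments together before splitting, which is the kind of fragment-level split that lets one song land on both sides.

```python
import random

def window_starts(n_samples, window, overlap=0.5):
    """Start offsets of fixed-length windows; overlap=0.5 means a hop of half a window."""
    hop = int(window * (1 - overlap))
    return list(range(0, n_samples - window + 1, hop))

def fragment_level_split(song_ids, n_windows, val_fraction=0.2, seed=0):
    """Hypothetical naive split: shuffle every (song, window) fragment together."""
    fragments = [(s, w) for s in song_ids for w in range(n_windows)]
    rng = random.Random(seed)
    rng.shuffle(fragments)
    n_val = int(len(fragments) * val_fraction)
    return fragments[n_val:], fragments[:n_val]

# Assumed clip format: 30 s at 22,050 Hz, cut into 3 s windows with 50% overlap.
n_samples, window = 30 * 22050, 3 * 22050
starts = window_starts(n_samples, window, overlap=0.5)
shared = window - (starts[1] - starts[0])  # samples shared by neighboring windows

songs = [f"song{i:03d}" for i in range(100)]  # hypothetical song IDs
train, val = fragment_level_split(songs, len(starts))
leaked = {s for s, _ in train} & {s for s, _ in val}

print(len(starts), "windows per song;", shared, "samples shared by neighbors")
print(len(leaked), "of", len(songs), "songs appear in both train and validation")
```

Under these assumptions each song yields 19 windows and neighboring windows share 33075 samples (1.5 s). With a fragment-level shuffle, typically almost every song contributes windows to both sets, so a validation window often has a half-overlapping twin in the training data.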

NazarPonochevnyi commented 4 years ago

Yeah, two different images in the train and test datasets can share 50% of the same view. I agree that this is not the correct way to calculate accuracy on the test dataset. Unfortunately, we were forced to resort to this method due to the small size of the GTZAN dataset.
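Even with a small dataset, one way to avoid the leak is to split at the song level first and cut fragments afterwards, so every window from a given track lands on the same side. The sketch below is a hedged illustration, not the repository's code: it assumes hypothetical GTZAN-style IDs of the form `<genre>.<index>`, stratifies by genre to keep the classes balanced, and uses made-up helper names (`song_level_split`, `cut_fragments`) and an assumed 19 windows per song.

```python
import random
from collections import defaultdict

def song_level_split(song_ids, val_fraction=0.2, seed=0):
    """Assign whole songs to train or validation, stratified by genre.

    Assumes hypothetical GTZAN-style IDs "<genre>.<index>", e.g. "blues.00042".
    """
    by_genre = defaultdict(list)
    for sid in song_ids:
        by_genre[sid.split(".")[0]].append(sid)
    rng = random.Random(seed)
    train, val = set(), set()
    for genre in sorted(by_genre):  # sorted for a reproducible order
        ids = sorted(by_genre[genre])
        rng.shuffle(ids)
        n_val = max(1, round(len(ids) * val_fraction))
        val.update(ids[:n_val])
        train.update(ids[n_val:])
    return train, val

def cut_fragments(songs, n_windows):
    """Cut windows only after the split, so fragments inherit their song's side."""
    return [(s, w) for s in sorted(songs) for w in range(n_windows)]

songs = [f"{g}.{i:05d}" for g in ("blues", "rock") for i in range(100)]
train_songs, val_songs = song_level_split(songs)
train_frags = cut_fragments(train_songs, n_windows=19)  # 19 windows: an assumption
val_frags = cut_fragments(val_songs, n_windows=19)

shared = {s for s, _ in train_frags} & {s for s, _ in val_frags}
print(len(train_songs), "train songs,", len(val_songs), "validation songs,", len(shared), "shared")
# prints: 160 train songs, 40 validation songs, 0 shared
```

When data is scarce, a song-level k-fold (for example scikit-learn's `GroupKFold` with song IDs passed as `groups`) lets every song be validated once while still keeping all fragments of a song in the same fold.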

woshikafei commented 4 years ago


Got it! Not a precise score, but a good method. Thanks!

NazarPonochevnyi commented 4 years ago

You're welcome!