Hguimaraes / gtzan.keras

[REPO] Music Genre classification on GTZAN dataset using CNNs
MIT License
198 stars 57 forks

module version #3

Closed bagustris closed 6 years ago

bagustris commented 6 years ago

Could you mention the versions of the Python modules you used for this repo (scikit-learn, tensorflow, keras, etc.)?

Hguimaraes commented 6 years ago

Hi @bagustris!

Sorry to take so long to answer you. This is a great question and I'm ashamed not to know the answer. I remember using the latest versions at the time of the last commits, but I'm not sure which versions those were. I will dig a little more into this and tell you here.

With the current versions of tensorflow/keras I can't reproduce the results of this repo, so it's also important for me to find that out. I'm returning to this project in July, and I will notify you here of any updates!

Cheers,

TE-BibhudenduMohapatra commented 6 years ago

Hello @Hguimaraes What are the current numbers you are seeing? Mine are as follows:

  1. Validation accuracy - mean: 0.6049333333651224, std: 0.04443942198259471
  2. Test accuracy - mean: 0.5725999999999999, std: 0.032432082880999163
  3. Test accuracy MVS - mean: 0.644, std: 0.02727636339397174

Hguimaraes commented 6 years ago

Hi @TE-BibhudenduMohapatra ,

I was seeing numbers like that, I think. So I created a new branch called "Refactor" where I'm restructuring the project; I'm also using 2D convolutions now. There I'm documenting the versions and everything else. You can take a look here: https://github.com/Hguimaraes/gtzan.keras/blob/refactor/nbs/classification_deeplearning.ipynb

As soon as I get a stable version I will push it to the master branch. (Not this week.)

In this new branch I'm using:

  1. numpy==1.14.3
  2. tensorflow==1.9.0
  3. Keras==2.2.2
  4. librosa==0.6.1

and others. You can check requirements.txt for the rest.

Cheers,

EDIT: This notebook with 2D convs was tested on GCP, using the deep learning image on a K40 GPU instance.

TE-BibhudenduMohapatra commented 6 years ago

Hi @Hguimaraes,

Thanks for your reply. Did you add more files to your dataset? I saw 13,300 samples for training, and I think it was originally 100 files per genre. Your numbers in the refactor branch look good!! I have used the same dataset and 2D convolutions, and I see 75% validation accuracy whereas training accuracy is 93%. If I decrease the number of genres, validation improves to around 85%. I think my model is over-fitting. I look forward to trying out your master branch when it's ready. Thanks!!

Hguimaraes commented 6 years ago

@TE-BibhudenduMohapatra,

I didn't add new files in this branch; I just split the songs into windows of 3 seconds. You can think of this as an augmentation process. The choice of 3s is based on this paper.

For instance, windows with no overlap give us 10x more data. I'm using 50% overlap between windows (the first frame goes from 0s to 3s, the second from 1.5s to 4.5s, and so on), and with this we get 19x the data...
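A quick sketch of that window arithmetic (a minimal example, assuming 30-second GTZAN clips and librosa's default 22050 Hz sample rate; the function name is illustrative, not the repo's):

```python
import numpy as np

def split_windows(signal, sr, win_s=3.0, hop_s=1.5):
    """Split a 1-D audio signal into fixed-length, possibly overlapping windows."""
    win = int(win_s * sr)
    hop = int(hop_s * sr)
    return np.stack([signal[start:start + win]
                     for start in range(0, len(signal) - win + 1, hop)])

sr = 22050                    # librosa's default sample rate
clip = np.zeros(30 * sr)      # stand-in for a 30 s GTZAN excerpt

print(split_windows(clip, sr, hop_s=3.0).shape[0])  # no overlap: 10 windows
print(split_windows(clip, sr, hop_s=1.5).shape[0])  # 50% overlap: 19 windows
```

With a 3 s window, a hop of 3 s yields 10 windows per 30 s clip and a hop of 1.5 s (50% overlap) yields 19, matching the 10x/19x figures above.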

I'm still thinking about this methodology, and I need to investigate more to see if I'm doing something wrong.

EDIT: Take a look at this paper too: https://arxiv.org/abs/1712.08370

TE-BibhudenduMohapatra commented 6 years ago

Thank you for your explanation and for sharing the paper about CNNs and RNNs. I read about another approach which was related to cover art.

TE-BibhudenduMohapatra commented 6 years ago

Hi @Hguimaraes Have you considered using batch normalization instead of dropout at every stage in your CNN? Here's something I came across: https://towardsdatascience.com/dont-use-dropout-in-convolutional-networks-81486c823c16
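The article's suggestion could look something like this in Keras (a sketch only; the input shape, filter counts, and `conv_block` helper are illustrative, not the repo's actual architecture):

```python
from tensorflow.keras import layers, models

def conv_block(filters):
    # Conv -> BatchNorm -> ReLU -> pool; BatchNormalization replaces the
    # per-stage Dropout the article argues against in conv layers.
    return [
        layers.Conv2D(filters, (3, 3), padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.MaxPooling2D((2, 2)),
    ]

model = models.Sequential(
    [layers.Input(shape=(128, 128, 1))]   # e.g. a mel-spectrogram patch
    + conv_block(32)
    + conv_block(64)
    + [layers.Flatten(), layers.Dense(10, activation="softmax")]  # 10 genres
)
```

Setting `use_bias=False` on the conv layers is a common pairing with BN, since BN's learned shift makes the bias redundant.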

Hguimaraes commented 6 years ago

That's a good point @TE-BibhudenduMohapatra ... I had always heard that dropout was not harmful, but BN is better, as you said. I will give BN a try in my next experiment.

I'm working on the dataset split after my "augmentation"; I think it may not be right. I need to shuffle the songs instead of shuffling all the windows, so that windows from the same song don't end up in both train and test...

When I get some results I will tell you here.
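A song-level split like the one described above could be sketched as follows (hypothetical counts: 1000 tracks cut into 19 windows each, 80/20 split; none of these names come from the repo):

```python
import numpy as np

rng = np.random.default_rng(42)

n_tracks, n_windows = 1000, 19
track_ids = np.repeat(np.arange(n_tracks), n_windows)  # one track id per window

# Shuffle at the *track* level, then send every window of a track to the
# same side of the split, so pieces of one song never leak across
# the train/test boundary.
order = rng.permutation(n_tracks)
train_tracks = set(order[:800])

train_mask = np.isin(track_ids, list(train_tracks))
train_idx = np.where(train_mask)[0]
test_idx = np.where(~train_mask)[0]

# No track appears on both sides of the split.
assert set(track_ids[train_idx]).isdisjoint(set(track_ids[test_idx]))
```

Shuffling the windows directly would put near-duplicate 3 s frames (with 50% overlap, adjacent windows share half their samples) in both sets, inflating validation accuracy.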

Hguimaraes commented 6 years ago

@TE-BibhudenduMohapatra I tried BN and it didn't improve things much (actually it was a poor result; I have a theory to explain it, but I'm not quite sure)... Something new I tried was to use a model pre-trained on ImageNet and freeze some of the initial layers. VGG16 gave me a good result.
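A transfer-learning setup along those lines might look like this (a sketch with assumed shapes; the input size, number of frozen layers, and head are illustrative, and `weights=None` stands in for `weights="imagenet"` to avoid a download here):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Spectrogram tiled to 3 channels so it fits VGG16's expected input.
base = VGG16(weights=None, include_top=False, input_shape=(128, 128, 3))

# Freeze the early layers, which learn generic low-level features,
# and fine-tune only the later blocks on the genre task.
for layer in base.layers[:10]:
    layer.trainable = False

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),  # 10 GTZAN genres
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```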

If you can, check the nbs folder. Everything is on the master branch now. I will close this issue, but feel free to talk to me about the project if you want; that would be very good! (-:

Cheers,

TE-BibhudenduMohapatra commented 6 years ago

Hi @Hguimaraes Sharing something I was reading that references VGG16: https://www.kdnuggets.com/2018/09/dropout-convolutional-networks.html I will check the nbs folder. Let's stay in touch about the project!