faroit / reproducible-audio-research

List of Reproducible Audio Research Papers

Suggestion on papers about DL & MIR #7

ybayle opened this issue 6 years ago

ybayle commented 6 years ago

Hi, I have some articles that might be worth adding: a list of 34 articles applying deep learning methods to MIR. The list is in the form of a bib file in which the `code` field points to the website hosting the source code (mostly in Python) for the algorithm described in each paper; a small sketch for mining the list programmatically follows the entries. It comes from here.

@unpublished{Choi2017a, author = {Choi, Keunwoo and Fazekas, György and Cho, Kyunghyun and Sandler, Mark Brian}, code = {https://github.com/keunwoochoi/dl4mir}, journal = {arXiv preprint arXiv:1709.04396}, link = {https://arxiv.org/pdf/1709.04396.pdf}, title = {A tutorial on deep learning for music information retrieval}, year = {2017} }

@unpublished{Choi2017b, author = {Choi, Keunwoo and Fazekas, György and Cho, Kyunghyun and Sandler, Mark Brian}, code = {https://github.com/keunwoochoi/transfer_learning_music}, dataset = {MSD}, journal = {arXiv preprint arXiv:1709.01922}, link = {https://arxiv.org/pdf/1709.01922.pdf}, task = {MGR}, title = {A comparison on audio signal preprocessing methods for deep neural networks on music tagging}, year = {2017} }

@inproceedings{Choi2017c, author = {Choi, Keunwoo and Fazekas, György and Sandler, Mark Brian and Cho, Kyunghyun}, booktitle = {ISMIR}, code = {https://github.com/keunwoochoi/transfer_learning_music}, link = {https://arxiv.org/pdf/1703.09179v3.pdf}, title = {Transfer learning for music classification and regression tasks}, year = {2017} }

@inproceedings{Choi2017d, architecture = {CRNN}, author = {Choi, Keunwoo and Fazekas, György and Sandler, Mark Brian and Cho, Kyunghyun}, booktitle = {ICASSP}, code = {https://github.com/keunwoochoi/icassp_2017}, link = {http://ieeexplore.ieee.org/abstract/document/7952585/}, organization = {IEEE}, pages = {2392--2396}, reproducible = {Models & split sets only}, task = {MGR}, title = {Convolutional recurrent neural networks for music classification}, year = {2017} }

@inproceedings{Gong2017, author = {Gong, Rong and Pons, Jordi and Serra, Xavier}, booktitle = {ISMIR}, code = {https://github.com/ronggong/jingjuSingingPhraseMatching/tree/v0.1.0}, link = {https://arxiv.org/pdf/1707.03547.pdf}, title = {Audio to score matching by combining phonetic and duration information}, year = {2017} }

@inproceedings{Hadjeres2016d, author = {Hadjeres, Gaëtan and Pachet, François}, booktitle = {ICML}, code = {https://github.com/Ghadjeres/DeepBach}, link = {https://arxiv.org/pdf/1612.01010.pdf}, title = {DeepBach: A steerable model for Bach chorales generation}, year = {2016} }

@article{Kereliuk2015b, architecture = {CNN}, author = {Kereliuk, Corey and Sturm, Bob L. and Larsen, Jan}, code = {https://github.com/coreyker/dnn-mgr}, dataset = {GTZAN & LMD}, input = {Magnitude spectral frames}, journal = {IEEE_TMM}, link = {https://arxiv.org/pdf/1507.04761.pdf}, number = {11}, pages = {2059--2071}, publisher = {IEEE}, task = {MGR}, title = {Deep learning and music adversaries}, volume = {17}, year = {2015} }

@unpublished{Lee2017c, author = {Lee, Jongpil and Nam, Juhan}, code = {https://github.com/jongpillee/musicTagging_MSD}, dataset = {MSD}, journal = {arXiv preprint arXiv:1706.06810}, link = {https://arxiv.org/pdf/1706.06810.pdf}, title = {Multi-level and multi-scale feature aggregation using sample-level deep convolutional neural networks for music classification}, year = {2017} }

@inproceedings{Liu2016, architecture = {CNN}, author = {Liu, Jen-Yu and Yang, Yi-Hsuan}, booktitle = {Proceedings of the 2016 ACM on Multimedia Conference}, code = {https://github.com/ciaua/clip2frame}, link = {http://mac.citi.sinica.edu.tw/~yang/pub/liu16mm.pdf}, organization = {ACM}, pages = {1048--1057}, title = {Event localization in music auto-tagging}, year = {2016} }

@inproceedings{Lostanlen2016, author = {Lostanlen, Vincent and Cella, Carmine-Emanuele}, booktitle = {ISMIR}, code = {https://github.com/lostanlen/ismir2016}, link = {https://github.com/lostanlen/ismir2016/blob/master/paper/lostanlen_ismir2016.pdf}, task = {Instrument recognition}, title = {Deep convolutional networks on the pitch spiral for musical instrument recognition}, year = {2016} }

@inproceedings{Mehri2017, architecture = {RNN}, author = {Mehri, Soroush and Kumar, Kundan and Gulrajani, Ishaan and Kumar, Rithesh and Jain, Shubham and Sotelo, Jose and Courville, Aaron and Bengio, Yoshua}, booktitle = {ICLR}, code = {https://github.com/soroushmehr/sampleRNN_ICLR2017}, dataset = {32 Beethoven’s piano sonatas gathered from https://archive.org}, link = {https://openreview.net/pdf?id=SkxKPDv5xl}, note = {https://arxiv.org/pdf/1612.07837.pdf}, task = {Composition}, title = {SampleRNN: An unconditional end-to-end neural audio generation model}, year = {2017} }

@inproceedings{Miron2017a, address = {Espoo, Finland}, architecture = {CNN}, author = {Miron, Marius and Janer Mestres, Jordi and G{\'o}mez Guti{\'e}rrez, Emilia}, booktitle = {SMC}, code = {https://github.com/MTG/DeepConvSep}, dataset = {RWC & Bach10}, link = {https://www.researchgate.net/profile/Marius_Miron/publication/318322107_Generating_data_to_train_convolutional_neural_networks_for_classical_music_source_separation/links/59637cc3458515a3575b93c6/Generating-data-to-train-convolutional-neural-networks-for-classical-music-source-separation.pdf}, month = {Jul.}, organization = {Aalto University}, pages = {227}, task = {Source separation}, title = {Generating data to train convolutional neural networks for classical music source separation}, year = {2017} }

@inproceedings{Miron2017b, address = {Suzhou, China}, architecture = {CNN}, author = {Miron, Marius and Janer, Jordi and G{\'o}mez, Emilia}, booktitle = {ISMIR}, code = {https://github.com/MTG/DeepConvSep}, dataset = {Bach10}, link = {https://www.researchgate.net/profile/Marius_Miron/publication/318637038_Monaural_score-informed_source_separation_for_classical_music_using_convolutional_neural_networks/links/597327c6458515e26dfdb007/Monaural-score-informed-source-separation-for-classical-music-using-convolutional-neural-networks.pdf}, task = {Source separation}, title = {Monaural score-informed source separation for classical music using convolutional neural networks}, year = {2017} }

@inproceedings{Oramas2017b, architecture = {CNN}, author = {Oramas, Sergio and Nieto, Oriol and Barbieri, Francesco and Serra, Xavier}, booktitle = {ISMIR}, code = {https://github.com/sergiooramas/tartarus}, dataset = {MSD}, link = {https://arxiv.org/abs/1707.04916}, task = {MGR}, title = {Multi-label music genre classification from audio, text, and images using deep features}, year = {2017} }

@inproceedings{Oramas2017a, architecture = {CNN}, author = {Oramas, Sergio and Nieto, Oriol and Sordo, Mohamed and Serra, Xavier}, booktitle = {DLRS}, code = {https://github.com/sergiooramas/tartarus}, dataset = {MSD}, link = {https://arxiv.org/pdf/1706.09739.pdf}, note = {http://dlrs-workshop.org/wp-content/uploads/2017/09/oramas_music_cold_start_dlrs2017.pdf}, task = {Recommendation}, title = {A deep multimodal approach for cold-start music recommendation}, year = {2017} }

@inproceedings{Pfalz2017, author = {Pfalz, A. and Berdahl, E.}, booktitle = {IWDLM}, code = {https://www.cct.lsu.edu/~apfalz/inverse_control.html}, link = {http://dorienherremans.com/dlm2017/papers/pfalz2017synthesis.pdf}, title = {Toward inverse control of physics-based sound synthesis}, year = {2017} }

@inproceedings{Pons2017a, author = {Pons, Jordi and Gong, Rong and Serra, Xavier}, booktitle = {ISMIR}, code = {https://github.com/ronggong/jingjuSyllabicSegmentaion/tree/v0.1.0}, link = {https://arxiv.org/pdf/1707.03544.pdf}, task = {Syllable segmentation}, title = {Score-informed syllable segmentation for a cappella singing voice with convolutional neural networks}, year = {2017} }

@inproceedings{Pons2016, address = {Bucharest, Romania}, author = {Pons, Jordi and Lidy, Thomas and Serra, Xavier}, booktitle = {CBMI}, code = {https://github.com/jordipons/}, dataset = {Ballroom}, doi = {10.1109/CBMI.2016.7500246}, isbn = {978-1-4673-8695-1}, link = {http://jordipons.me/media/CBMI16.pdf}, month = {Jun.}, pdf = {http://publik.tuwien.ac.at/files/publik_255991.pdf}, researchfields = {MIR}, title = {Experimenting with musically motivated convolutional neural networks}, year = {2016} }

@inproceedings{Pons2017c, address = {New Orleans, USA}, architecture = {CNN}, author = {Pons, Jordi and Serra, Xavier}, booktitle = {ICASSP}, code = {https://github.com/jordipons/ICASSP2017}, dataset = {Ballroom}, link = {http://ieeexplore.ieee.org/document/7952601/}, task = {MGR}, title = {Designing efficient architectures for modeling temporal features with convolutional neural networks}, year = {2017} }

@inproceedings{Pons2017b, architecture = {CNN}, author = {Pons, Jordi and Slizovskaia, Olga and Gong, Rong and G{\'o}mez, Emilia and Serra, Xavier}, booktitle = {EUSIPCO}, code = {https://github.com/jordipons/EUSIPCO2017}, link = {https://github.com/ronggong/EUSIPCO2017}, title = {Timbre analysis of music audio signals with convolutional neural networks}, year = {2017} }

@inproceedings{Chandna2017, architecture = {CNN}, author = {Chandna, Pritish and Miron, Marius and Janer, Jordi and G{\'o}mez, Emilia}, booktitle = {International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA)}, code = {https://github.com/MTG/DeepConvSep}, dataset = {DSD100}, keyword = {deep learning, neural networks, source separation}, link = {http://mtg.upf.edu/system/files/publications/monoaural-audio-source_0.pdf}, month = {Feb.}, task = {Source separation}, title = {Monoaural audio source separation using deep convolutional neural networks}, year = {2017} }

@inproceedings{Roma2016, author = {Roma, Gerard and Grais, Emad M and Simpson, Andrew JR and Plumbley, Mark D}, booktitle = {MIREX}, code = {http://cvssp.org/projects/maruss/mirex2016/}, dataset = {iKala}, link = {http://www.music-ir.org/mirex/abstracts/2016/RSGP1.pdf}, task = {SVS}, title = {Singing voice separation using deep neural networks and F0 estimation}, year = {2016} }

@inproceedings{Schluter2015, architecture = {CNN}, author = {Schl{\"u}ter, Jan and Grill, Thomas}, booktitle = {ISMIR}, code = {https://github.com/f0k/ismir2015}, dataset = {Inhouse & Jamendo & RWC}, input = {Spectrogram}, link = {https://grrrr.org/pub/schlueter-2015-ismir.pdf}, pages = {121--126}, task = {SVD}, title = {Exploring data augmentation for improved singing voice detection with neural networks}, year = {2015} }

@inproceedings{Southall2017, architecture = {CNN & BRNN}, author = {Southall, Carl and Stables, Ryan and Hockman, Jason}, booktitle = {ISMIR}, code = {https://github.com/CarlSouthall/ADTLib}, dataset = {IDMT-SMT-Drums}, link = {http://www.ryanstables.co.uk/docs/ISMIR2017CamReady.pdf}, task = {Transcription}, title = {Automatic drum transcription for polyphonic recordings using soft attention mechanisms and convolutional neural networks}, year = {2017} }

@article{Sturm2017, author = {Sturm, Bob L. and Ben-Tal, Oded}, code = {https://github.com/IraKorshunova/folk-rnn}, journal = {JCMS}, link = {http://jcms.org.uk/issues/Vol2Issue1/taking-models-back-to-music-practice/Taking%20the%20Models%20back%20to%20Music%20Practice:%20Evaluating%20Generative%20Transcription%20Models%20built%20using%20Deep%20Learning.pdf}, number = {1}, publisher = {University of Huddersfield Press}, title = {Taking the models back to music practice: Evaluating generative transcription models built using deep learning}, volume = {2}, year = {2017} }

@inproceedings{Sturm2016, author = {Sturm, Bob L. and Santos, Joao Felipe and Ben-Tal, Oded and Korshunova, Iryna}, booktitle = {CSMC}, code = {https://github.com/IraKorshunova/folk-rnn}, link = {https://drive.google.com/file/d/0B1OooSxEtl0FcTBiOGdvSTBmWnc/view}, title = {Music transcription modelling and composition using deep learning}, year = {2016} }

@inproceedings{Sturm2015, author = {Sturm, Bob L. and Santos, Joao Felipe and Korshunova, Iryna}, booktitle = {ISMIR}, code = {https://github.com/IraKorshunova/folk-rnn}, link = {http://ismir2015.uma.es/LBD/LBD13.pdf}, task = {Composition}, title = {Folk music style modelling by recurrent neural networks with long short term memory units}, year = {2015} }

@unpublished{Takahashi2016, architecture = {CNN}, author = {Takahashi, Naoya and Gygli, Michael and Pfister, Beat and Van Gool, Luc}, code = {https://bitbucket.org/naoya1/aenet_release}, dataaugmentation = {Mixing}, dataset = {Acoustic Event}, journal = {arXiv preprint arXiv:1604.07160}, link = {https://arxiv.org/pdf/1604.07160.pdf}, task = {Event recognition}, title = {Deep convolutional neural networks and data augmentation for acoustic event detection}, year = {2016} }

@inproceedings{Tsaptsinos2017, address = {Suzhou, China}, architecture = {HAN}, author = {Tsaptsinos, Alexandros}, backend = {TensorFlow}, booktitle = {ISMIR}, code = {https://github.com/alexTsaptsinos/lyricsHAN}, dataset = {LyricFind}, link = {https://ismir2017.smcnus.org/wp-content/uploads/2017/10/43_Paper.pdf}, loss = {cross-entropy}, pages = {694--701}, task = {MGR}, title = {Lyrics-based music genre classification using a hierarchical attention network}, year = {2017} }

@unpublished{Valin2017, architecture = {RNN}, author = {Valin, Jean-Marc}, code = {https://github.com/xiph/rnnoise/}, dataaugmentation = {No}, dataset = {TSP & NTT MLS}, input = {BFCC (22), 1st and 2nd derivatives of first 6 BFCCs, 6 coefficients of DCT of pitch correlation, pitch period, spectral non-stationary metric}, journal = {arXiv preprint arXiv:1709.08243}, layers = {4}, link = {https://arxiv.org/pdf/1709.08243.pdf}, loss = {Custom}, task = {Noise suppression}, title = {A hybrid DSP/deep learning approach to real-time full-band speech enhancement}, year = {2017} }

@inproceedings{Wyse2017, architecture = {CNN}, author = {Wyse, Lonce}, booktitle = {IWDLM}, code = {http://lonce.org/research/audioST/}, link = {http://dorienherremans.com/dlm2017/papers/wyse2017spect.pdf}, title = {Audio spectrogram representations for processing with convolutional neural networks}, year = {2017} }

@techreport{Xu2017b, author = {Xu, Yong and Kong, Qiuqiang and Wang, Wenwu and Plumbley, Mark D}, code = {https://github.com/yongxuUSTC/dcase2017_task4_cvssp}, link = {https://www.cs.tut.fi/sgn/arg/dcase2017/documents/challenge_technical_reports/DCASE2017_Xu_146.pdf}, task = {Event recognition}, title = {Surrey-CVSSP system for DCASE2017 challenge task4}, year = {2017} }

@inproceedings{Ycart2017, architecture = {LSTM}, author = {Ycart, Adrien and Benetos, Emmanouil}, code = {http://www.eecs.qmul.ac.uk/~ay304/code/ismir17}, dataaugmentation = {Pitch shift}, dataset = {Inhouse & Piano-midi.de}, link = {https://qmro.qmul.ac.uk/xmlui/handle/123456789/24946}, organization = {ISMIR}, task = {Polyphonic music sequence modelling}, title = {A study on LSTM networks for polyphonic music sequence modelling}, year = {2017} }

@inproceedings{Xu2017a, architecture = {CRNN}, author = {Xu, Yong and Kong, Qiuqiang and Huang, Qiang and Wang, Wenwu and Plumbley, Mark D}, booktitle = {INTERSPEECH}, code = {https://github.com/yongxuUSTC/att_loc_cgrnn}, doi = {10.21437/Interspeech.2017-486}, link = {https://arxiv.org/pdf/1703.06052.pdf}, note = {https://sites.google.com/view/xuyong/demos/attention_model}, pages = {3083--3087}, task = {DCASE 2016 Task 4 Domestic audio tagging}, title = {Attention and localization based on a deep convolutional recurrent model for weakly supervised audio tagging}, year = {2017} }
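If it helps with triage, here is the promised sketch for mining the list (Python standard library only; it assumes the 34 entries above are saved verbatim, one entry per line, in a hypothetical `suggestions.bib`):

```python
import re
from pathlib import Path

# Hypothetical file holding the 34 one-line entries above, verbatim.
bib = Path("suggestions.bib").read_text(encoding="utf-8")

# Split at each "@type{" start; good enough for these flat entries,
# a real BibTeX parser would be needed for arbitrary nested braces.
entries = [e for e in re.split(r"(?=@\w+\{)", bib) if e.startswith("@")]

def field(entry, name):
    # \b keeps "title" from matching inside "booktitle".
    m = re.search(r"\b" + name + r"\s*=\s*\{([^}]*)\}", entry)
    return m.group(1) if m else None

for entry in entries:
    code = field(entry, "code")
    if code:
        print(f"{field(entry, 'title')}\n  -> {code}")
```

Every entry here has a `code` field, so this prints 34 title/URL pairs; the same `field()` helper works for `dataset`, `task`, and the other custom fields.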

faroit commented 6 years ago

thanks for the input. I've followed your repo, so I know what's going on ;-) Before I add these papers, I think I'd have to set some basic requirements for this list. Basically they would be:

- freely available implementation (ideally including paper quality plots)
- freely available dataset
- runs on anything less than a GPU cluster ;-)

what do you think?

ybayle commented 6 years ago

The requirements are a good idea!

> freely available implementation (ideally including paper quality plots)

Of course, I think this one is mandatory.
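It could even be spot-checked automatically from time to time. A rough sketch, standard library only (`CODE_URLS` is a hypothetical list holding the `code` links from the bib entries above):

```python
import urllib.request

# Hypothetical list of code URLs pulled from the bib entries above.
CODE_URLS = [
    "https://github.com/keunwoochoi/dl4mir",
    "https://github.com/MTG/DeepConvSep",
]

def is_reachable(url, timeout=10):
    # A HEAD request is enough to see whether the page still exists.
    # Some hosts reject HEAD or block non-browser agents, so a failure
    # is a hint to check manually, not proof that the code is gone.
    req = urllib.request.Request(url, method="HEAD",
                                 headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except Exception:
        return False

for url in CODE_URLS:
    print("ok  " if is_reachable(url) else "dead", url)
```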

> freely available dataset

I agree with that. Two articles use the RWC dataset (cf. the dataset field) and some articles use an in-house dataset for tuning their algorithms. These articles are worth mentioning, but they are indeed harder to reproduce than the ones that use a freely available dataset. At this point it depends on the purpose you want to give your repo. If you want to list all reproducible audio research, maybe it would be great to have another section entitled "Intricate reproducible experiments"? Otherwise, if you just want to list easily repeatable MIR experiments for students to reproduce, it is better to drop the experiments that rely on datasets that are not freely available (a rough triage sketch follows).
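To make the split concrete: the `NOT_FREE` set below is my assumption about which dataset names in the entries above are not freely obtainable, and `entry` is a raw entry string as in the parsing sketch earlier in this thread:

```python
import re

# Assumption: RWC is licensed and "Inhouse" data is private; every other
# dataset named in the suggested entries is treated as freely available.
NOT_FREE = {"RWC", "Inhouse"}

def datasets_of(entry):
    # dataset fields in the entries above look like {GTZAN & LMD}.
    m = re.search(r"\bdataset\s*=\s*\{([^}]*)\}", entry)
    return {d.strip() for d in m.group(1).split("&")} if m else set()

def bucket(entry):
    ds = datasets_of(entry)
    if not ds:
        # No dataset field at all: can't tell, flag for manual review.
        return "check manually"
    return "intricate" if ds & NOT_FREE else "easy"
```

Everything landing in the "intricate" bucket would then go under the proposed "Intricate reproducible experiments" heading.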

> runs on anything less than a GPU cluster ;-)

My comment on this point is the same as above: depending on your main idea for this repo, you can either add a specific section for these articles or simply leave them out.