XinhaoMei / audio-text_retrieval

Implementation of our paper 'On Metric Learning For Audio-Text Cross-Modal Retrieval'

On Metric Learning for Audio-Text Cross-Modal Retrieval

Set up environment

Set up dataset

Pre-trained encoders

Run experiments

Cite

If you use our code, please cite the following:

@article{Mei2022metric,
  title = {On Metric Learning for Audio-Text Cross-Modal Retrieval},
  author = {Mei, Xinhao and Liu, Xubo and Sun, Jianyuan and Plumbley, Mark D. and Wang, Wenwu},
  journal = {arXiv preprint arXiv:2203.15537},
  year = {2022}
}

and

@inproceedings{Mei2021ACT,
    author = "Mei, Xinhao and Liu, Xubo and Huang, Qiushi and Plumbley, Mark D. and Wang, Wenwu",
    title = "Audio Captioning Transformer",
    booktitle = "Proceedings of the 6th Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021)",
    address = "Barcelona, Spain",
    month = "November",
    year = "2021",
    pages = "211--215",
    isbn = "978-84-09-36072-7",
    doi = "10.5281/zenodo.5770113"
}