afsara-ben closed this 6 months ago
We use a different audio encoder (HTS-AT) and a different text encoder (RoBERTa), and we also support audio inputs of varying length. Microsoft CLAP and LAION CLAP are concurrent works, both from 2022-2023. Both our CLAP and Microsoft CLAP were accepted at ICASSP 2023.
I am failing to understand the difference between LAION-AI/CLAP and the work in "CLAP: Learning Audio Concepts From Natural Language Supervision" (https://arxiv.org/abs/2206.04769). They seem to be the same. Can the original authors clarify?