YuanGongND / ssast

Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License

SSAST for embedding model #34

Open Ironmomo opened 5 months ago

Ironmomo commented 5 months ago

Hi Yuan, we're reaching out again regarding our Bachelor thesis on speaker recognition. We're having trouble implementing SSAST as an embedding model trained with a contrastive loss, such as triplet loss. Since speaker recognition is an open-set problem, where the number of speaker classes isn't known in advance, we need to choose a suitable dimension for the embedding. We are also considering adjusting the multilayer perceptron (MLP) head to accommodate this. During your work on SSAST, did you come across any insights that could lead to recommendations for us? Thanks, Andrin
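To make the question concrete, the triplet objective we have in mind can be sketched as below. This is a minimal pure-Python illustration, not code from the SSAST repository: the function names, the margin of 0.2, and the three-dimensional toy vectors are all assumptions chosen for illustration. In practice the vectors would be the outputs of the modified MLP head, and the embedding dimension is exactly the open parameter we are asking about.

```python
import math

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit length so distances are comparable across samples."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on L2-normalized embeddings.

    The loss is zero once the anchor is closer to the positive (same speaker)
    than to the negative (different speaker) by at least `margin`.
    """
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(0.0, euclidean(a, p) - euclidean(a, n) + margin)

# Toy embeddings (hypothetical values standing in for model outputs).
anchor = [1.0, 0.0, 0.0]
positive = [0.9, 0.1, 0.0]   # same speaker, close to the anchor
negative = [0.0, 1.0, 0.0]   # different speaker, far from the anchor

easy = triplet_loss(anchor, positive, negative)  # well-separated triplet
hard = triplet_loss(anchor, negative, positive)  # roles swapped: violates the margin
```

Because the loss only depends on distances between embeddings, it places no constraint on the number of speakers, which is why it suits the open-set setting described above.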