Open futian00 opened 1 month ago
Hello, I would like to ask why, in the code, the GloVe embeddings are trained on Common Crawl while the AGWE embeddings are trained on LibriSpeech. Shouldn't the GloVe embeddings also be trained on LibriSpeech?

Hi @futian00,

When we compare representations from self-supervised speech models (S3Ms) to an external embedding space (AGWE or GloVe), we are mainly checking for similarity between the S3M's word embedding space and the specific property captured by the external space (pronunciation for AGWEs, semantics for GloVe). So we don't necessarily have to match the pre-training data.

It is nice to have a matched domain for AGWEs, since these are trained jointly with acoustic word embeddings on speech data, and speech has much more variability than text, so the domain of the pre-training data could affect the learned embeddings. However, it remains to be seen whether AGWEs trained on a different dataset still offer insights similar to those trained on LibriSpeech.

Related to your question: in a separate experiment, we compare S3M representations with syntactic properties extracted from a text corpus, and our findings remain consistent regardless of which corpus the syntactic properties come from.

I hope that offers some clarity!
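To make the comparison concrete, here is a minimal sketch of one common representational-similarity measure, linear CKA, applied to an S3M word embedding matrix and a GloVe matrix over the same word list. This is an illustration only: the matrices below are random stand-ins (not real embeddings), the shapes are assumptions, and the actual code in the repository may use a different measure (e.g., a CCA variant).

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices (n_words x dim).

    Rows must be aligned: row i of X and row i of Y correspond to the
    same word. The two spaces may have different dimensionalities.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # CKA = ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F), in [0, 1]
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

# Random stand-ins: 500 shared words, hypothetical 768-d S3M word
# embeddings vs. 300-d GloVe vectors.
rng = np.random.default_rng(0)
s3m = rng.standard_normal((500, 768))
glove = rng.standard_normal((500, 300))

print(linear_cka(s3m, s3m))    # identical spaces score exactly 1.0
# Cross-space scores should be compared against controls (e.g., other
# external spaces), not read as absolute numbers.
print(linear_cka(s3m, glove))
```

Because the score only depends on the aligned word list, the external space (GloVe or AGWE) can come from any corpus; matching the S3M's pre-training data is not required, which is the point made in the reply above.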