ankitapasad / layerwise-analysis

Layer-wise analysis of self-supervised pre-trained speech representations

About the selection of GloVe #7

Open futian00 opened 1 month ago

futian00 commented 1 month ago

Hello, I would like to ask why, in the code, the GloVe embeddings are trained on Common Crawl while the AGWE embeddings are trained on LibriSpeech. Shouldn't the GloVe embeddings also be trained on LibriSpeech?

ankitapasad commented 4 weeks ago

Hi @futian00

When we compare representations from self-supervised speech models (S3Ms) to an external embedding space (AGWE or GloVe), we are mainly checking for similarity between the S3M's word embedding space and the specific property captured by the external embedding space (pronunciation for AGWEs, semantics for GloVe). So we don't necessarily have to match the pre-training data.
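For intuition, here is a minimal sketch of that kind of comparison, not the repo's exact pipeline: row-align the S3M word-level embeddings with the external embedding matrix (one row per word type) and score their similarity with CCA. The function name, the dimensions, and the use of scikit-learn's CCA here are illustrative assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def cca_similarity(s3m_embs: np.ndarray, ext_embs: np.ndarray, n_components: int = 32) -> float:
    """Mean canonical correlation between two word embedding matrices.

    Both matrices must be row-aligned: row i of each corresponds to the
    same word type.
    """
    cca = CCA(n_components=n_components, max_iter=1000)
    x_c, y_c = cca.fit_transform(s3m_embs, ext_embs)  # project both into a shared space
    # Average the correlation of each canonical-component pair.
    corrs = [np.corrcoef(x_c[:, i], y_c[:, i])[0, 1] for i in range(n_components)]
    return float(np.mean(corrs))

# Toy usage: 500 word types, a 768-dim S3M layer vs. 300-dim GloVe vectors.
rng = np.random.default_rng(0)
s3m_layer_embs = rng.standard_normal((500, 768))  # e.g., mean-pooled per-word S3M features
glove_embs = rng.standard_normal((500, 300))      # rows aligned to the same word types
print(f"similarity: {cca_similarity(s3m_layer_embs, glove_embs):.3f}")
```

Note that the similarity computation itself only sees two row-aligned matrices; the corpus the external embeddings were trained on (Common Crawl for GloVe, LibriSpeech for AGWEs) only matters for what property those embeddings encode.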

It is nice to have a matched domain for AGWEs, as these are jointly trained with acoustic word embeddings on speech data, and speech data has much more variability than text. So the domain of the pre-training data could affect the learned embeddings. That said, it remains to be seen whether AGWEs trained on a different dataset would offer insights similar to those trained on LibriSpeech.

Related to your question: in a different experiment, we compare S3M representations with syntactic properties extracted from a text corpus. Our findings remain consistent regardless of which corpus the syntactic properties are extracted from.

I hope that offers some clarity!