caizexin / tf_multispeakerTTS_fc

The TensorFlow version of multi-speaker TTS training with feedback constraint
MIT License

Question about the speaker embedding visualization by t-SNE? #6

Open sanena opened 2 years ago

sanena commented 2 years ago

Hi, could you tell me how to produce the speaker embedding visualization with t-SNE?

[image: t-SNE visualization of the speaker embeddings]

caizexin commented 2 years ago

Sure, here is some sample code:

from sklearn.manifold import TSNE
import numpy as np
import matplotlib.pyplot as plt

# spk is the list of speaker labels corresponding to the embeddings, e.g. ['p363-syn', 'p363-ori', 'p363-syn', ...],
# in the same order as the variable embeds
# build a speaker -> color-index mapping (sorted so the color assignment is deterministic)
mapping = {v: str(i) for i, v in enumerate(sorted(set(spk)))}

# embeds is the list of embeddings from the synthesized utterances, while org_embeds come from genuine utterances
tsne_embed_mix = TSNE(n_components=2).fit_transform(np.concatenate([embeds, org_embeds]))

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(1, 1, 1)

for i in mapping.keys():
    indexes = np.where(np.array(spk) == i)[0]
    # since embeds and org_embeds are simply concatenated and share the same order of spk labels,
    # the indexes of the genuine embeddings for speaker i can be obtained as follows
    org_indexes = indexes + len(spk)
    ax.set_xlim(np.min(tsne_embed_mix[:, 0]) - 25, np.max(tsne_embed_mix[:, 0]))
    # 'x' markers: synthesized utterances, '^' markers: genuine utterances (same color per speaker);
    # only the first scatter gets a label so each speaker appears once in the legend
    ax.scatter(tsne_embed_mix[indexes, 0], tsne_embed_mix[indexes, 1], c='C' + mapping[i], s=15, label=i, marker='x')
    ax.scatter(tsne_embed_mix[org_indexes, 0], tsne_embed_mix[org_indexes, 1], c='C' + mapping[i], s=15, marker='^')
ax.legend()
ax.grid()
plt.show()
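
If it helps, here is a minimal, self-contained sketch for sanity-checking the plotting code above without running the full synthesis pipeline. The speaker IDs, the number of utterances, and the 256-dimensional random embeddings are all made-up stand-ins for real speaker-encoder outputs:

import numpy as np

# --- hypothetical stand-in data, only to exercise the plotting snippet above ---
rng = np.random.default_rng(0)
speakers = ['p225', 'p226', 'p227']   # made-up speaker IDs
utts_per_spk = 20                     # utterances per speaker (arbitrary)
emb_dim = 256                         # assumed embedding dimensionality

spk = [s for s in speakers for _ in range(utts_per_spk)]
# give each speaker its own random cluster center so the t-SNE clusters are visible
centers = {s: rng.normal(scale=5.0, size=emb_dim) for s in speakers}
embeds = np.stack([centers[s] + rng.normal(size=emb_dim) for s in spk])      # stands in for synthesized utterances
org_embeds = np.stack([centers[s] + rng.normal(size=emb_dim) for s in spk])  # stands in for genuine utterances

# with spk, embeds and org_embeds defined, the snippet above can be run unchanged:
# it fits TSNE on np.concatenate([embeds, org_embeds]) and plots 'x' vs '^' markers per speaker.

Once you have real embeddings from the speaker encoder, just replace the random arrays with them, keeping spk, embeds, and org_embeds in the same per-utterance order.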