Living-with-machines / DeezyMatch

A Flexible Deep Learning Approach to Fuzzy String Matching
https://living-with-machines.github.io/DeezyMatch/

Visualization of candidate finder output #13

Status: Open. Opened by kasra-hosseini 4 years ago.

kasra-hosseini commented 4 years ago

Use a dimensionality-reduction method, e.g., t-SNE, to visualize the outputs of the candidate finder.
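Since the issue suggests t-SNE, here is a minimal sketch using scikit-learn's `TSNE`. The choice of scikit-learn is an assumption, since the issue names no library. Random vectors stand in for the concatenated candidate and query vectors, and the sizes (200 candidates, 10 queries, 64 dimensions) are hypothetical.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-ins for the concatenated fwd/bwd vectors (hypothetical
# sizes: 200 candidates and 10 queries, 64 dimensions each).
rng = np.random.default_rng(0)
vecs_cand = rng.normal(size=(200, 64)).astype(np.float32)
vecs_q = rng.normal(size=(10, 64)).astype(np.float32)

# Candidates first, then queries, so colors can be assigned by position.
vecs_all = np.vstack([vecs_cand, vecs_q])

# t-SNE requires perplexity < n_samples; 30 is a common default.
tsne = TSNE(n_components=2, perplexity=30, metric="cosine",
            init="random", random_state=0)
embedding_2d = tsne.fit_transform(vecs_all)

# Candidates in blue, queries in red.
colors = ["blue"] * len(vecs_cand) + ["red"] * len(vecs_q)
print(embedding_2d.shape)  # (210, 2)
```

With real data, a NumPy array built from the concatenated tensors (e.g. via `.numpy()`) would replace `vecs_all`. t-SNE preserves local neighborhoods rather than global distances, so the gap between two clusters in the plot should not be read as a distance between them.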

kasra-hosseini commented 3 years ago

One example, using UMAP for the dimensionality reduction and plotly for an interactive scatter plot:

import numpy as np
import plotly.graph_objects as go
import torch
import umap

# Load the forward and backward candidate vectors and their items;
# map_location="cpu" allows loading tensors that were saved on a GPU.
fwd_cand = torch.load("./combined/candidates_test/fwd.pt", map_location="cpu")
bwd_cand = torch.load("./combined/candidates_test/bwd.pt", map_location="cpu")
items_all_cand = np.load("./combined/candidates_test/fwd_items.npy", allow_pickle=True)

# Same for the query vectors.
fwd_q = torch.load("./combined/queries_test/fwd.pt", map_location="cpu")
bwd_q = torch.load("./combined/queries_test/bwd.pt", map_location="cpu")
items_all_q = np.load("./combined/queries_test/fwd_items.npy", allow_pickle=True)

# Concatenate forward and backward vectors along the feature axis.
vecs_cand = torch.cat([fwd_cand, bwd_cand], dim=1)
vecs_q = torch.cat([fwd_q, bwd_q], dim=1)

# Plot only a subset so the figure stays readable.
num_samples_cand = 200
num_samples_q = 10

# Stack candidates first, then queries; column 1 of each items array is
# used as the hover label (indexing a 1D slice needs no transpose).
vecs_all = torch.cat([vecs_cand[:num_samples_cand], vecs_q[:num_samples_q]])
items_all = np.concatenate([items_all_cand[:num_samples_cand, 1],
                            items_all_q[:num_samples_q, 1]])

# UMAP works on NumPy arrays; detach from any autograd graph first.
embedding = umap.UMAP(n_components=2, n_neighbors=10, metric="cosine").fit(
    vecs_all.detach().numpy()
)

# Candidates in blue, queries in red.
colors = (["blue"] * len(items_all_cand[:num_samples_cand])
          + ["red"] * len(items_all_q[:num_samples_q]))

fig = go.Figure(data=go.Scatter(
    x=embedding.embedding_[:, 0],
    y=embedding.embedding_[:, 1],
    mode="markers",
    text=items_all,  # shown on hover
    marker=dict(color=colors),
))
fig.show()
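A 2D projection distorts distances, so it can help to check the plot against nearest neighbors in the full vector space. The sketch below ranks candidates for each query by cosine similarity, matching the `metric="cosine"` passed to UMAP in the example. The function name `top_k_candidates` is hypothetical, and DeezyMatch's own candidate ranking may use a different distance.

```python
import numpy as np


def top_k_candidates(vecs_q, vecs_cand, k=5):
    """For each query row, return the indices and cosine similarities of the
    k most similar candidate rows (ties keep the lower index first)."""
    # L2-normalize rows so that a dot product equals cosine similarity.
    q = vecs_q / np.linalg.norm(vecs_q, axis=1, keepdims=True)
    c = vecs_cand / np.linalg.norm(vecs_cand, axis=1, keepdims=True)
    sims = q @ c.T  # shape: (n_queries, n_candidates)
    # Stable sort on negated similarities puts the most similar first.
    order = np.argsort(-sims, axis=1, kind="stable")[:, :k]
    return order, np.take_along_axis(sims, order, axis=1)


# Toy example: three 2D candidates and one query close to the first.
vecs_cand = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
vecs_q = np.array([[1.0, 0.1]])
idx, sims = top_k_candidates(vecs_q, vecs_cand, k=2)
print(idx)  # [[0 2]]
```

With the script's variables, something like `top_k_candidates(vecs_q[:num_samples_q].numpy(), vecs_cand[:num_samples_cand].numpy())` followed by `items_all_cand[idx, 1]` would give the labels of the closest candidates, which ideally sit near the corresponding red point in the plot.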