Lines 172-200 of the file examples/align_files.py are shown below. `embed_loader.get_embed_list(...)` returns tensors, whereas `SentenceAligner.get_similarity` requires NumPy arrays.
```python
vectors = embed_loader.get_embed_list(list(sent_pair))
if convert_to_words:
    w2b_map = []
    cnt = 0
    w2b_map.append([])
    for wlist in l1_tokens:
        w2b_map[0].append([])
        for x in wlist:
            w2b_map[0][-1].append(cnt)
            cnt += 1
    cnt = 0
    w2b_map.append([])
    for wlist in l2_tokens:
        w2b_map[1].append([])
        for x in wlist:
            w2b_map[1][-1].append(cnt)
            cnt += 1
    new_vectors = []
    for l_id in range(2):
        w_vector = []
        for word_set in w2b_map[l_id]:
            w_vector.append(vectors[l_id][word_set].mean(0))
        new_vectors.append(np.array(w_vector))
    vectors = np.array(new_vectors)
all_mats = {}
sim = SentenceAligner.get_similarity(vectors[0], vectors[1])
sim = SentenceAligner.apply_distortion(sim, args.distortion)
```
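For context, the `convert_to_words` branch builds `w2b_map`, which maps each word to the indices of its subword tokens, and then averages those rows with `.mean(0)`. The sketch below reproduces that averaging with NumPy only. The function name `average_subwords_to_words` and the toy tokens are hypothetical, and it assumes the subword vectors are already a NumPy array (in align_files.py they are tensors, which is what triggers the error).

```python
import numpy as np

def average_subwords_to_words(subword_vectors, words_as_subwords):
    """Average subword embeddings into one vector per word (illustrative helper).

    subword_vectors: array of shape (num_subwords, dim), one row per subword,
        in the same order as the flattened tokenized words.
    words_as_subwords: list of lists; each inner list holds the subword
        tokens of one word (only its length is used).
    """
    # Build the word -> subword-row map, mirroring w2b_map in align_files.py.
    word_to_rows = []
    cnt = 0
    for subwords in words_as_subwords:
        word_to_rows.append(list(range(cnt, cnt + len(subwords))))
        cnt += len(subwords)
    # Mean over each word's subword rows; the result stays a NumPy array.
    return np.stack([subword_vectors[rows].mean(axis=0) for rows in word_to_rows])

# Usage: two words tokenized into 3 + 1 subwords with 2-dim toy vectors.
tokens = [["un", "##believ", "##able"], ["cats"]]
vecs = np.array([[1.0, 0.0], [3.0, 0.0], [5.0, 3.0], [2.0, 2.0]])
word_vecs = average_subwords_to_words(vecs, tokens)
print(word_vecs)  # rows [3., 1.] and [2., 2.]
```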
This is a problem when `--token-type` is `word`, because `sklearn.metrics.pairwise.cosine_similarity` cannot convert the tensors to NumPy arrays directly (they require gradients).
This is the exact error:
```
  File "/home/ishan/simalign/simalign/simalign.py", line 110, in get_similarity
    return (cosine_similarity(X, Y) + 1.0) / 2.0
  File "/home/ishan/.local/lib/python3.6/site-packages/sklearn/metrics/pairwise.py", line 1179, in cosine_similarity
    X, Y = check_pairwise_arrays(X, Y)
  File "/home/ishan/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File "/home/ishan/.local/lib/python3.6/site-packages/sklearn/metrics/pairwise.py", line 134, in check_pairwise_arrays
    X, Y, dtype_float = _return_float_dtype(X, Y)
  File "/home/ishan/.local/lib/python3.6/site-packages/sklearn/metrics/pairwise.py", line 45, in _return_float_dtype
    X = np.asarray(X)
  File "/home/ishan/.local/lib/python3.6/site-packages/numpy/core/_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/home/ishan/.local/lib/python3.6/site-packages/torch/tensor.py", line 492, in __array__
    return self.numpy()
RuntimeError: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead.
```
A quick workaround is to add an `else` clause to `if convert_to_words` containing the line `vectors = np.array(vectors.detach())`.
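A minimal sketch of the same idea, assuming PyTorch and NumPy are installed: convert the embeddings once, before they reach the similarity computation. `to_numpy` and `similarity` are hypothetical helpers, not part of simalign. `similarity` re-creates the `(cosine_similarity(X, Y) + 1.0) / 2.0` formula from the traceback with plain NumPy so the sketch does not need scikit-learn. Unlike the workaround above, `to_numpy` also calls `.cpu()`, an added assumption so that tensors living on a GPU would convert as well.

```python
import numpy as np
import torch

def to_numpy(vectors):
    """Convert a tensor, or a list of tensors, to NumPy arrays (hypothetical helper).

    detach() drops the autograd graph (required when requires_grad=True)
    and cpu() moves GPU tensors to host memory before numpy().
    """
    if isinstance(vectors, torch.Tensor):
        return vectors.detach().cpu().numpy()
    return [to_numpy(v) for v in vectors]

def similarity(X, Y):
    # Same rescaling as get_similarity in the traceback,
    # (cosine_similarity(X, Y) + 1.0) / 2.0, written with NumPy only.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return (Xn @ Yn.T + 1.0) / 2.0

# Stand-in for embed_loader.get_embed_list(...): tensors that require grad.
vectors = [torch.randn(3, 4, requires_grad=True), torch.randn(5, 4, requires_grad=True)]

try:
    np.asarray(vectors[0])  # the call that fails inside sklearn in the traceback
except RuntimeError as err:
    print("fails:", err)

vectors = to_numpy(vectors)
sim = similarity(vectors[0], vectors[1])
print(sim.shape)  # (3, 5)
```

Converting right after `get_embed_list` would let both branches of `if convert_to_words` operate on NumPy arrays, since indexing with a list and `.mean(0)` work the same way on an `ndarray`.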