Error in examples/align_files.py when token-type is word

Lines 172-200 of examples/align_files.py are shown below. embed_loader.get_embed_list(...) returns PyTorch tensors, whereas SentenceAligner.get_similarity expects NumPy arrays.

                vectors = embed_loader.get_embed_list(list(sent_pair))
        if convert_to_words:
            w2b_map = []
            cnt = 0
            w2b_map.append([])
            for wlist in l1_tokens:
                w2b_map[0].append([])
                for x in wlist:
                    w2b_map[0][-1].append(cnt)
                    cnt += 1
            cnt = 0
            w2b_map.append([])
            for wlist in l2_tokens:
                w2b_map[1].append([])
                for x in wlist:
                    w2b_map[1][-1].append(cnt)
                    cnt += 1
            new_vectors = []
            for l_id in range(2):
                w_vector = []
                for word_set in w2b_map[l_id]:
                    w_vector.append(vectors[l_id][word_set].mean(0))
                new_vectors.append(np.array(w_vector))
            vectors = np.array(new_vectors)

        all_mats = {}
        sim = SentenceAligner.get_similarity(vectors[0], vectors[1])
        sim = SentenceAligner.apply_distortion(sim, args.distortion)

This is a problem when --token-type is word: sklearn.metrics.pairwise.cosine_similarity calls np.asarray on its inputs, and that conversion fails for tensors that require grad (they are still attached to the autograd graph).

This is the exact error:

  File "/home/ishan/simalign/simalign/simalign.py", line 110, in get_similarity
    return (cosine_similarity(X, Y) + 1.0) / 2.0
  File "/home/ishan/.local/lib/python3.6/site-packages/sklearn/metrics/pairwise.py", line 1179, in cosine_similarity
    X, Y = check_pairwise_arrays(X, Y)
  File "/home/ishan/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File "/home/ishan/.local/lib/python3.6/site-packages/sklearn/metrics/pairwise.py", line 134, in check_pairwise_arrays
    X, Y, dtype_float = _return_float_dtype(X, Y)
  File "/home/ishan/.local/lib/python3.6/site-packages/sklearn/metrics/pairwise.py", line 45, in _return_float_dtype
    X = np.asarray(X)
  File "/home/ishan/.local/lib/python3.6/site-packages/numpy/core/_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/home/ishan/.local/lib/python3.6/site-packages/torch/tensor.py", line 492, in __array__
    return self.numpy()
RuntimeError: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead.
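
The failure can be reproduced outside simalign. The snippet below is a minimal sketch, assuming only that torch and numpy are installed; the tensor x is a made-up stand-in for one of the embedding tensors and is not taken from align_files.py. The exact wording of the error message may differ between PyTorch versions.

```python
import numpy as np
import torch

# Hypothetical stand-in for an embedding tensor that still tracks gradients.
x = torch.randn(3, 4, requires_grad=True)

# np.asarray goes through Tensor.__array__, which calls Tensor.numpy();
# that call is refused while the tensor requires grad.
try:
    np.asarray(x)
    failed = False
except RuntimeError as err:
    failed = True
    print("conversion failed:", err)

# Detaching first returns a tensor without gradient tracking, which converts fine.
detached = np.asarray(x.detach())
print(type(detached), detached.shape)
```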

A quick workaround is to add an else clause to the if convert_to_words block containing the line vectors = np.array(vectors.detach()).
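
The workaround can be sketched in isolation as follows. This is only an illustration under stated assumptions: to_numpy and scaled_cosine are hypothetical helpers, not part of simalign, and the random tensor stands in for the output of embed_loader.get_embed_list. scaled_cosine mirrors the (cosine_similarity(X, Y) + 1.0) / 2.0 expression from the traceback using plain NumPy, so that the example does not depend on scikit-learn.

```python
import numpy as np
import torch


def to_numpy(vectors):
    """Hypothetical helper: drop gradient tracking, then convert to a NumPy array."""
    if isinstance(vectors, torch.Tensor):
        return vectors.detach().cpu().numpy()
    return np.asarray(vectors)


def scaled_cosine(X, Y):
    """Stand-in for get_similarity: cosine similarity rescaled from [-1, 1] to [0, 1]."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return (Xn @ Yn.T + 1.0) / 2.0


# Assumed shape for illustration: 2 sentences, 5 tokens each, 8-dimensional embeddings.
vectors = torch.randn(2, 5, 8, requires_grad=True)

# Equivalent in spirit to the proposed else clause.
vectors = to_numpy(vectors)

sim = scaled_cosine(vectors[0], vectors[1])
print(sim.shape)
```

Another option, not tested against this script, might be to run the embedding call inside a torch.no_grad() block so that the returned tensors do not require grad in the first place.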

cisnlp / simalign, issue #15