facebookresearch / SentEval

A python tool for evaluating the quality of sentence embeddings.
Other
2.09k stars 309 forks source link

Problem with Padding in BLSTMEncoder and equal length inputs #35

Closed rasoolfa closed 6 years ago

rasoolfa commented 6 years ago

Hi,

Since it is possible that equal length sentences appear as an input in [1] and np.sort/np.argsort use quicksort by default which is an unstable sorting algorithm, shouldn't quicksort function calls in #L45 and #L46 be replaced by mergesort so the input and output have same order for equal keys?

Changed to follwoings:

sent_len, idx_sort = np.sort(sent_len, kind='mergesort')[::-1], np.argsort(-sent_len, kind='mergesort')
idx_unsort = np.argsort(idx_sort, kind='mergesort')

[1] https://github.com/facebookresearch/SentEval/blob/master/examples/models.py#L44

aconneau commented 6 years ago

Hi, I think in that particular case it shouldn't be a problem and input and output should have same order for equal keys. Could you provide an example where using this method leads to a bad re-sorting? Thanks, Alexis

aconneau commented 6 years ago

Closing this issue, please feel free to re-open anytime to follow up on this!

Tsegaye-misikir commented 6 years ago

I was trying to access the file but the directory is not the path provided. https://github.com/facebookresearch/SentEval/blob/master/examples/models.py

where should I get it?