cisnlp / simalign

Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)
MIT License

Function to get match matrices #44

Open sinking-point opened 10 months ago

sinking-point commented 10 months ago

get_word_aligns returns the finished mapping (a list of tuples). Some applications call for a different matching policy, e.g. a strictly one-to-one or a one-to-many mapping.

It would be useful to split the first part out into a new function, e.g. get_word_align_matrices, which would essentially be this part of get_word_aligns. Users could then run whatever matching algorithm they want on the matrix.
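To illustrate the kind of thing this would enable, here is a rough sketch of a greedy one-to-one matcher running on a score matrix. It is plain Python and does not call simalign at all: the function name greedy_one_to_one, the threshold parameter, and the toy scores matrix (rows = source words, columns = target words) are all made up for the example.

```python
from typing import List, Tuple


def greedy_one_to_one(scores: List[List[float]], threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Greedily pick the best-scoring (source, target) pairs, using each index at most once.

    Hypothetical helper for illustration; not part of simalign.
    """
    # Flatten the matrix into (score, source_index, target_index), best score first.
    cells = sorted(
        ((score, i, j) for i, row in enumerate(scores) for j, score in enumerate(row)),
        key=lambda cell: -cell[0],
    )
    used_src, used_trg = set(), set()
    pairs = []
    for score, i, j in cells:
        if score < threshold:
            break  # every remaining cell scores even lower
        if i not in used_src and j not in used_trg:
            pairs.append((i, j))
            used_src.add(i)
            used_trg.add(j)
    return sorted(pairs)


# Toy matrix: 2 source words x 3 target words (made-up numbers).
scores = [
    [0.9, 0.2, 0.1],
    [0.8, 0.3, 0.7],
]
pairs = greedy_one_to_one(scores, threshold=0.5)
print(pairs)  # [(0, 0), (1, 2)]
```

Greedy matching is only one possible policy; with the matrix exposed, someone else could just as well plug in an optimal assignment or a one-to-many rule.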

It's a very easy change that would add a lot of value.

sinking-point commented 10 months ago

Correction: I think sim is the right thing to return here. The other matrices only contain 1 for a match and 0 for no match; I originally thought they were probability-of-match matrices.
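A small sketch of why the 0/1 matrices are less useful for this than the real-valued scores. It derives a binary match matrix from a toy similarity matrix by keeping only the cells that are the maximum of both their row and their column. This is my own illustration in plain Python, not a claim about how simalign builds its matrices internally; argmax_intersection and the numbers are hypothetical.

```python
from typing import List


def argmax_intersection(sim: List[List[float]]) -> List[List[int]]:
    """Return a 0/1 matrix with 1 where a cell is the max of both its row and its column.

    Hypothetical helper for illustration; not part of simalign.
    """
    n_rows, n_cols = len(sim), len(sim[0])
    # Best target index for each source row, and best source index for each target column.
    row_best = [max(range(n_cols), key=lambda j: sim[i][j]) for i in range(n_rows)]
    col_best = [max(range(n_rows), key=lambda i: sim[i][j]) for j in range(n_cols)]
    return [
        [1 if row_best[i] == j and col_best[j] == i else 0 for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy similarity matrix (made-up numbers).
sim = [
    [0.9, 0.2, 0.1],
    [0.8, 0.3, 0.7],
]
binary = argmax_intersection(sim)
print(binary)  # [[1, 0, 0], [0, 0, 0]]
```

The 0.7 at sim[1][2] disappears entirely in the binary version, so a custom matching policy has nothing to work with once the scores are thresholded away.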