How can I get the confidence of a specific alignment?

cisnlp / simalign

Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)

MIT License

How can I get the confidence of a specific alignment? #9

Closed jiangweiatgithub closed 1 year ago

jiangweiatgithub commented 4 years ago

I need this feature to detect possibly misaligned words. Thanks!

pdufter commented 4 years ago

We have not investigated a confidence measure for alignments yet. I guess a straightforward approach would be to consider the (average) similarity of the aligned words. I do not have time to investigate this in the next week, but feel free to create pull requests.
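A minimal sketch of that idea, assuming you already have a source-by-target similarity matrix and a list of aligned index pairs. The function name `alignment_confidence`, the toy matrix values, and the 0.5 threshold are all made up for illustration and are not part of SimAlign:

```python
def alignment_confidence(sim, alignments):
    """Return the similarity of each aligned pair and the mean over all pairs.

    sim        -- nested list, sim[i][j] = similarity of source word i and target word j
    alignments -- list of (source_index, target_index) pairs
    """
    pair_scores = {(i, j): sim[i][j] for i, j in alignments}
    if not pair_scores:
        return pair_scores, 0.0
    mean_score = sum(pair_scores.values()) / len(pair_scores)
    return pair_scores, mean_score


# Toy similarity matrix (rows: source words, columns: target words).
# The numbers are invented, not model output.
sim = [
    [0.92, 0.10, 0.05],
    [0.08, 0.35, 0.20],
    [0.03, 0.15, 0.88],
]
alignments = [(0, 0), (1, 1), (2, 2)]

pair_scores, mean_score = alignment_confidence(sim, alignments)

# Hypothetical cut-off: pairs below it are candidates for a misalignment.
threshold = 0.5
suspicious = [pair for pair, score in pair_scores.items() if score < threshold]

print(pair_scores)
print(round(mean_score, 3))
print(suspicious)
```

The per-pair score is probably more useful than the sentence-level mean for spotting individual misaligned words; a sensible threshold would have to be tuned on your own data.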

dmar1n commented 4 years ago

If it is of any help to you, @jiangweiatgithub, there is already a pull request that does this (#4).

creolio commented 3 years ago

> We have not investigated a confidence measure for alignments yet. I guess a straightforward approach would be to consider the (average) similarity of the aligned words. I do not have time to investigate this in the next week, but feel free to create pull requests.

Do you have any recommendations or tool suggestions on how to calculate the similarity of one specific word in one language to a specific word in a different language?

jiangweiatgithub commented 3 years ago

Here is an interesting article that provides complete Python code: https://www.tensorflow.org/hub/tutorials/cross_lingual_similarity_with_tf_hub_multilingual_universal_encoder

pdufter commented 3 years ago

@creolio I am not sure whether I understand your question correctly. SimAlign creates alignments based on these similarities. Maybe feeding just two words, e.g., "cat" and "Katze", to SimAlign and looking at the output of the get_similarity method would suit your purpose?
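For intuition, here is a rough sketch of what a similarity between two word vectors can look like. It does not call SimAlign; the three-dimensional vectors are invented stand-ins for real embeddings, and the rescaling of cosine similarity to the range [0, 1] is a choice made for this example:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rescaled_similarity(u, v):
    """Map cosine similarity from [-1, 1] to [0, 1]."""
    return (cosine_similarity(u, v) + 1.0) / 2.0


# Invented toy vectors; real ones would come from a multilingual model.
cat = [0.8, 0.1, 0.3]
katze = [0.7, 0.2, 0.4]
tisch = [-0.5, 0.9, -0.1]

print(rescaled_similarity(cat, katze))  # close vectors give a score near 1
print(rescaled_similarity(cat, tisch))  # dissimilar vectors give a lower score
```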

creolio commented 2 years ago

> @creolio not sure whether I understand your question correctly. SimAlign creates alignments based on these similarities. Maybe feeding just two words, e.g., "cat" and "Katze" to SimAlign and looking at the output of the get_similarity method might suit your purpose?

Thank you. I had to put this aspect of the project on pause, but when I get back to it, I'll try this out. Sounds about right, tho.