anoopkunchukuttan / indic_nlp_library

Resources and tools for Indian language Natural Language Processing
http://anoopkunchukuttan.github.io/indic_nlp_library/
MIT License

Computing similarity between languages #25

Closed: VP007-py closed this issue 4 years ago

VP007-py commented 4 years ago

Is there documented support for finding the similarity between two languages? If so, could you include an example here?

anoopkunchukuttan commented 4 years ago

I just added support for computing the similarity between two strings across languages. Please pull the latest version of the library. An example of computing the similarity between two strings is now shown in the IPython notebook. To compute the similarity between two languages, you would need a parallel corpus. You can then compute the lexical similarity of every sentence pair and average these scores to get the lexical similarity between the two languages.
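The corpus-level procedure described above (score each sentence pair, then average) can be sketched as follows. This is a minimal, self-contained illustration, not the library's own implementation: it assumes the longest common subsequence ratio (LCSR) at the character level as the sentence-pair measure, and it assumes both sides of each pair are already in a common script (for example, after transliterating one side). The helper names `lcs_length`, `lcsr` and `corpus_lexical_similarity` are hypothetical.

```python
from typing import Iterable, Tuple


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the subsequence ending at the previous characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Best of skipping a character from either string.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Assumed sentence-pair measure: LCS length divided by the longer string's length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return lcs_length(a, b) / longest


def corpus_lexical_similarity(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average the sentence-pair scores over a parallel corpus of (source, target) pairs.

    Both sides are assumed to already be in the same script.
    """
    scores = [lcsr(src, tgt) for src, tgt in pairs]
    if not scores:
        raise ValueError("parallel corpus is empty")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    toy_corpus = [("abc", "abc"), ("abcd", "abxd")]
    print(corpus_lexical_similarity(toy_corpus))
```

On the toy corpus, the two pairs score 1.0 and 0.75, so the average is 0.875. Any other string-level measure (for example, character n-gram overlap) could be dropped in place of `lcsr` without changing the averaging step.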

VP007-py commented 4 years ago

Thanks for adding it @anoopkunchukuttan

Exactly what I was looking for!

chiragsanghvi10 commented 4 years ago

Hi @anoopkunchukuttan, thank you for this repository.

Is it possible to find out lexical similarity between Hindi and English?

anoopkunchukuttan commented 4 years ago

It is not possible with this library: the two languages use different scripts and different word orders. Measuring their similarity is quite a challenge, given how far these two languages have diverged over time.