JohnSnowLabs / nlu

1 line for thousands of State of The Art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
Apache License 2.0
854 stars, 130 forks

using NLU-biobert for entity linking or word embedding #102

Closed: gsharma14 closed this issue 2 years ago

gsharma14 commented 2 years ago

I just wanted to ask how one can use this model for entity linking. I believe I saw some linking and POS tagging, but is there documentation that shows how to match words to their meaning rather than just matching them by similarity? I want to load a Spark database, use the model to compute meaning-based word embeddings over the whole dataset, store the output in another data frame, and measure its performance with various metrics.

C-K-Loan commented 2 years ago

Hi @gsharma14

You can calculate the embeddings for each sentence in your dataset. Then you can calculate the pairwise similarity between every pair of data points. This gives you a similarity matrix over your whole dataset, which you can use for various applications.
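A minimal sketch of the second step, assuming you have already extracted one embedding vector per sentence into an `(n, d)` NumPy array (for example with a sentence-embedding pipeline loaded via `nlu.load`, as in the tutorial linked below; exact model names and output columns depend on your NLU version). Cosine similarity is used here as one common choice, and the toy vectors are made up for illustration, not real model output. The helper names `cosine_similarity_matrix` and `most_similar` are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of an (n, d) embedding matrix."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Clip tiny norms so an all-zero vector never causes a division by zero.
    unit = embeddings / np.clip(norms, 1e-12, None)
    # (n, d) @ (d, n) -> (n, n) matrix; entry [i, j] is the similarity of rows i and j.
    return unit @ unit.T


def most_similar(sim: np.ndarray) -> np.ndarray:
    """For each row, return the index of the most similar *other* row."""
    masked = sim.copy()
    # Exclude self-similarity (always 1.0) from the argmax.
    np.fill_diagonal(masked, -np.inf)
    return masked.argmax(axis=1)


# Hypothetical toy embeddings standing in for real sentence vectors.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
sim = cosine_similarity_matrix(emb)
print(np.round(sim, 3))
print(most_similar(sim))
```

The resulting matrix is symmetric with ones on the diagonal, so for large datasets you may only need its upper triangle, or a nearest-neighbour index instead of the full `n × n` matrix.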

See this Medium tutorial and the corresponding notebook:
https://medium.com/spark-nlp/easy-sentence-similarity-with-bert-sentence-embeddings-using-john-snow-labs-nlu-ea078deb6ebf
https://github.com/JohnSnowLabs/nlu/blob/master/examples/colab/component_examples/sentence_embeddings/sentence_similarirty_stack_overflow_questions.ipynb

gsharma14 commented 2 years ago

Thank you so much!

Best,
Gopalika Sharma
Data Scientist, Corbus Pharmaceuticals

(Sent by email in reply to Christian Kasim Loan's comment of Sun, Feb 27, 2022, 9:20 AM.)