BinWang28 / SBERT-WK-Sentence-Embedding

IEEE/ACM TASLP 2020: SBERT-WK: A Sentence Embedding Method By Dissecting BERT-based Word Models
Apache License 2.0

How can i get the sentence representations from SBERT-WK #13

Open boscoj2008 opened 3 years ago

boscoj2008 commented 3 years ago

I would like to use the representations for custom data in a downstream clustering task, but I don't see how I can obtain the sentence representations using your method. Any help will be appreciated. Thanks in advance.

BinWang28 commented 3 years ago

Hi John, You can simply run ./example.sh to see an example of extracting the sentence representation for an input sentence. It should be easy to edit for your specific task.

boscoj2008 commented 3 years ago

@BinWang28 thank you for responding. I have already tried running the example.sh file. The default setting asks for 2 sentences and returns a similarity score. However, I have 5000 records/tuples and cannot enter each sentence manually. Moreover, what I am looking for are the sentence representations themselves, i.e., the sentence vectors of my records, not the similarity score. Could you elaborate on how the code could be modified? Thanks.

BinWang28 commented 3 years ago

Hi John, That's a good starting point if you can get the example code working. You do not need to manually input all the sentences. What you need to do is (in a Python script):

  1. Read all your 5000 records/sentences.
  2. Write a for loop to extract each record's/sentence's embedding one by one (a simple modification of the example code).
  3. Use the sentence embeddings in your application.
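The three steps above can be sketched as follows. This is a minimal, hypothetical scaffold: `embed_sentence` is a placeholder stub (not the repo's API) that you would replace with the SBERT-WK extraction code from the example script; here it returns a dummy 768-dimensional vector so the loop structure is runnable on its own.

```python
import numpy as np

def embed_sentence(sentence):
    # Placeholder: swap in the actual SBERT-WK embedding extraction from the
    # repo's example code. This stub returns a dummy fixed-size vector keyed
    # on the sentence text, just so the pipeline below runs end to end.
    rng = np.random.default_rng(abs(hash(sentence)) % (2 ** 32))
    return rng.standard_normal(768)  # 768 = BERT-base hidden size

def embed_file(path):
    # Step 1: read all records, assuming one sentence per line
    with open(path, encoding="utf-8") as f:
        sentences = [line.strip() for line in f if line.strip()]
    # Step 2: loop over the records, collecting one embedding per sentence
    embeddings = np.vstack([embed_sentence(s) for s in sentences])
    # Step 3: the resulting (n_records, dim) matrix can be fed to a
    # downstream application, e.g. a clustering algorithm
    return sentences, embeddings
```

The returned matrix can then go straight into a clusterer, e.g. `sklearn.cluster.KMeans(n_clusters=k).fit_predict(embeddings)`.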