dominiksinsaarland / DocSCAN

Learning from Neighbors: Unsupervised Text Classification
17 stars, 2 forks

src/compute_sbert_embedding.py #1

Open tiangxiaohu opened 1 year ago

tiangxiaohu commented 1 year ago

Hello, I am a graduate student. My research direction is similar to yours, and I am very interested in your paper. I am currently running your code, but I have run into a problem and would appreciate your help.

I cannot find the "src/compute_sbert_embedding.py" file mentioned in your code. Where can I find this file? I sincerely hope you can help, and I would greatly appreciate it!

dominiksinsaarland commented 1 year ago

Hi,

If I remember correctly, I rewrote the code last fall so that this dependency is no longer strictly necessary. I just double-checked the main .py files and cannot find any place where this file is imported.

In any case, the file was just a wrapper to call SBERT embeddings, along the lines of

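from sentence_transformers import SentenceTransformer

def embedd_sentences(self, sentences):
    # Load the SBERT model named in the arguments
    embedder = SentenceTransformer(self.args.sbert_model)
    # Inputs longer than max_seq_length tokens are truncated
    embedder.max_seq_length = self.args.max_seq_length
    corpus_embeddings = embedder.encode(sentences, batch_size=32, show_progress_bar=True)
    return corpus_embeddings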

Here, self.args.sbert_model = "sentence-transformers/all-mpnet-base-v2", self.args.max_seq_length = 128, and sentences is the list of sentences we want to encode.
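If you want to use this outside the class, a standalone version could look roughly like the sketch below. This is only an illustration, not the original file: the `EmbeddingArgs` dataclass, the function name `embed_sentences`, and the optional `embedder` and `batch_size` parameters are assumptions added here so the snippet can be run and tested on its own.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass(frozen=True)
class EmbeddingArgs:
    """Hypothetical stand-in for the `self.args` object used in the wrapper."""
    sbert_model: str = "sentence-transformers/all-mpnet-base-v2"
    max_seq_length: int = 128


def embed_sentences(
    sentences: Sequence[str],
    args: Optional[EmbeddingArgs] = None,
    embedder: Any = None,
    batch_size: int = 32,
):
    """Encode sentences with SBERT and return one embedding per sentence."""
    args = args or EmbeddingArgs()
    if embedder is None:
        # Imported lazily so this module can be loaded without the package;
        # requires `pip install sentence-transformers` when actually used.
        from sentence_transformers import SentenceTransformer
        embedder = SentenceTransformer(args.sbert_model)
    # Inputs longer than this many tokens are truncated by the model.
    embedder.max_seq_length = args.max_seq_length
    return embedder.encode(
        list(sentences), batch_size=batch_size, show_progress_bar=True
    )
```

With the defaults, calling `embed_sentences(["first text", "second text"])` should download the model on first use and return one embedding per input sentence. Passing your own `embedder` is optional; it is there mainly so a model can be loaded once and reused across calls.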

Hope this helps. Let me know if you have any further questions.

Best, Dominik