UKPLab / sentence-transformers

Multilingual Sentence & Image Embeddings with BERT
https://www.SBERT.net
Apache License 2.0

error in LabelAccuracyEvaluator.py #27

Open HUSTHY opened 4 years ago

HUSTHY commented 4 years ago

Line 53 of LabelAccuracyEvaluator.py does not work: _, prediction = model(features[0]). When I run this code, an error occurs.

nreimers commented 4 years ago

Hi @HUSTHY thanks for pointing this out. The LabelAccuracyEvaluator needs access to the Softmax loss model in order to compute the labels (for example, for the NLI task).

See this commit for how the LabelAccuracyEvaluator must be changed (you get the file if you check out the v0.2.4 branch): https://github.com/UKPLab/sentence-transformers/commit/638d3703bfe7353b2a9e04bacbef6b81d4a7618c

If you want to use the LabelAccuracyEvaluator, your code must look like this (for example, on the NLI dataset):

logging.info("Read AllNLI train dataset")
train_data = SentencesDataset(nli_reader.get_examples('train.gz', 1000), model=model)
train_dataloader = DataLoader(train_data, shuffle=True, batch_size=batch_size)
train_loss = losses.SoftmaxLoss(model=model, sentence_embedding_dimension=model.get_sentence_embedding_dimension(), num_labels=train_num_labels)

logging.info("Read AllNLI dev dataset")
dev_data = SentencesDataset(examples=nli_reader.get_examples('dev.gz'), model=model)
dev_dataloader = DataLoader(dev_data, shuffle=False, batch_size=batch_size)
evaluator = LabelAccuracyEvaluator(dev_dataloader, softmax_model = train_loss)

Best Nils Reimers

HUSTHY commented 4 years ago

I had thought about bringing in the function from your SoftmaxLoss.py and using nn.Linear() for classification. The parameters were not clear to me, so I did not fix it. Thanks for your reply!

Huang Yang

nreimers commented 4 years ago

Hi @HUSTHY, note that this framework is not optimal for sentence-pair classification. It uses a bi-encoder, i.e., sentences are mapped independently to sentence embeddings. For classification, the classifier then takes these two embeddings and derives a label.

BERT, on the other hand, uses a cross-encoder: both sentences are present at input time, and BERT can compare the two inputs to derive the label. This gives much better classification results. The disadvantage of the BERT cross-encoder is that you do not get sentence embeddings, which you need, for example, for clustering or semantic search.

If you do pairwise classification, like NLI, BERT would be the better choice.
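
To make the bi-encoder setup described above concrete, here is a minimal sketch in plain Python. The `encode` function is a hypothetical stand-in for a sentence encoder (it only counts letters), and the feature layout (u, v, |u - v|) follows the concatenation described in the Sentence-BERT paper; neither is taken from this library's code. The only point is that each sentence is embedded without seeing the other one, and the classifier works on the two finished embeddings.

```python
from typing import List


def encode(sentence: str, dim: int = 8) -> List[float]:
    """Toy stand-in for a sentence encoder (hypothetical): letter counts folded into `dim` buckets."""
    vec = [0.0] * dim
    for ch in sentence.lower():
        if ch.isalpha():
            vec[ord(ch) % dim] += 1.0
    return vec


def pair_features(u: List[float], v: List[float]) -> List[float]:
    """Concatenate u, v and |u - v|: the input a softmax classifier head would receive."""
    diff = [abs(a - b) for a, b in zip(u, v)]
    return u + v + diff


# Each sentence is encoded on its own; the encoder never sees the pair.
u = encode("A man is playing a guitar")
v = encode("Someone plays an instrument")

# The classifier only gets the two finished embeddings (plus their difference).
features = pair_features(u, v)
print(len(features))  # 3 * dim
```

A cross-encoder, by contrast, would receive both sentences in a single input, so there is no separate `u` and `v` to reuse for clustering or search.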

HUSTHY commented 4 years ago

I got it. Thanks for your suggestion.

HUSTHY commented 4 years ago

I found a bug in LabelAccuracyEvaluator.py in the v0.2.4 branch:
_, prediction = self.softmax_model(features, labels=None)
It should be fixed like this:
_, prediction = self.softmax_model.to(self.device)(features, labels=None)

And I have a question. When I use model.fit() and then use LabelAccuracyEvaluator, the accuracy is 0.915. However, when I load the saved fine-tuned model and only use LabelAccuracyEvaluator to evaluate the same data, the accuracy is 0.51. Is there something wrong? My code is like this:

batch_size = 16
nli_reader = LCQMCDataReader('datasets/patentData')
model_save_path='output/training_patent_sbert-Chinese-BERT-wwm2019-09-23_13-11-58_with_15K_Trains'
word_embedding_model=models.BERT('output/training_patent_sbert-Chinese-BERT-wwm2019-09-23_14-50-35_with_15K_Trains/0_BERT')

# Apply mean pooling to get one fixed sized sentence vector
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), pooling_mode_mean_tokens=True, pooling_mode_cls_token=False, pooling_mode_max_tokens=False)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
train_loss= losses.SoftmaxLoss(model=model, sentence_embedding_dimension=model.get_sentence_embedding_dimension(), num_labels=2)

test_data = SentencesDataset(examples=nli_reader.get_examples("test.csv"), model=model)
test_dataloader = DataLoader(test_data, shuffle=False, batch_size=batch_size)

evaluator=LabelAccuracyEvaluator(test_dataloader,model_save_path)
model.evaluate(evaluator)

nreimers commented 4 years ago

Hi @HUSTHY, the model saves only the layers that are responsible for producing sentence embeddings (which is the main purpose of this framework).

The SoftmaxLoss module is a softmax classifier with trainable weights. These weights are not stored by default. That is, when you call train_loss = losses.SoftmaxLoss(...), a new softmax classifier is initialized with random weights. This new softmax classifier produces no sensible labels, so you get a low accuracy when you load the model.

Solution: you would need to save train_loss to disk as well and then load it from disk. You would need to use the standard PyTorch load / save functions to store and load the SoftmaxLoss to / from disk.

Best regards Nils Reimers
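
A minimal sketch of the "standard PyTorch load / save" route mentioned above, using `state_dict()`. `TinySoftmaxHead` is a hypothetical stand-in for a loss module that owns a classifier layer; it is not the library's SoftmaxLoss, and the file name is made up for illustration.

```python
import os
import tempfile

import torch
from torch import nn


class TinySoftmaxHead(nn.Module):
    """Hypothetical stand-in for a loss module that owns a trainable classifier."""

    def __init__(self, embedding_dim: int, num_labels: int):
        super().__init__()
        self.classifier = nn.Linear(3 * embedding_dim, num_labels)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.classifier(features)


# Pretend this instance has just been trained.
trained_head = TinySoftmaxHead(embedding_dim=4, num_labels=2)

# Save only the weights (state_dict), not the pickled object.
path = os.path.join(tempfile.mkdtemp(), "softmax_head.pt")
torch.save(trained_head.state_dict(), path)

# Later: a fresh instance starts with random weights ...
restored_head = TinySoftmaxHead(embedding_dim=4, num_labels=2)
# ... until the stored weights are loaded back in.
restored_head.load_state_dict(torch.load(path, map_location="cpu"))
restored_head.eval()

same = torch.equal(trained_head.classifier.weight, restored_head.classifier.weight)
print(same)
```

With the objects from this thread, the analogous steps would presumably be `torch.save(train_loss.state_dict(), path)` after training and `train_loss.load_state_dict(torch.load(path))` on a freshly constructed SoftmaxLoss before passing it to the evaluator as `softmax_model`. That mapping is an assumption and has not been checked against a specific release.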

HUSTHY commented 4 years ago

Thanks for your explanation and solution! It is all clear to me now.

HUSTHY commented 4 years ago

Maybe you should correct the code in __init__() of LabelAccuracyEvaluator.py.

Huang Yang

------------------ Original message ------------------
From: "Zarmeen"; Sent: Saturday, November 2, 2019, 12:06 AM; Subject: Re: [UKPLab/sentence-transformers] error in LabelAccuracyEvaluator.py (#27)

Hi @HUSTHY and @nreimers, I installed the package from source and made the suggested changes in the LabelAccuracyEvaluator code, but I am still getting this error.

HUSTHY commented 4 years ago

I reviewed this code. Using model.fit() with model = SentenceTransformer(modules=[word_embedding_model, pooling_model]) works: the parameters in train_loss.classifier are updated. In SoftmaxLoss.py, class SoftmaxLoss(nn.Module) defines classifier = nn.Linear(). When you call model.fit() with train_loss, you can see in fit() how those parameters are trained. If this is still unclear, it would be best to ask the authors for an explanation.

------------------ Original message ------------------
From: "Shuhuai Ren"; Sent: Wednesday, December 11, 2019, 3:14 PM; Subject: Re: [UKPLab/sentence-transformers] error in LabelAccuracyEvaluator.py (#27)

@HUSTHY Hi~ I'm curious about how you trained this model. Did you use model.fit() and model = SentenceTransformer(modules=[word_embedding_model, pooling_model])? If so, how are the parameters in train_loss.classifier updated? This is an nn.Linear model with its own parameters, and I think these parameters cannot be updated by model.fit... How did you get an accuracy of 0.915? Looking forward to your reply, thanks very much~
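
The question of whether training can update train_loss.classifier comes down to which parameters the optimizer is given. The sketch below uses hypothetical toy classes (`ToyEncoder`, `ToySoftmaxLoss`), not the library's code, and assumes the optimizer is built from the loss module's parameters, which is the mechanism HUSTHY's reply points at. Because the encoder is registered as a submodule of the loss module, one optimizer step touches both the encoder and the classifier.

```python
import torch
from torch import nn


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for the sentence embedding model."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(5, 4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


class ToySoftmaxLoss(nn.Module):
    """Hypothetical loss module: wraps the encoder and owns its own classifier."""

    def __init__(self, model: nn.Module, embedding_dim: int, num_labels: int):
        super().__init__()
        self.model = model  # registered as a submodule
        self.classifier = nn.Linear(3 * embedding_dim, num_labels)
        self.loss_fct = nn.CrossEntropyLoss()

    def forward(self, a: torch.Tensor, b: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        u, v = self.model(a), self.model(b)
        logits = self.classifier(torch.cat([u, v, torch.abs(u - v)], dim=1))
        return self.loss_fct(logits, labels)


torch.manual_seed(0)
encoder = ToyEncoder()
train_loss = ToySoftmaxLoss(encoder, embedding_dim=4, num_labels=2)

# The loss module exposes encoder AND classifier parameters.
names = [name for name, _ in train_loss.named_parameters()]

# Optimizer built from the loss module, not from the encoder alone.
optimizer = torch.optim.SGD(train_loss.parameters(), lr=0.1)
before = train_loss.classifier.weight.detach().clone()

a, b = torch.randn(6, 5), torch.randn(6, 5)
labels = torch.tensor([0, 1, 0, 1, 1, 0])
loss = train_loss(a, b, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

changed = not torch.equal(before, train_loss.classifier.weight)
print(names)
print(changed)
```

This also fits the earlier explanation in the thread: the classifier is trained together with the encoder but lives in the loss module, so it is not written out when only the embedding layers are saved.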

rsboaventura commented 3 years ago

Hi @nreimers,

I am having the following issue when trying to use LabelAccuracyEvaluator. I am not sure if it has something to do with the previous reported issues.

Traceback (most recent call last):
  File "D:/PycharmProjects/BERT NLP Tests/AllNLI/SentenceTransformers Fine-tuning - PT_BR V0.py", line 208, in <module>
    dev_evaluator = LabelAccuracyEvaluator(dev_dataloader, softmax_model=train_loss)
  File "D:\PycharmProjects\venv\lib\site-packages\sentence_transformers\evaluation\LabelAccuracyEvaluator.py", line 29, in __init__
    self.softmax_model.to(self.device)
AttributeError: 'LabelAccuracyEvaluator' object has no attribute 'device'

Here's the part of the code that may be related to the command with the issue:

...
train_dataset = SentencesDataset(train_samples, model=model)
train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=train_batch_size)
train_loss = losses.SoftmaxLoss(model=model, sentence_embedding_dimension=model.get_sentence_embedding_dimension(), num_labels=len(label2int))
...
for index, row in reader.iterrows():
    label_id = label2int[row['entailment']]
    test_samples.append(InputExample(texts=[row['sentence1'], row['sentence2']], label=label_id))

model = SentenceTransformer(model_save_path)

test_evaluator = LabelAccuracyEvaluator.from_input_examples(test_samples, batch_size=train_batch_size, name='assin-nli-test')

dev_dataloader = DataLoader(dev_dataset, shuffle=False, batch_size=train_batch_size)
dev_evaluator = LabelAccuracyEvaluator(dev_dataloader, softmax_model=train_loss)
...

Any ideas about its cause?

Thank you in advance.

nreimers commented 3 years ago

Try to use the most recent version of this evaluator: https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/evaluation/LabelAccuracyEvaluator.py

It is not yet part of a release.

rsboaventura commented 3 years ago

Perfect @nreimers, it solved the problem. Thank you so much!

rsboaventura commented 3 years ago

Sorry @nreimers, I've managed to get the accuracy for the train and dev datasets, but I am having trouble getting it for the test dataset. I am trying to follow the pattern used on dev, but I may be missing something... Here are the error and the code:

Traceback (most recent call last):
  File "D:/PycharmProjects/BERT NLP Tests/AllNLI/SentenceTransformers Fine-tuning - PT_BR V0.py", line 246, in <module>
    model.evaluate(test_evaluator)
  File "D:\PycharmProjects\venv\lib\site-packages\sentence_transformers\SentenceTransformer.py", line 599, in evaluate
    return evaluator(self, output_path)
  File "D:\PycharmProjects\venv\lib\site-packages\sentence_transformers\evaluation\LabelAccuracyEvaluator.py", line 51, in __call__
    _, prediction = self.softmax_model(features, labels=None)
TypeError: 'NoneType' object is not callable

...
for index, row in reader.iterrows():
    label_id = label2int[row['entailment']]
    test_samples.append(InputExample(texts=[row['sentence1'], row['sentence2']], label=label_id))

test_dataset = SentencesDataset(examples=test_samples, model=model)
test_dataloader = DataLoader(test_dataset, shuffle=False, batch_size=train_batch_size)

test_evaluator = LabelAccuracyEvaluator(test_dataloader, model_save_path)
model.evaluate(test_evaluator)
...

Thank you in advance.

nreimers commented 3 years ago

The softmax model is not part of what is stored when the model is trained.

If you want to use it, you must store it yourself, load it, and pass it to the LabelAccuracyEvaluator when you run it on the test set.

rsboaventura commented 3 years ago

Thank you!

I was reading other posts related to the subject, and I may not need LabelAccuracyEvaluator accuracy at test time.

In fact, I need to fine-tune bert-base-multilingual-cased (and a Brazilian Portuguese version of it) to improve its sentence embeddings for a sentence textual similarity task (finding similar short texts in a text vector). I already tested the vanilla embeddings, but they aren't good enough for my purpose.

Therefore I am labeling 1,000 sentence pairs with either similar or dissimilar labels (0 or 1) and intend to fine-tune the vanilla model (adjusted for sentence embeddings) with this data set in order to check its accuracy and improve the BERT embeddings further.

Maybe the cosine similarity loss function and the BinaryClassificationEvaluator class can solve my problem if I split the data set into train, dev and test and run it pretty much as the Training Overview tutorial states. Does this strategy make sense?

Thank you in advance for your assistance.
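
The plan in this last comment (split the labeled pairs, then judge pairs by embedding similarity) can be sketched in plain Python. This is not the library's BinaryClassificationEvaluator; it only illustrates the general idea of choosing a cosine-similarity threshold on the dev split and reusing it on the test split. The helper names, the 80/10/10 split, the scores, and the convention that label 1 means "similar" are all assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def split(items: Sequence, train_frac: float = 0.8, dev_frac: float = 0.1,
          seed: int = 42) -> Tuple[list, list, list]:
    """Shuffle once with a fixed seed, then cut into train / dev / test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)
    return shuffled[:n_train], shuffled[n_train:n_train + n_dev], shuffled[n_train + n_dev:]


def accuracy_at_threshold(scores: List[float], labels: List[int], threshold: float) -> float:
    """Predict 1 (assumed to mean 'similar') when the score reaches the threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(predictions, labels)) / len(labels)


def best_threshold(scores: List[float], labels: List[int]) -> float:
    """Pick the candidate threshold with the highest accuracy on the given split."""
    return max(sorted(set(scores)), key=lambda t: accuracy_at_threshold(scores, labels, t))


train, dev, test = split(range(1000))

# Made-up similarity scores for four dev pairs and two test pairs.
dev_scores, dev_labels = [0.92, 0.15, 0.80, 0.40], [1, 0, 1, 0]
threshold = best_threshold(dev_scores, dev_labels)
test_accuracy = accuracy_at_threshold([0.85, 0.30], [1, 0], threshold)
print(len(train), len(dev), len(test), threshold, test_accuracy)
```

Choosing the threshold on dev and only reporting accuracy on test keeps the test split untouched by any tuning.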