ValueError: The truth value of an array with more than one element is ambiguous. When training CrossEncoder

I'm (more or less) following the Training_quora_duplicate_questions.py example using my own data.

As I understood from the cross encoder page, and I quote:

For binary tasks and tasks with continuous scores (like STS), we set num_labels=1. For classification tasks, we set it to the number of labels we have.

As such I've written the model thusly:

model = CrossEncoder("distilroberta-base", num_labels= 2)

and the evaluator thusly:

evaluator = CEBinaryClassificationEvaluator.from_input_examples(dev_examples, name= "Rooms-dev")

Yet, when I run the code as soon as it tries to evaluate for the first time it gives the following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I tried the same code before but with num_labels=1 and the it worked, 'though isn't (I think) quite what I'd like since I want to model to predict either 0 or 1 and not a continuous from 0 to 1.

Any ideia what might be causing this?

Code and error message below:

# extracting rooms & labels into lists
rooms_1 = rooms_df["room_ner1"].tolist()
rooms_2 = rooms_df["room_ner2"].tolist()
labels = rooms_df["label"].tolist()

# creating array of room-pairs
dataset = [
    [original, candidate, label]
    for original, candidate, label
    in zip(rooms_1, rooms_2, labels)
]

random.shuffle(dataset)

dev_sample_size = int(len(dataset) * 0.10)

dev_data = dataset[: dev_sample_size]
train_data = dataset[dev_sample_size: ]

# preparing training dataset
train_examples = list()
n_examples = len(train_data)

# creating training dataset
for i in trange(n_examples):
    example = train_data[i]
    train_examples.append(InputExample(texts= [example[0], example[1]], label= example[2]))
    train_examples.append(InputExample(texts= [example[1], example[0]], label= example[2]))

# preparing test dataset
dev_examples = list()
d_examples = len(dev_data)

# creating test dataset
for i in trange(d_examples):
    example = dev_data[i]
    dev_examples.append(InputExample(texts= [example[0], example[1]], label= example[2]))

train_batch_size = 16
num_epochs = 4
model_save_path = "output/training_rooms-" + datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

model = CrossEncoder("distilroberta-base", num_labels= 2)#, device= "mps")

train_dataloader = DataLoader(
    train_examples, 
    shuffle= True,
    batch_size= train_batch_size
)

evaluator = CEBinaryClassificationEvaluator.from_input_examples(dev_examples, name= "Rooms-dev") 

warmup_steps = math.ceil(len(train_dataloader) * num_epochs * 0.1) # 10% train data for warm-up
logger.info(f"Warmup-steps: {warmup_steps:_}")

model.fit(
    train_dataloader= train_dataloader,
    evaluator= evaluator,
    epochs= num_epochs,
    evaluation_steps= 5_000,
    warmup_steps= warmup_steps,
    output_path= model_save_path
)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[67], [line 1](vscode-notebook-cell:?execution_count=67&line=1)
----> [1](vscode-notebook-cell:?execution_count=67&line=1) model.fit(
      [2](vscode-notebook-cell:?execution_count=67&line=2)     train_dataloader= train_dataloader,
      [3](vscode-notebook-cell:?execution_count=67&line=3)     evaluator= evaluator,
      [4](vscode-notebook-cell:?execution_count=67&line=4)     epochs= num_epochs,
      [5](vscode-notebook-cell:?execution_count=67&line=5)     evaluation_steps= 5_000,
      [6](vscode-notebook-cell:?execution_count=67&line=6)     warmup_steps= warmup_steps,
      [7](vscode-notebook-cell:?execution_count=67&line=7)     output_path= model_save_path
      [8](vscode-notebook-cell:?execution_count=67&line=8) )

File [~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py:275](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py:275), in CrossEncoder.fit(self, train_dataloader, evaluator, epochs, loss_fct, activation_fct, scheduler, warmup_steps, optimizer_class, optimizer_params, weight_decay, evaluation_steps, output_path, save_best_model, max_grad_norm, use_amp, callback, show_progress_bar)
    [272](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py:272) training_steps += 1
    [274](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py:274) if evaluator is not None and evaluation_steps > 0 and training_steps % evaluation_steps == 0:
--> [275](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py:275)     self._eval_during_training(
    [276](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py:276)         evaluator, output_path, save_best_model, epoch, training_steps, callback
    [277](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py:277)     )
    [279](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py:279)     self.model.zero_grad()
    [280](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py:280)     self.model.train()

File [~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py:450](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py:450), in CrossEncoder._eval_during_training(self, evaluator, output_path, save_best_model, epoch, steps, callback)
    [448](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py:448) """Runs evaluation during the training"""
    [449](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py:449) if evaluator is not None:
--> [450](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py:450)     score = evaluator(self, output_path=output_path, epoch=epoch, steps=steps)
    [451](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py:451)     if callback is not None:
    [452](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py:452)         callback(score, epoch, steps)

File [~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/evaluation/CEBinaryClassificationEvaluator.py:81](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/evaluation/CEBinaryClassificationEvaluator.py:81), in CEBinaryClassificationEvaluator.__call__(self, model, output_path, epoch, steps)
     [76](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/evaluation/CEBinaryClassificationEvaluator.py:76) logger.info("CEBinaryClassificationEvaluator: Evaluating the model on " + self.name + " dataset" + out_txt)
     [77](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/evaluation/CEBinaryClassificationEvaluator.py:77) pred_scores = model.predict(
     [78](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/evaluation/CEBinaryClassificationEvaluator.py:78)     self.sentence_pairs, convert_to_numpy=True, show_progress_bar=self.show_progress_bar
     [79](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/evaluation/CEBinaryClassificationEvaluator.py:79) )
---> [81](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/evaluation/CEBinaryClassificationEvaluator.py:81) acc, acc_threshold = BinaryClassificationEvaluator.find_best_acc_and_threshold(pred_scores, self.labels, True)
     [82](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/evaluation/CEBinaryClassificationEvaluator.py:82) f1, precision, recall, f1_threshold = BinaryClassificationEvaluator.find_best_f1_and_threshold(
     [83](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/evaluation/CEBinaryClassificationEvaluator.py:83)     pred_scores, self.labels, True
     [84](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/evaluation/CEBinaryClassificationEvaluator.py:84) )
     [85](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/cross_encoder/evaluation/CEBinaryClassificationEvaluator.py:85) ap = average_precision_score(self.labels, pred_scores)

File [~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/evaluation/BinaryClassificationEvaluator.py:226](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/evaluation/BinaryClassificationEvaluator.py:226), in BinaryClassificationEvaluator.find_best_acc_and_threshold(scores, labels, high_score_more_similar)
    [223](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/evaluation/BinaryClassificationEvaluator.py:223) assert len(scores) == len(labels)
    [224](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/evaluation/BinaryClassificationEvaluator.py:224) rows = list(zip(scores, labels))
--> [226](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/evaluation/BinaryClassificationEvaluator.py:226) rows = sorted(rows, key=lambda x: x[0], reverse=high_score_more_similar)
    [228](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/evaluation/BinaryClassificationEvaluator.py:228) max_acc = 0
    [229](https://file+.vscode-resource.vscode-cdn.net/Users/duarteharris/IronHack/GitHub/room-matching/~/IronHack/GitHub/room-matching/.venv/lib/python3.11/site-packages/sentence_transformers/evaluation/BinaryClassificationEvaluator.py:229) best_threshold = -1

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Thank you in advance for any help.

UKPLab / sentence-transformers

ValueError: The truth value of an array with more than one element is ambiguous. When training CrossEncoder #2625