Training custom data using triplet loss

Hi, your train / dev file should contain one triplet per line, like this:

anchor1 positive1 negative1
anchor2 positive2 negative2

The three fields on each line are separated by tabs (\t).
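As a rough illustration of that layout, here is a minimal sketch that writes and re-reads triplets with Python's standard csv module. The helper names (write_triplets, read_triplets) and the example sentences are made up for this sketch and are not part of the framework.

```python
import csv
import io

def write_triplets(triplets, handle):
    """Write (anchor, positive, negative) tuples, one tab-separated triplet per line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for anchor, positive, negative in triplets:
        writer.writerow([anchor, positive, negative])

def read_triplets(handle):
    """Read the tab-separated file back into a list of 3-tuples."""
    return [tuple(row) for row in csv.reader(handle, delimiter="\t")]

# Hypothetical example data, only to show the layout.
triplets = [
    ("How do I reset my password?", "I forgot my password", "What is the weather today?"),
    ("Best pizza in town", "Where can I get good pizza?", "How do I fix a flat tire?"),
]

# An in-memory buffer stands in for the train / dev file here.
buffer = io.StringIO()
write_triplets(triplets, buffer)
buffer.seek(0)
loaded = read_triplets(buffer)
```

For a real file, open it with newline="" and encoding="utf-8" and pass the handle in place of the buffer. Going through the csv module rather than joining strings by hand means a sentence that itself contains a tab gets quoted instead of silently shifting the columns.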

The difficult part is choosing the positive / negative examples. There is a lot of literature on this, and the choice of the negative example in particular can matter a great deal for performance.

This paper can be of interest: https://arxiv.org/pdf/1703.07737.pdf

The batch hard strategy described in that paper is also implemented in this framework.
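To give an idea of what batch hard means, here is a small sketch in plain Python. It is not the framework's implementation, just the selection rule from the paper under simple assumptions: every example in a batch has a label, and for each anchor you take the farthest example with the same label as the positive and the closest example with a different label as the negative. The function name and the toy 2-d embeddings are made up for illustration.

```python
import math

def batch_hard_triplets(embeddings, labels):
    """Return (anchor, positive, negative) index triples for one labelled batch.

    For each anchor: hardest positive = farthest same-label example,
    hardest negative = closest different-label example.
    """
    triplets = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        positives = [j for j, other in enumerate(labels) if other == label and j != i]
        negatives = [j for j, other in enumerate(labels) if other != label]
        if not positives or not negatives:
            continue  # this anchor cannot form a triplet within the batch
        hardest_pos = max(positives, key=lambda j: math.dist(anchor, embeddings[j]))
        hardest_neg = min(negatives, key=lambda j: math.dist(anchor, embeddings[j]))
        triplets.append((i, hardest_pos, hardest_neg))
    return triplets

# Toy batch: three examples of class "a", two of class "b".
embeddings = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [0.4, 0.0], [5.0, 0.0]]
labels = ["a", "a", "a", "b", "b"]
mined = batch_hard_triplets(embeddings, labels)
```

With this strategy you only need labelled examples rather than a file of fixed triplets, because the triplets are formed inside each batch from the current embeddings.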

How to choose the positive / negative examples depends on your task, so there is no general rule for that. Triplet loss tries to bring the anchor and the positive close together while pushing the anchor and the negative apart.
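For a single triplet, the usual hinge formulation from the literature can be sketched as below. The Euclidean distance and the margin value of 1.0 are assumptions for this example and not necessarily the defaults used in this framework.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: max(0, d(a, p) - d(a, n) + margin) with Euclidean d."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Negative is already more than `margin` farther away than the positive: no loss.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 3.0])

# Negative is closer to the anchor than the positive: the loss is positive.
hard = triplet_loss([0.0, 0.0], [0.0, 3.0], [0.0, 1.0])
```

The loss is zero once the negative is at least the margin farther from the anchor than the positive, so easy triplets contribute nothing to training. That is why the selection of negatives matters so much.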

Best,
Nils Reimers

UKPLab / sentence-transformers

Training custom data using triplet loss #97