import csv
import gzip

from sentence_transformers import InputExample

# nli_dataset_path is assumed to be defined earlier, as in the tutorial.
label2int = {"contradiction": 0, "entailment": 1, "neutral": 2}
train_samples = []
dev_samples = []
with gzip.open(nli_dataset_path, 'rt', encoding='utf8') as fIn:
    reader = csv.DictReader(fIn, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        label_id = label2int[row['label']]
        # Route each row to the train or dev list based on its 'split' column.
        if row['split'] == 'train':
            train_samples.append(InputExample(texts=[row['sentence1'], row['sentence2']], label=label_id))
        else:
            dev_samples.append(InputExample(texts=[row['sentence1'], row['sentence2']], label=label_id))
Hi, awesome work! I read the training tutorial for the cross-encoder (https://www.sbert.net/examples/training/cross-encoder/README.html). It creates the dataset from a static file of format (text1, text2, label), as in the snippet above.

However, I have a dataset of format (text, label). I want to sample two rows from the dataset, (text1, label1) and (text2, label2), and generate a training sample like (text1, text2, 1(label1 < label2)). I want to do this online during training. Is there any way to make this work using the sentence-transformers cross-encoder?

Thanks
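One possible way to get the online sampling described above is a small map-style dataset that draws a fresh random pair on every access and computes the indicator label on the fly. The sketch below is not taken from the tutorial: the class name `OnlinePairDataset` and the parameters `epoch_size`, `make_example`, and `seed` are hypothetical. It uses only the standard library and returns plain tuples by default, and it assumes the classic `CrossEncoder.fit(train_dataloader=...)` API, which expects a PyTorch `DataLoader` over `InputExample` objects.

```python
import random


class OnlinePairDataset:
    """Map-style dataset that samples a new (text1, text2, target) pair per access.

    Hypothetical helper for illustration; not part of sentence-transformers.
    """

    def __init__(self, rows, epoch_size, make_example=None, seed=None):
        self.rows = list(rows)        # [(text, label), ...]
        self.epoch_size = epoch_size  # number of pairs drawn per epoch
        # By default return a plain tuple; pass a factory to build InputExample.
        self.make_example = make_example or (lambda t1, t2, y: (t1, t2, y))
        self.rng = random.Random(seed)

    def __len__(self):
        return self.epoch_size

    def __getitem__(self, index):
        if not 0 <= index < self.epoch_size:
            raise IndexError(index)
        # The index is otherwise ignored: every access draws two distinct rows.
        (text1, label1), (text2, label2) = self.rng.sample(self.rows, 2)
        # Indicator 1(label1 < label2); ties are mapped to 0.
        target = 1 if label1 < label2 else 0
        return self.make_example(text1, text2, target)


# Toy data in the (text, label) format from the question.
rows = [("a short text", 1), ("a longer text", 2), ("the longest text", 3)]
dataset = OnlinePairDataset(rows, epoch_size=5, seed=0)
for i in range(len(dataset)):
    print(dataset[i])

# Assumed wiring with sentence-transformers (requires torch and the library):
# from torch.utils.data import DataLoader
# from sentence_transformers import InputExample
# dataset = OnlinePairDataset(
#     rows, epoch_size=10_000,
#     make_example=lambda t1, t2, y: InputExample(texts=[t1, t2], label=float(y)))
# model.fit(train_dataloader=DataLoader(dataset, batch_size=16), epochs=1)
```

Because pairs are drawn inside `__getitem__`, each epoch sees different combinations without materializing all pairs up front. If the `DataLoader` uses several worker processes, each worker gets its own copy of the random generator, so a fixed `seed` would repeat the same pairs across workers; leaving `seed=None` or seeding per worker avoids that.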