centre-for-humanities-computing / dfm-sentence-transformers

Code for curating data and training sentence transformers for the Danish Foundation Models project.
MIT License

How to pretrain on NLI #2

Closed: KennethEnevoldsen closed this issue 10 months ago

x-tabdeveloping commented 10 months ago

Here is the essence of an example script in the sentence-transformers repository.

# Imports used by the example (torch's DataLoader plus the sentence-transformers losses)
from torch.utils.data import DataLoader
from sentence_transformers import losses

# NLI task definition
train_dataloader_nli = DataLoader(...)
train_loss_nli = losses.SoftmaxLoss(...)

# STS task definition
train_dataloader_sts = DataLoader(...)
train_loss_sts = losses.CosineSimilarityLoss(...)

# Here we define the two train objectives: train_dataloader_nli with train_loss_nli (i.e., SoftmaxLoss for NLI data)
# and train_dataloader_sts with train_loss_sts (i.e., CosineSimilarityLoss for STSbenchmark data).
# You can pass as many (dataloader, loss) tuples as you like. They are iterated in a round-robin way.
train_objectives = [(train_dataloader_nli, train_loss_nli), (train_dataloader_sts, train_loss_sts)]

model.fit(train_objectives=train_objectives, ...)

They train on multiple different tasks within a single `model.fit` call (cycling through the objectives round-robin), so this shouldn't be an issue.

One other approach that would be really cool, but which we would need to implement ourselves, is to simply take each batch from a different task: you do an update on a batch from one task, then fetch a batch from another task and update the model on that as well; a rough sketch of what I mean is below. (On the other hand, a myriad of problems may arise, so it might be a bad idea.)
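For the record, here is a minimal sketch of that interleaving idea. It assumes generic (dataloader, loss_fn) pairs like the `train_objectives` above, where each loss_fn turns one already-collated batch of (features, labels) into a scalar loss; it glosses over sentence-transformers' batching/collation details, and the names are hypothetical. As far as I can tell, `model.fit` with multiple objectives already does something very similar with its round-robin scheduling.

def forever(dataloader):
    """Yield batches from a DataLoader indefinitely, restarting when it is exhausted."""
    while True:
        for batch in dataloader:
            yield batch

def interleaved_fit(model, objectives, optimizer, num_steps):
    """Alternate single gradient updates across tasks, one batch per task at a time.

    `objectives` is a list of (dataloader, loss_fn) pairs; each loss_fn is assumed
    to map one batch (features, labels) to a scalar loss, as the sentence-transformers
    loss modules do once the batch has been collated for the model.
    """
    model.train()
    streams = [(loss_fn, forever(loader)) for loader, loss_fn in objectives]
    for step in range(num_steps):
        loss_fn, batches = streams[step % len(streams)]  # pick the next task round-robin
        features, labels = next(batches)
        optimizer.zero_grad()
        loss = loss_fn(features, labels)  # one update on one task's batch
        loss.backward()
        optimizer.step()

# Hypothetical usage:
# interleaved_fit(model, train_objectives, torch.optim.AdamW(model.parameters(), lr=2e-5), num_steps=1000)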

KennethEnevoldsen commented 10 months ago

Okay, if we get some NLI data we can do it like this (unless we end up with a huge imbalance in dataset sizes).
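For concreteness, a minimal sketch of the NLI objective on its own, assuming the data comes as (premise, hypothesis, label) triples; the sentences and the base checkpoint below are just placeholders, not the actual data or model we would use.

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Placeholder base checkpoint; swap in whichever encoder we end up using.
model = SentenceTransformer("xlm-roberta-base")

label2id = {"contradiction": 0, "entailment": 1, "neutral": 2}

# Hypothetical NLI pairs; in practice these would come from the curated Danish NLI data.
train_examples = [
    InputExample(texts=["En mand spiser mad.", "En mand spiser noget."], label=label2id["entailment"]),
    InputExample(texts=["En mand spiser mad.", "Manden sover."], label=label2id["contradiction"]),
]

train_dataloader_nli = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss_nli = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=len(label2id),
)

model.fit(
    train_objectives=[(train_dataloader_nli, train_loss_nli)],
    epochs=1,
    warmup_steps=100,
)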