Closed x-tabdeveloping closed 11 months ago
You've probably seen it already, but I added two more tasks, one for STS and one for NLI. Both can be used in the config system from now on, so if we intend to add data for these tasks, we just add one more task in the config. :hugs:
I removed my negative sampling task. As per #3, MultipleNegativesRanking loss is going to be the default. Tasks can easily be added to the registry in the future if we need them.
Multiple tasks now use `sentence_transformers`' default training regime. Tasks that have the same loss are merged, so that training examples are sampled from a mixture of datasets. This is achieved by grouping tasks by their string representation, which should be the same if the loss function is the same. For example, a snippet from the implementation of MultipleNegativesRanking:
Tasks are then grouped as such:
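A minimal sketch of the grouping idea, not the snippet from this PR: the `Task` class, its fields, the dataset names, and `group_tasks` are all hypothetical stand-ins. It assumes each task's string form mentions only its loss, so tasks that share a loss end up under the same key.

```python
from collections import defaultdict


class Task:
    """Hypothetical stand-in for a training task: a dataset plus a loss."""

    def __init__(self, dataset, loss_name):
        self.dataset = dataset
        self.loss_name = loss_name

    def __str__(self):
        # Only the loss appears in the string form, so two tasks with the
        # same loss but different datasets produce identical strings.
        return f"Task(loss={self.loss_name})"


def group_tasks(tasks):
    """Group tasks by their string representation."""
    groups = defaultdict(list)
    for task in tasks:
        groups[str(task)].append(task)
    return dict(groups)


# Illustrative dataset names, not taken from the PR.
tasks = [
    Task("nli_pairs", "MultipleNegativesRanking"),
    Task("qa_pairs", "MultipleNegativesRanking"),
    Task("sts_scores", "CosineSimilarity"),
]
groups = group_tasks(tasks)
for key, members in groups.items():
    print(key, [task.dataset for task in members])
```

Each group can then be treated as one mixture of datasets trained under a single loss.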
We can't just use `@dataclass` because tasks also take the dataset and dataset-related arguments.
Datasets for tasks can be loaded with :hugs: Datasets' `load_dataset()` function when describing tasks in the configuration file. As such, both local and remote datasets can be loaded. #4
Example:
Unfortunately, due to validation errors, I couldn't just put the original function into the registry. Here is the snippet that does it:
This might have to change in the future if we intend to use other arguments as well.
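For illustration, a wrapper of the kind described above could look roughly like this. This is a sketch under assumptions, not the code from this PR: the dict-based `REGISTRY`, the `register` decorator, and the `"load_dataset.v1"` key are hypothetical, and the choice to expose only `path`, `name`, and `split` is a guess at which arguments matter. The idea is that a thin function with a small, explicitly typed signature is easier for a config system to validate than the full `load_dataset()` signature.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical minimal registry; the project's actual registry may differ.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str):
    """Store a function in the registry under the given name."""

    def decorator(func):
        REGISTRY[name] = func
        return func

    return decorator


@register("load_dataset.v1")  # hypothetical registry key
def load_dataset_wrapper(
    path: str,
    name: Optional[str] = None,
    split: Optional[str] = None,
):
    """Forward a small, explicitly typed set of arguments to load_dataset()."""
    # Imported lazily so that registering the wrapper does not require
    # the datasets package to be importable.
    from datasets import load_dataset

    return load_dataset(path, name=name, split=split)
```

Supporting further arguments would mean extending the wrapper's signature.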