New interface for tasks.

I removed my negative sampling task. As per #3, MultipleNegativesRanking loss is going to be the default. Tasks can easily be added to the registry in the future if we need them.

Multiple tasks now use sentence_transformers' default training regime. Tasks that have the same loss are merged, so that training examples are sampled from a mixture of datasets. This is achieved by grouping tasks by their string representation, which should be the same if the loss function and its parameters are the same. For example, here is a snippet from the implementation of MultipleNegativesRanking:

class MultipleNegativesRanking(Task):
    ...
    def __str__(self):
        return f"MultipleNegativesRanking(scale={self.scale})"

Tasks are then grouped like this:

from itertools import groupby
from dfm_sentence_trf.tasks import Task

tasks: list[Task]
# groupby only merges adjacent items, so sort by the same key first.
for loss_name, group in groupby(sorted(tasks, key=str), key=str):
    pass
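
To make the grouping concrete, here is a minimal, self-contained sketch. The Task base class, the CosineSimilarity task, the dataset_name attribute and the default scale of 20.0 are hypothetical stand-ins for illustration and are not taken from dfm_sentence_trf. The tasks are sorted before grouping because itertools.groupby only merges adjacent items.

```python
from itertools import groupby


class Task:
    """Hypothetical minimal base class; the real Task also holds a dataset."""

    def __init__(self, dataset_name: str):
        self.dataset_name = dataset_name


class MultipleNegativesRanking(Task):
    def __init__(self, dataset_name: str, scale: float = 20.0):
        super().__init__(dataset_name)
        self.scale = scale

    def __str__(self) -> str:
        # Only loss-related parameters go into the string, not the dataset.
        return f"MultipleNegativesRanking(scale={self.scale})"


class CosineSimilarity(Task):
    """Hypothetical second task type, used only to show a separate group."""

    def __str__(self) -> str:
        return "CosineSimilarity()"


tasks = [
    MultipleNegativesRanking("dataset_a"),
    CosineSimilarity("dataset_b"),
    MultipleNegativesRanking("dataset_c"),
]

# Sort by the grouping key first so equal keys end up next to each other.
grouped = {
    loss_name: [task.dataset_name for task in group]
    for loss_name, group in groupby(sorted(tasks, key=str), key=str)
}

for loss_name, dataset_names in grouped.items():
    print(loss_name, dataset_names)
```

In this sketch the two MultipleNegativesRanking tasks end up in one group even though they are not adjacent in the input list, so their datasets could be mixed under a single loss.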

We can't just use @dataclass because tasks also take the dataset and dataset-related arguments, which would then appear in the generated string representation.
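
A small sketch of the problem, under the assumption that the grouping key is str(task). Both classes below are hypothetical and only contrast the auto-generated dataclass representation with a hand-written __str__ like the one in MultipleNegativesRanking.

```python
from dataclasses import dataclass


@dataclass
class DataclassTask:
    # Hypothetical task written as a plain dataclass, for comparison only.
    dataset: str
    scale: float = 20.0


class ExplicitTask:
    # Hypothetical task with a hand-written __str__ that omits the dataset.
    def __init__(self, dataset: str, scale: float = 20.0):
        self.dataset = dataset
        self.scale = scale

    def __str__(self) -> str:
        return f"ExplicitTask(scale={self.scale})"


# Same loss parameters, different datasets.
dataclass_keys = {str(DataclassTask("dataset_a")), str(DataclassTask("dataset_b"))}
explicit_keys = {str(ExplicitTask("dataset_a")), str(ExplicitTask("dataset_b"))}

print(len(dataclass_keys))  # distinct keys: these tasks would not be merged
print(len(explicit_keys))   # one shared key: these tasks would be merged
```

The dataclass version includes the dataset field in its string, so two tasks with the same loss but different datasets would land in different groups.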

Datasets for tasks can be loaded with 🤗 Datasets' load_dataset() function when describing tasks in the configuration file, so both local and remote datasets can be loaded (#4).

Example:

[tasks.hestenettet]
@tasks="multiple_negatives_ranking"
sentence1="question"
sentence2="answer"

[tasks.hestenettet.dataset]
@loaders="load_dataset"
path="some/local/file.jsonl"
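
To illustrate how a config like this turns into objects, here is a simplified, standard-library-only sketch of the resolution step. It is not confection's implementation: the REGISTRIES dict, the register and resolve helpers, and the placeholder loader and task functions are all hypothetical, and the config is written directly as the nested dict it would roughly parse to.

```python
from typing import Any, Callable

# Hypothetical stand-ins for the real registries (not confection's API).
REGISTRIES: dict[str, dict[str, Callable[..., Any]]] = {"tasks": {}, "loaders": {}}


def register(kind: str, name: str):
    """Store a factory function under REGISTRIES[kind][name]."""
    def decorator(func):
        REGISTRIES[kind][name] = func
        return func
    return decorator


@register("loaders", "load_dataset")
def fake_load_dataset(path: str, name=None):
    # Placeholder: a real loader would call datasets.load_dataset here.
    return {"path": path, "name": name}


@register("tasks", "multiple_negatives_ranking")
def fake_task(dataset, sentence1: str, sentence2: str):
    # Placeholder: a real task would wrap the dataset and a loss function.
    return {"dataset": dataset, "columns": (sentence1, sentence2)}


def resolve(section):
    """Resolve nested sections first, then call the factory named by an '@' key."""
    if not isinstance(section, dict):
        return section
    resolved = {key: resolve(value) for key, value in section.items()}
    for key in list(resolved):
        if key.startswith("@"):
            factory = REGISTRIES[key[1:]][resolved.pop(key)]
            return factory(**resolved)
    return resolved


config = {
    "tasks": {
        "hestenettet": {
            "@tasks": "multiple_negatives_ranking",
            "sentence1": "question",
            "sentence2": "answer",
            "dataset": {"@loaders": "load_dataset", "path": "some/local/file.jsonl"},
        }
    }
}

resolved = resolve(config)
print(resolved["tasks"]["hestenettet"])
```

The point of the sketch is the ordering: the nested dataset section is resolved by its loader first, and the result is then passed to the task factory as the dataset argument.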

Unfortunately, due to validation errors, I couldn't just put the original function into the registry. Here is the snippet that does it:

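from typing import Optional, Union

import catalogue
from confection import registry  # assumption: `registry` is confection's registry
from datasets import Dataset, DatasetDict, load_dataset

registry.loaders = catalogue.create(
    "confection", "loaders", entry_points=False
)

@registry.loaders.register("load_dataset")
def _load_dataset(
    path: str, name: Optional[str] = None
) -> Union[Dataset, DatasetDict]:
    # Thin wrapper that exposes only `path` and `name` to the config.
    return load_dataset(path, name=name)  # type: ignore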

This might have to change in the future if we intend to use other arguments as well.

centre-for-humanities-computing / dfm-sentence-transformers

New interface for tasks. #5