Cleanup: Benchmark Interface

Description

This PR cleans up the benchmark interface. The interface is now a dataclass with the attributes train_dataset, test_dataset, and metrics. Additionally, it introduces metrics that cover both errors and operator properties.
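As a rough illustration of the shape described above, a dataclass with these three attributes could look like the following. This is a minimal sketch and not the code in this PR: the field types (`Any`) and the empty default for `metrics` are assumptions made only to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Benchmark:
    """Sketch of a benchmark container (field types are assumptions)."""

    train_dataset: Any  # data used to fit an operator
    test_dataset: Any   # held-out data used for evaluation
    metrics: List[Any] = field(default_factory=list)  # metrics to evaluate


# Hypothetical usage with placeholder datasets and metric names.
benchmark = Benchmark(
    train_dataset=[(0.0, 0.0), (1.0, 2.0)],
    test_dataset=[(2.0, 4.0)],
    metrics=["L1_error", "MS_error"],
)
```

Holding the metrics in a list is what would let a single benchmark carry several metrics at once.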

Which issue does this PR tackle?

The benchmark interface could not handle multiple metrics.
There was no consistent way of implementing and evaluating metrics.

How does it solve the problem?

Changes the Benchmark interface to a dataclass with train_dataset, test_dataset, and metrics.
Introduces Metric base class.
Introduces error metrics L1_error and MS_error.
Introduces operator metrics NumberOfParameters and SpeedOfEvaluation.
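The idea of a common Metric base class with error metrics derived from it can be sketched as follows. This is a hypothetical, dependency-free illustration and not the implementation in this PR: the class names `L1Error` and `MSError`, the `__call__(operator, dataset)` signature, the `evaluate` helper, and the representation of an operator as a plain callable on floats are all assumptions.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed stand-ins: an "operator" is any callable on floats and a
# "dataset" is a sequence of (input, target) pairs.
Operator = Callable[[float], float]
Dataset = Sequence[Tuple[float, float]]


class Metric(ABC):
    """Hypothetical base class: maps (operator, dataset) to a number."""

    name: str = "metric"

    @abstractmethod
    def __call__(self, operator: Operator, dataset: Dataset) -> float:
        ...


class L1Error(Metric):
    """Mean absolute error between predictions and targets."""

    name = "L1_error"

    def __call__(self, operator: Operator, dataset: Dataset) -> float:
        return sum(abs(operator(x) - y) for x, y in dataset) / len(dataset)


class MSError(Metric):
    """Mean squared error between predictions and targets."""

    name = "MS_error"

    def __call__(self, operator: Operator, dataset: Dataset) -> float:
        return sum((operator(x) - y) ** 2 for x, y in dataset) / len(dataset)


def evaluate(
    operator: Operator, dataset: Dataset, metrics: List[Metric]
) -> Dict[str, float]:
    """Evaluate every metric the same way and collect results by name."""
    return {metric.name: metric(operator, dataset) for metric in metrics}


# Hypothetical usage: an operator that doubles its input.
results = evaluate(
    lambda x: 2.0 * x,
    [(1.0, 2.0), (2.0, 6.0)],
    [L1Error(), MSError()],
)
```

Because every metric shares one call signature, operator-property metrics such as NumberOfParameters or SpeedOfEvaluation could plug into the same `evaluate` loop, though how they inspect the operator depends on the actual operator type.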

How are the changes tested?

WIP.

Checklist for Contributors

[ ] Scope: This PR tackles exactly one problem.
[ ] Conventions: The branch follows the feature/title-slug convention.
[ ] Conventions: The PR title follows the Bugfix: Title convention.
[ ] Coding style: The code passes all pre-commit hooks.
[ ] Documentation: All changes are well-documented.
[ ] Tests: New features are tested and all tests pass successfully.
[ ] Changelog: Updated CHANGELOG.md for new features or breaking changes.
[ ] Review: A suitable reviewer has been assigned.

Checklist for Reviewers

[ ] The PR solves the issue it claims to solve and only this one.
[ ] Changes are tested sufficiently and all tests pass.
[ ] Documentation is complete and well-written.
[ ] Changelog has been updated, if necessary.

aai-institute / continuiti

WIP: Benchmark Interface #93