Description

This PR is motivated by the need to run individual scoring jobs for each model/benchmark pair on an HPC (e.g., Openmind) instead of running a single job that computes scores for all new models and benchmarks in a submission. We handle this by restructuring the scoring endpoint to separate retrieving the model/benchmark names from the actual scoring (for example, resolving `ALL_PUBLIC` into concrete benchmarks without necessarily scoring them). As a result, the domain-specific plugin manager now has the flexibility to decide the best method for identifying and scoring the model/benchmark pairs.
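As a rough illustration of the split, the pair-enumeration step can be exposed on its own so that a scheduler submits one job per pair. This is a minimal sketch, not the actual Brain-Score API; every name here (`DomainPluginManager`, `resolve_benchmarks`, `get_pairs`, the registry contents) is hypothetical.

```python
# Hypothetical sketch: enumerating model/benchmark pairs is decoupled from
# scoring, so an HPC scheduler can submit one job per pair. All names are
# illustrative, not the actual Brain-Score API.
from itertools import product
from typing import Iterable, List, Tuple


class DomainPluginManager:
    """Illustrative domain-specific plugin manager."""

    def resolve_benchmarks(self, spec: Iterable[str]) -> List[str]:
        # A spec like "ALL_PUBLIC" expands to every public benchmark;
        # here the lookup is faked with a fixed registry.
        registry = {"ALL_PUBLIC": ["benchmark-a", "benchmark-b"]}
        resolved: List[str] = []
        for name in spec:
            resolved.extend(registry.get(name, [name]))
        return resolved

    def get_pairs(self, models: Iterable[str],
                  benchmarks: Iterable[str]) -> List[Tuple[str, str]]:
        # Step 1: enumerate pairs only -- no scoring happens here.
        return list(product(models, self.resolve_benchmarks(benchmarks)))

    def score(self, model: str, benchmark: str) -> float:
        # Step 2: score a single pair; on an HPC this is one job.
        return 0.0  # placeholder


if __name__ == "__main__":
    manager = DomainPluginManager()
    for model, benchmark in manager.get_pairs(["model-x"], ["ALL_PUBLIC"]):
        # e.g., submit manager.score(model, benchmark) as one Slurm job
        print(model, benchmark, manager.score(model, benchmark))
```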
Testing Strategy
This PR adds unit tests that separately exercise model/benchmark retrieval and scoring, using dummy models and benchmarks.
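A minimal sketch of that test layout, runnable under pytest and reusing the hypothetical `DomainPluginManager` from the sketch above (the module name `scoring_sketch` is likewise assumed); the actual tests in this PR may be structured differently.

```python
# Hypothetical unit tests mirroring the strategy above: retrieval and
# scoring are exercised independently with dummy models/benchmarks.
from scoring_sketch import DomainPluginManager  # illustrative module name


def test_get_pairs_expands_all_public():
    manager = DomainPluginManager()
    pairs = manager.get_pairs(["dummy-model"], ["ALL_PUBLIC"])
    # Retrieval alone: pairs are enumerated without any scoring.
    assert ("dummy-model", "benchmark-a") in pairs
    assert ("dummy-model", "benchmark-b") in pairs


def test_score_single_pair():
    manager = DomainPluginManager()
    # Scoring alone: one dummy model against one dummy benchmark.
    score = manager.score("dummy-model", "benchmark-a")
    assert isinstance(score, float)
```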