We want to be able to track how our models and solutions are performing, and what their strengths and weaknesses are.
Research ways to persist benchmarking results between different runs; along with this, research ways to display and visualize the data.
The idea is that we will be benchmarking models/approaches for every single GitHub test suite run.
Acceptance Criteria
propose solutions to persist benchmarking results, with links to the GitHub run
propose solutions to chart and visualize the results over time
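As a starting point for discussion, one lightweight way to persist results with a link back to the run is to append a JSON record per benchmark to a results file (e.g. committed to a branch or uploaded as a workflow artifact). The sketch below is a minimal example, not a final design: the file name `benchmarks.jsonl`, the `record_benchmark` helper, and the `model`/`score` fields are hypothetical placeholders. It does rely on two real GitHub Actions environment variables, `GITHUB_RUN_ID` and `GITHUB_REPOSITORY`, to construct the run URL.

```python
import json
import os
from pathlib import Path

def record_benchmark(results_file, model, score):
    """Append one benchmark record, linked to the current GitHub run.

    GITHUB_RUN_ID and GITHUB_REPOSITORY are set automatically inside
    GitHub Actions; outside CI they fall back to placeholder values.
    """
    run_id = os.environ.get("GITHUB_RUN_ID", "local")
    repo = os.environ.get("GITHUB_REPOSITORY", "owner/repo")
    record = {
        "model": model,          # hypothetical field: model/approach name
        "score": score,          # hypothetical field: benchmark metric
        "run_id": run_id,
        "run_url": f"https://github.com/{repo}/actions/runs/{run_id}",
    }
    # JSON Lines: one record per line, so runs append without conflicts
    with Path(results_file).open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record

rec = record_benchmark("benchmarks.jsonl", "baseline-model", 0.87)
```

A file like this can then be read back across runs and plotted over time with any charting library, which would satisfy both criteria with a single artifact.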
Additional context
feel free to reach out to @zdeveloper