Get rid of the idea of test suites. The concept was introduced artificially to accommodate both mainstream and a11y testing.
Use a three-level hierarchy instead: Category -> Feature -> Tests.
Simplify to one category per EPUB or set of EPUBs. A single book shouldn't span categories.
Categories carry a tag: Advanced or Essential. These are labels on a category, not categories themselves.
Use major/minor versioning on categories. Decide what constitutes a major vs minor update.
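A minimal sketch of how the structure above could be modeled, assuming Python dataclasses. All class, field, and function names here (Tag, Version, EpubTest, Feature, Category, check_one_category_per_epub) are hypothetical and not taken from any existing code. The rule for bumping major vs minor is left open, as in the notes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Tag(Enum):
    """A label on a category; not a category itself."""
    ESSENTIAL = "Essential"
    ADVANCED = "Advanced"


@dataclass(frozen=True, order=True)
class Version:
    """Major/minor version of a category; compares as (major, minor)."""
    major: int
    minor: int = 0

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}"


@dataclass
class EpubTest:
    test_id: str
    description: str = ""


@dataclass
class Feature:
    name: str
    tests: list[EpubTest] = field(default_factory=list)


@dataclass
class Category:
    name: str
    tag: Tag
    version: Version
    epubs: list[str] = field(default_factory=list)  # one EPUB or a set of EPUBs
    features: list[Feature] = field(default_factory=list)


def check_one_category_per_epub(categories: list[Category]) -> None:
    """Raise ValueError if any EPUB is listed under more than one category."""
    owner: dict[str, str] = {}
    for category in categories:
        for epub in category.epubs:
            if epub in owner and owner[epub] != category.name:
                raise ValueError(
                    f"{epub} spans {owner[epub]} and {category.name}"
                )
            owner[epub] = category.name
```

There is no test suite level here: a category owns its EPUBs and features directly, and the tag is just a field on the category.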
Visually group the scores from the last two versions, so that if a reading system (RS) hasn't been tested against the latest version yet, its older score for that category is still visible. This also makes progress between versions visible.
The front page could have two parallel columns per category, with two scores: one for the latest and one for the previous version.
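A rough sketch of how the two columns could be filled for one reading system in one category. It assumes versions are (major, minor) tuples and that scores are stored per category version. The function name front_page_columns and the score values in the usage lines are made up for illustration.

```python
from __future__ import annotations

from typing import Dict, List, Optional, Tuple

VersionKey = Tuple[int, int]  # (major, minor), an assumed representation


def front_page_columns(
    category_versions: List[VersionKey],
    scores: Dict[VersionKey, float],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (latest_score, previous_score) for one RS in one category.

    A column is None when the reading system has no score for that version,
    for example when it has not been tested against the latest version yet.
    """
    ordered = sorted(category_versions, reverse=True)
    if not ordered:
        return (None, None)
    latest = ordered[0]
    previous = ordered[1] if len(ordered) > 1 else None
    latest_score = scores.get(latest)
    previous_score = scores.get(previous) if previous is not None else None
    return (latest_score, previous_score)


# Hypothetical data: the category has three versions, but this reading
# system has only been tested against 1.1 so far.
versions = [(1, 0), (1, 1), (2, 0)]
not_yet_retested = front_page_columns(versions, {(1, 1): 0.8})
retested = front_page_columns(versions, {(1, 1): 0.8, (2, 0): 0.9})
```

In the first call the latest column is empty and the previous column still shows the old score, which is the case the grouping is meant to cover.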