DonggeLiu / Legion

A coverage-based software testing tool
MIT License

Result evaluation and comparison #16

Open DonggeLiu opened 4 years ago

DonggeLiu commented 4 years ago
  1. Can we access the results of the other tools in the competition? Ideally they would be in CSV files (like the ones we generated in the pre-competition experiments), so that we can cherry-pick benchmarks according to Legion's compatibility and compare only those scores.

  2. I failed to reproduce each tool's final competition score from its per-category scores using the formula in the Google Sheets from our pre-competition experiments:

    • This is important, as we want to compute the scores of our new experiments in the same way.
    • How are the final scores computed from the per-category scores?
    • Did they remove the results of some benchmarks? For example, SQLite-MemSafety has only one task, on which every tool scored 0; some benchmarks in other sets have the same problem. How were these handled?
    • By normalisation, do they mean simply taking averages (i.e. as we did in our pre-competition experiments)? See the sketch after this list.
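To make the last two points concrete, here is a minimal sketch of the comparison I have in mind. It assumes each tool's results are exported as a CSV with `category`, `benchmark` and `score` columns (file names, column names and benchmark names below are placeholders, not the competition's actual export format), and it reads "normalisation" as plain per-category averaging, as in our pre-competition sheets:

```python
import csv
from collections import defaultdict

def load_scores(path, allowed_benchmarks=None):
    """Read (category, benchmark, score) rows from a per-tool CSV,
    optionally keeping only benchmarks that Legion is compatible with."""
    per_category = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if allowed_benchmarks and row["benchmark"] not in allowed_benchmarks:
                continue
            per_category[row["category"]].append(float(row["score"]))
    return per_category

def final_score(per_category):
    """'Normalisation' read as plain averaging: the mean score per
    category, then the mean of those category means."""
    means = [sum(s) / len(s) for s in per_category.values() if s]
    return sum(means) / len(means)

if __name__ == "__main__":
    # Hypothetical subset of benchmarks that Legion supports.
    compatible = {"benchmark_a", "benchmark_b", "benchmark_c"}
    for tool, csv_path in [("Legion", "legion.csv"), ("OtherTool", "other_tool.csv")]:
        print(tool, final_score(load_scores(csv_path, compatible)))
```

If the organisers instead weight each category by its number of tasks, or drop degenerate categories (e.g. ones where every tool scored 0), the numbers will differ, which is exactly what I would like to confirm.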
DonggeLiu commented 4 years ago

Issue 2 is explained by rounding errors.