HPCE / hpce-2017-cw5


[Feature] Count Us In Results to only include correct ones #31

Closed by aDu 6 years ago

aDu commented 6 years ago

The code we are given includes a script (bin/run_puzzle) that checks whether the output of our solution matches that of the reference solution, printing a line such as "[run_puzzle], 1511188501.78, 2, Output is correct". This gives us some confidence that our "faster" implementation is correct.
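As a rough illustration, a result could be classified by looking at the last field of that log line. This is only a sketch: the function name `output_is_correct` is hypothetical, and treating the line as four comma-separated fields (tag, timestamp, a number, message) is an assumption based solely on the single example quoted above, not on the actual bin/run_puzzle source.

```python
def output_is_correct(log_line: str) -> bool:
    """Return True if a run_puzzle log line reports a correct output.

    Assumes the layout seen in the quoted example:
    "[run_puzzle], <timestamp>, <number>, <message>".
    """
    # Split into at most four fields so commas inside the message survive.
    fields = [field.strip() for field in log_line.split(",", 3)]
    if len(fields) != 4 or fields[0] != "[run_puzzle]":
        return False
    return fields[3] == "Output is correct"


# The line quoted in this issue.
print(output_is_correct("[run_puzzle], 1511188501.78, 2, Output is correct"))
```

Any other message, or a line that does not start with the `[run_puzzle]` tag, is treated as not correct.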

However, the README says of the "Count Us In" tests: "These tests do not check for correctness".

I was wondering if we can make it so that the incorrect ones are not counted towards the results in the graphs (i.e. incorrect implementations will not contribute to the min/max/median). This is so that we can have an idea of what the median time is out of the correct solutions. If that's already the case, please ignore this issue.
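A minimal sketch of the filtering being requested, assuming each result is available as a (time in seconds, is_correct) pair. The function name `summarise`, the data layout, and the timings are all made up for illustration; this is not how the actual Count Us In scripts work.

```python
from statistics import median


def summarise(results, correct_only=True):
    """Return (min, median, max) of the run times, or None if nothing is left.

    results: list of (time_seconds, is_correct) pairs (hypothetical layout).
    """
    # Keep every time when correct_only is False; otherwise keep correct runs only.
    times = [t for t, ok in results if ok or not correct_only]
    if not times:
        return None
    return min(times), median(times), max(times)


# Made-up timings: the 0.5 s run is incorrect.
results = [(1.0, True), (0.5, False), (2.0, True), (3.0, True)]

print(summarise(results))                      # correct runs only: (1.0, 2.0, 3.0)
print(summarise(results, correct_only=False))  # all runs: (0.5, 1.5, 3.0)
```

In this toy data the incorrect run pulls both the minimum and the median down, which is the effect the request is trying to avoid.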

Thank you.

m8pple commented 6 years ago

As it stands, the median includes everyone, whether or not their results are correct.

The real purpose of the intermediate runs is to support assessment: they drastically increase the likelihood that everyone's submission will compile and run first time. Before I did this, about 50% of submissions had to be manually fixed in some way; now they all run first time, except for those from people who make ill-judged last-minute commits.

So the performance results are partly just an extra incentive for students to keep master current, in order to keep all submissions buildable in AWS. How meaningful they actually are is debatable. If I were doing this I probably wouldn't enable count-us-in, as it is just a distraction, but past cohorts have preferred having it available.

My position on correctness is that managing it is the student's job. If incorrect results were censored, people would be able to tell whether their own results were correct simply by seeing whether they had been excluded. I suppose I could add logic that somehow made it appear as if their own results were still included, but that gets too complicated.

In general I've found that the versions of the puzzles that build and run are usually correct - incorrect ones usually crash. Typically the fastest version is correct, unless it is too fast to be believable.