cs50 / compare50

This is compare50, a fast and extensible plagiarism-detection tool.
GNU General Public License v3.0

show ranking by each pass; combined global ranking #35

Open benedictbrown opened 4 years ago

benedictbrown commented 4 years ago

Being able to rank the output by any single pass, and to see a combined ranking across all passes, would both be somewhat helpful. The first should be pretty easy; the second is a bit trickier.

cmlsharp commented 4 years ago

The first one is actually a little more annoying than you might realize at first, because compare50 outputs static HTML pages. The top 50 changes based on which pass you rank by, meaning that if you ask compare50 to output the top n matches, it would now need to potentially output as many as n * numPasses HTML pages. There are a few kinda weird usability questions that come along with this. Should all n * numPasses pairs show up on the index page regardless of which pass you're ranking by? If so, it could be a little weird that you asked for the top 50, but 150 are showing up (and not even necessarily the top 150). Secondly, right now if you want to send someone else the standalone HTML file for the number 1 match, you just send them match_1.html. With multiple passes you wouldn't necessarily know which HTML file to send (without looking at the URL, of course). You could envision parameterizing the file name with the pass name, but what if one pair shows up in the top n for two different passes? Do we make two identical HTML files? None of these issues are insurmountable, obviously, but they're the reason we hadn't implemented a feature like this yet.
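
To make the page-count concern concrete, here is a minimal sketch in Python. The `scores` dictionary, the pass names, the `top_n` helper, and the file-naming scheme are all hypothetical and not compare50's actual data structures or output; the sketch only shows that the union of the per-pass top-n lists can hold anywhere from n to n * numPasses pairs, and that naming files by pair rather than by rank would avoid duplicate pages.

```python
# Hypothetical scores: pass name -> {(submission_a, submission_b): score}.
# Names and numbers are illustrative only, not compare50's real API or output.
scores = {
    "structure": {("a", "b"): 9, ("a", "c"): 7, ("b", "c"): 1, ("c", "d"): 3},
    "text":      {("a", "b"): 2, ("a", "c"): 1, ("b", "c"): 8, ("c", "d"): 6},
    "exact":     {("a", "b"): 5, ("a", "c"): 0, ("b", "c"): 4, ("c", "d"): 9},
}


def top_n(pass_scores, n):
    """Return the n highest-scoring pairs for one pass, best first."""
    return sorted(pass_scores, key=pass_scores.get, reverse=True)[:n]


n = 2
per_pass = {name: top_n(pass_scores, n) for name, pass_scores in scores.items()}

# Every pair that needs an HTML page if each pass's top n must be viewable.
pairs_to_render = set().union(*per_pass.values())

# Assumed naming scheme: one file per pair, so a pair that appears in the
# top n of several passes is rendered only once.
filenames = {pair: "match_{}_{}.html".format(*pair) for pair in sorted(pairs_to_render)}

print(len(pairs_to_render))  # 4 pages here; the upper bound is n * len(scores) = 6
```

With these made-up numbers, asking for the top 2 already yields 4 distinct pairs across 3 passes.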

The second is pretty difficult, as you mentioned. We could assign arbitrary weights to the passes, but it is unclear how we should choose them.
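
As a rough illustration of why the weights matter, here is a sketch of one possible combined ranking: a weighted sum of per-pass scores, each scaled by that pass's maximum. The `normalize` and `combined_ranking` functions, the pass names, and the weights are assumptions made up for this example; compare50 does not define them, and the normalization assumes non-negative scores.

```python
# Hypothetical scores: pass name -> {(submission_a, submission_b): score}.
scores = {
    "structure": {("a", "b"): 10, ("a", "c"): 5},
    "text":      {("a", "b"): 2,  ("a", "c"): 8},
}


def normalize(pass_scores):
    """Scale one pass's non-negative scores to [0, 1] by dividing by the max."""
    highest = max(pass_scores.values())
    return {pair: (s / highest if highest else 0.0) for pair, s in pass_scores.items()}


def combined_ranking(scores, weights):
    """Rank pairs by the weighted sum of their normalized per-pass scores."""
    total = {}
    for name, pass_scores in scores.items():
        for pair, s in normalize(pass_scores).items():
            total[pair] = total.get(pair, 0.0) + weights[name] * s
    return sorted(total, key=total.get, reverse=True)


# Equal weights put ("a", "c") first (1.5 vs 1.25) ...
print(combined_ranking(scores, {"structure": 1, "text": 1}))
# ... while weighting "structure" more heavily puts ("a", "b") first (3.25 vs 2.5).
print(combined_ranking(scores, {"structure": 3, "text": 1}))
```

The same two pairs swap places depending on the weights, which is exactly the difficulty: the ranking is only as meaningful as the weights chosen.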

benedictbrown commented 4 years ago

One compromise might be to select the top 50 based on whichever metric is used for ranking, but then generate index files that order those same 50 by each of the different metrics. Or include some JavaScript in the client-side file that can re-order the listings.
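
A minimal sketch of that compromise, in Python for the index-generation side. The `index_orderings` function, the `scores` layout, and the pass names are hypothetical and only illustrate the idea: pick the top n once by the ranking pass, then produce one ordering of those same pairs per pass.

```python
# Hypothetical scores: pass name -> {(submission_a, submission_b): score}.
scores = {
    "structure": {("a", "b"): 9, ("a", "c"): 7, ("b", "c"): 1, ("c", "d"): 3},
    "text":      {("a", "b"): 1, ("a", "c"): 2, ("b", "c"): 8, ("c", "d"): 6},
}


def index_orderings(scores, rank_by, n):
    """Select the top n pairs by one pass, then order that fixed set by every pass."""
    ranking_scores = scores[rank_by]
    selected = sorted(ranking_scores, key=ranking_scores.get, reverse=True)[:n]
    return {
        name: sorted(selected, key=lambda pair: pass_scores.get(pair, 0), reverse=True)
        for name, pass_scores in scores.items()
    }


orderings = index_orderings(scores, rank_by="structure", n=2)

# Every ordering contains the same n pairs; only their order differs,
# so no extra match pages are needed.
print(orderings["structure"])  # [('a', 'b'), ('a', 'c')]
print(orderings["text"])       # [('a', 'c'), ('a', 'b')]
```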

cmlsharp commented 4 years ago

This could work! The only thing I'd worry about is that if you ranked them by a particular pass and then ordered them by a different one, someone might assume that the 50 that show up are the top 50, but obviously that isn't necessarily the case. This could maybe be made clear enough in the UI somehow though.