Add 'Both Bad' button to judging

obo commented 9 years ago

Next to Skip, we should also have a 'Both Bad' button. This absolute scoring is very important (and different from plain 'pick the better one') because MT will almost always be bad, but humans can make killer errors. If the judge understands that he has to pick the better one, he will pick the better (but still bad) human translation, and we would never learn that it is suspicious.

cifkao commented 9 years ago

The question is how we should alter the score when someone clicks the button.

obo commented 9 years ago

We should reduce the score for both and log separately, e.g. in one more column in the database, that each of the candidates got one more 'bad mark'. In the end, we should know how many such disqualifying marks each sentence got.

cifkao commented 9 years ago

We have a table for individual scorings with a result column that is either 'a' or 'b'. We could just make it 'x' when both translations are unacceptable.

Or do you mean to create a bad mark counter in the translations table? Do we also want to increment it when the other translation gets selected?

edasubert commented 9 years ago

This is a bit unfortunate since, as already mentioned, Elo scoring is strictly one win, one loss. A simple solution would be that, if 'both bad' is selected, we let both translations lose to a virtual one with the base score of 1400. That way we can keep the current tables and everything; we just add a 'special' entry for this virtual translation and make sure its score stays the same.

obo commented 9 years ago

Yes, adding 'x' for 'Both Bad' would be great. (And actually, I would consider one more button, 'Both Acceptable', e.g. 'o' for OK in the database.)

As for the counters, I'd actually want to know exactly how many wins (a), losses (b), bads (x) and possibly OKs (o) each candidate got.

If we don't add Both OK, I hope the scorers will tell us both relative knowledge (this one is better) as well as absolute knowledge (this candidate is bad, full stop).
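A minimal sketch of how such per-candidate counters could be derived from the scorings table. It assumes each row is a tuple `(id_a, id_b, result)` and that 'a' means the first translation won; the function name `tally` and the row layout are hypothetical, not taken from the actual schema.

```python
from collections import Counter, defaultdict


def tally(scorings):
    """Count per-candidate outcomes from (id_a, id_b, result) rows."""
    counts = defaultdict(Counter)
    for id_a, id_b, result in scorings:
        if result == "a":        # assumed: first translation was picked
            counts[id_a]["wins"] += 1
            counts[id_b]["losses"] += 1
        elif result == "b":      # assumed: second translation was picked
            counts[id_b]["wins"] += 1
            counts[id_a]["losses"] += 1
        elif result in ("x", "o"):
            # 'x' = both bad, 'o' = both OK: the mark goes to both candidates
            key = "bads" if result == "x" else "oks"
            counts[id_a][key] += 1
            counts[id_b][key] += 1
        # any other value (e.g. a skipped or unanswered row) is ignored
    return counts


rows = [(1, 2, "a"), (1, 3, "x"), (2, 3, "b"), (1, 2, "a")]
counts = tally(rows)
```

Because the counters are computed from the raw rows, nothing extra has to be stored in the translations table, and the rows can be reinterpreted later.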

obo commented 9 years ago

I don't know anything about Elo. The main reason for adding Both Bad is that when scoring, I really did not want to pick either of two bad outputs.

The second reason is that (as written in another comment here) I believe this would also give us some absolute information.

I suggest adding Both Bad and, if there is no better way, completely ignoring it in Elo scoring and just using it elsewhere. E.g. I'd make use of it in the manual shutter page.

edasubert commented 9 years ago

I am not arguing against the Both Bad button. The question is how to solve it internally. We cannot abandon Elo since the entire scoring system is built around it. We need a way to adjust the scores of both bad translations that does not make them impossible to score with Elo in the future. One possible way is to score them against the virtual translation. Then, whenever you want to find all 'both bad' translations, you search for records of comparisons against this specific one.

I will, however, argue against the Both OK button. We are not looking for the most OK translation. We are looking for the best translation. If you do not feel you can judge which one is better, use the Skip button. If we raised the score for OK translations, it could harm translations that are better but were not selected for judging at the moment.

obo commented 9 years ago

Agreed. Let's add just the Both Bad. At last I understand what you meant by the virtual translation, and your handling seems reasonable: for Both Bad, each of the segments independently should get a 'worse than' mark in a virtual comparison against something terribly bad.

On the other hand, I find it essential that we still somewhere record the actual comparisons and answers we got, in the most detailed and original form, so that we can reinterpret them if we find something better.

cifkao commented 9 years ago

I'm not very much in favor of having a 'virtual translation' entry in the database because I would have to change the way scoring works. :-) An entry in the scorings table, identified by a hash, is created every time a user accesses the Scorings page, and updated when the user makes a choice. This is to prevent users from resubmitting their choice. If we used the virtual translation, we would actually have to remove the entry from the database and create two new entries (with different hashes etc.)... Not a very clean solution.
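A rough sketch of the create-on-view, update-on-choice flow described above, using an in-memory dict in place of the real scorings table. The class `ScoringStore` and its method names are hypothetical; only the idea of a hash-identified entry that can be answered once comes from this thread.

```python
import secrets


class ScoringStore:
    """In-memory stand-in for the scorings table (illustration only)."""

    def __init__(self):
        # token -> {"a": id, "b": id, "result": None or 'a'/'b'/'x'}
        self._rows = {}

    def open(self, id_a, id_b):
        """Create an unanswered entry when the page is shown; return its token."""
        token = secrets.token_hex(16)
        self._rows[token] = {"a": id_a, "b": id_b, "result": None}
        return token

    def submit(self, token, result):
        """Record the choice once; reject unknown tokens and resubmissions."""
        row = self._rows.get(token)
        if row is None or row["result"] is not None:
            return False
        if result not in ("a", "b", "x"):
            return False
        row["result"] = result
        return True


store = ScoringStore()
token = store.open(1, 2)
first = store.submit(token, "x")    # accepted
second = store.submit(token, "a")   # rejected: already answered
```

With 'x' as a third result value, a 'both bad' answer still fits in the single entry created for the page view, so no extra rows or tokens are needed.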

So I think the best solution is not to actually make the translations lose against a virtual translation, just change their score as if they did.

We could also make the two translations lose against each other. That is, just calculate the 'losing' part of the Elo rating and subtract that from either translation's score. (I'm not sure how this would behave.)
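To get a feeling for how the 'lose against each other' variant would behave, here is a small sketch using the standard Elo expected-score formula. The K-factor of 32 and the function names are assumptions for illustration; the real scoring code may use different values.

```python
def expected(r_self, r_opp):
    """Standard Elo expected score of r_self against r_opp."""
    return 1.0 / (1.0 + 10 ** ((r_opp - r_self) / 400.0))


def loss_delta(r_self, r_opp, k=32):
    """Rating change for r_self after a loss (actual score 0)."""
    return k * (0.0 - expected(r_self, r_opp))


def both_lose_to_each_other(r_a, r_b, k=32):
    """Apply only the 'losing' half of the Elo update to both ratings."""
    return r_a + loss_delta(r_a, r_b, k), r_b + loss_delta(r_b, r_a, k)


new_a, new_b = both_lose_to_each_other(1500.0, 1300.0)
```

Since the two expected scores always sum to 1, the pair loses exactly K points in total under this variant, and the higher-rated translation takes the larger share of the penalty.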

edasubert commented 9 years ago

OK, let's do it your way, but I am in favor of pretending they lost to the default value.

cifkao commented 9 years ago

Added a 'Both wrong' button (sounds better I think). When clicked, we make both candidates lose to a notional translation with score 1400 (this is stored in the settings as Scoring.both_bad_winner_score).
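A minimal sketch of what that update amounts to, assuming the standard Elo formula and a K-factor of 32 (the K value and the function names are assumptions; only the 1400 value and the setting name come from this thread). The notional translation's own score is never changed.

```python
BOTH_BAD_WINNER_SCORE = 1400.0  # Scoring.both_bad_winner_score in the settings
K = 32  # assumed K-factor, not taken from the actual code


def expected(r_self, r_opp):
    """Standard Elo expected score of r_self against r_opp."""
    return 1.0 / (1.0 + 10 ** ((r_opp - r_self) / 400.0))


def apply_both_bad(r_a, r_b, winner=BOTH_BAD_WINNER_SCORE, k=K):
    """Both candidates lose to the notional translation; it stays fixed."""
    new_a = r_a - k * expected(r_a, winner)
    new_b = r_b - k * expected(r_b, winner)
    return new_a, new_b


print(apply_both_bad(1400.0, 1400.0))  # (1384.0, 1384.0)
```

Unlike the 'lose against each other' idea, each candidate's penalty here depends only on its own score relative to 1400, not on which other translation it happened to be paired with.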

edasubert commented 9 years ago

I do not think 'both wrong' is better. I would suggest something along the lines of 'neither acceptable', since I think this button should be used as little as possible. It is just for that (hopefully) rare occasion when neither translation has any meaning. Otherwise I should pick the slightly better one out of two bad ones; at least that is how I see it.

cifkao commented 9 years ago

How about 'both junk'?

edasubert commented 9 years ago

It does have the proper meaning, but I would wish for something a bit more classy.

obo commented 9 years ago

Both Junk sounds best.

obo commented 9 years ago

Both Inacceptable (or Unacceptable??) ...but these are Twitter users, they surely understand Junk better than Inacceptable.

edasubert commented 9 years ago

I would rather use 'ridiculous' or something. Junk has some other meanings...

cifkao / tct

Add 'Both Bad' button to judging #13