cfedermann / Appraise

Appraise evaluation system for manual evaluation of machine translation output
http://www.appraise.cf/
BSD 3-Clause "New" or "Revised" License

Filtering equal hypotheses #31

Closed by npecheux 9 years ago

npecheux commented 11 years ago

Would it be possible to filter out, or to group, equal hypotheses? There is no point in ranking identical outputs, and in fact I lose time when ranking 5 hypotheses trying to find the differences between two of them (often some of them are equal; sometimes they differ by only one character). It would be easier if we knew all outputs were different.
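For illustration, here is a minimal sketch of how string-identical outputs could be grouped before they are presented for ranking. The `(system_id, text)` pair structure and the helper name are assumptions made for this example, not Appraise's actual data model:

```python
from collections import defaultdict

def group_identical_hypotheses(hypotheses):
    """Group system outputs that are string-identical, so that each
    distinct translation would only have to be ranked once.

    `hypotheses` is assumed to be a list of (system_id, text) pairs;
    this structure is illustrative, not Appraise's internal model.
    """
    groups = defaultdict(list)
    for system_id, text in hypotheses:
        groups[text].append(system_id)
    # One entry per distinct translation, listing the systems that produced it.
    return [(text, system_ids) for text, system_ids in groups.items()]

# Systems B and C produce the same output, so only two distinct
# candidates would need to be ranked.
candidates = [
    ("A", "The cat sat on the mat ."),
    ("B", "The cat sits on the mat ."),
    ("C", "The cat sits on the mat ."),
]
for text, systems in group_identical_hypotheses(candidates):
    print(systems, "->", text)
```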

cfedermann commented 11 years ago

Hi there,

thanks for the feedback; Appraise originally did not allow import of identical translation outputs -- these are a by-product of the WMT13 data import.

I'm aware that this is a huge annoyance and will fix it soon.

Cheers, Chris


npecheux commented 11 years ago

Hi Chris. It's not really annoying, but it might be an improvement, e.g. for next year. It is however not so easy to deal with, except when all 5 hypotheses are equal, or by just indicating with some kind of symbol on the left when two hypotheses are entirely the same. To save some annotation time, it could be interesting to automatically give all equal hypotheses the same rank, taking care not to mess up the evaluation statistics. Even more useful would be to highlight the parts of the sentences that are equal (e.g. often the first part, up to the comma, is the same in 4 out of 5 sentences), but this seems a bit challenging, with possible drawbacks and maybe a risk of biasing the evaluation.
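As an illustration of the highlighting idea, here is a sketch that splits two hypotheses into equal and differing token spans using Python's standard `difflib`; this is one possible approach, not an existing Appraise feature:

```python
import difflib

def equal_and_different_spans(hyp_a, hyp_b):
    """Split two hypotheses into token spans that are equal and spans that
    differ, so the equal parts could be greyed out or highlighted."""
    tokens_a, tokens_b = hyp_a.split(), hyp_b.split()
    matcher = difflib.SequenceMatcher(a=tokens_a, b=tokens_b)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            spans.append(("same", " ".join(tokens_a[i1:i2])))
        else:
            spans.append(("differs",
                          " ".join(tokens_a[i1:i2]),
                          " ".join(tokens_b[j1:j2])))
    return spans

# The first clause up to the comma is identical in both hypotheses.
print(equal_and_different_spans(
    "He came home , but he was very tired .",
    "He came home , although he felt tired .",
))
```

The equal spans could then be rendered in a muted colour so annotators only have to read the differing parts.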

Thanks for taking this issue into consideration so quickly!

Nico

PS: A really annoying thing that Appraise cannot deal with is that some translations are identical except for some detail that is impossible to see (maybe a wider space or something like that?). I'm not sure about it. To check whether hypotheses are the same, I look at whether they are aligned (e.g. the line break happens at the same word). Sometimes they are not aligned, but when I check character by character I don't see any difference. Could this be an Appraise presentation bug?

cfedermann commented 11 years ago

The latter issue you describe should be related to "non-printable" Unicode characters, e.g. wider space variants.
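For anyone who wants to check whether two seemingly identical hypotheses differ only in such characters, here is a small diagnostic sketch in plain Python (not part of Appraise):

```python
import unicodedata

def reveal_invisible_differences(hyp_a, hyp_b):
    """Print the code point and Unicode name at every position where two
    visually identical strings actually differ."""
    for position, (char_a, char_b) in enumerate(zip(hyp_a, hyp_b)):
        if char_a != char_b:
            print(position,
                  "U+%04X %s" % (ord(char_a), unicodedata.name(char_a, "?")),
                  "vs",
                  "U+%04X %s" % (ord(char_b), unicodedata.name(char_b, "?")))
    if len(hyp_a) != len(hyp_b):
        print("lengths differ:", len(hyp_a), "vs", len(hyp_b))

# An ordinary SPACE vs. a NO-BREAK SPACE (U+00A0) -- invisible on screen.
reveal_invisible_differences("same text here", "same\u00a0text here")
```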

cfedermann commented 9 years ago

This is now tracked in #45. Closing this issue.