barzerman / barzer

barzer engine code
MIT License

BENI: coverage too low #524

Closed barzerman closed 11 years ago

barzerman commented 11 years ago

The only difference is one extra space in the query, yet it produces an extremely low coverage of 0.57.

http://eu.barzer.net/translate?&key=BHjFDiC0QdoyDF7DBVn1rLWu0LaKRi8QeKiVSSSW&query=land%20life

0xd34df00d commented 11 years ago

...which makes perfect sense: LAND LIFE has 7 trigrams, LANDLIFE has 6 trigrams, of which only 4 correspond to the original LAND LIFE trigrams (LAN, AND, LIF, IFE). 4/7 is 0.57.
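The arithmetic above can be reproduced with a minimal sketch. This is not BENI's actual code: the function names `char_ngrams` and `coverage` are hypothetical, the n-gram size of 3 is inferred from the grams listed in the comment, and the sketch assumes plain sliding-window trigrams with no padding or weighting.

```python
def char_ngrams(text, n=3):
    """Return the character n-grams of text, left to right."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def coverage(query, entry, n=3):
    """Fraction of the query's n-grams that also occur in the entry.

    Assumption: the denominator is the n-gram count of the query,
    which matches the 4/7 figure quoted in the thread.
    """
    query_grams = char_ngrams(query, n)
    if not query_grams:
        return 0.0
    entry_grams = set(char_ngrams(entry, n))
    matched = [g for g in query_grams if g in entry_grams]
    return len(matched) / len(query_grams)


print(char_ngrams("LAND LIFE"))  # 7 trigrams, 3 of which contain the space
print(char_ngrams("LANDLIFE"))   # 6 trigrams
print(round(coverage("LAND LIFE", "LANDLIFE"), 2))  # 0.57
```

The three trigrams that contain the space ("ND ", "D L", " LI") have no counterpart in LANDLIFE, which is why a single space costs 3 of the 7 grams.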

barzerman commented 11 years ago

It's too low in this particular case.

On Mon, Apr 1, 2013 at 3:53 PM, Georg Rudoy notifications@github.com wrote:

...which perfectly makes sense: LAND LIFE has 7 ngrams, LANDLIFE has 6 ngrams, of which only 4 correspond to the original LAND LIFE grams (LAN, AND, LIF, IFE). 4/7 is 0.57.

barzerman commented 11 years ago

Right now I only perform one sort of normalization (soft normalization). What we can actually do is this: if BENI returns no high-coverage matches, remove all spaces from the query and reapply BENI ...
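The proposed fallback could look roughly like the sketch below. It is an illustration under stated assumptions, not the fix that was eventually pushed: `lookup_with_fallback`, `best_match`, and the 0.8 threshold are all hypothetical, and the index is modelled as a plain list of strings rather than BENI's real storage.

```python
def trigrams(text):
    """Return the character trigrams of text, left to right."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def coverage(query, entry):
    """Fraction of the query's trigrams that also occur in the entry."""
    grams = trigrams(query)
    if not grams:
        return 0.0
    entry_grams = set(trigrams(entry))
    return sum(g in entry_grams for g in grams) / len(grams)


def best_match(query, entries):
    """Return (entry, coverage) for the highest-coverage entry."""
    if not entries:
        return None, 0.0
    return max(((e, coverage(query, e)) for e in entries),
               key=lambda pair: pair[1])


def lookup_with_fallback(query, entries, threshold=0.8):
    """Try the query as given; if coverage is low, retry without spaces.

    The threshold value is a placeholder chosen for this example.
    """
    entry, cov = best_match(query, entries)
    if cov >= threshold:
        return entry, cov
    # Fallback: remove all spaces and run the same lookup again.
    retry_entry, retry_cov = best_match(query.replace(" ", ""), entries)
    if retry_cov > cov:
        return retry_entry, retry_cov
    return entry, cov


print(lookup_with_fallback("LAND LIFE", ["LANDLIFE", "LANDLINE"]))
# ('LANDLIFE', 1.0)
```

Keeping the first-pass result when the retry does not improve coverage means the fallback can only raise a score, never lower it.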

barzerman commented 11 years ago

What's the status of this?

nchepanov commented 11 years ago

The query still returns a coverage of 0.57, and the translator drops the BENI results.

0xd34df00d commented 11 years ago

I've pushed some fixes today, and it works just fine. Not in master.

0xd34df00d commented 11 years ago

Closing as fixes were pushed and no objections/comments yet.