French Lemma Problem

GoogleCodeExporter commented 9 years ago

When I simply run Morfette, as in the following example, it is often 
unable to produce the correct lemma when the verb tense is future or 
conditional.

Input: Monsieur le Président , je vous prierai avant tout de me pardonner si 
mon intervention n' est pas aussi dramatique que celle de M. Elles

Output: 

je il CL_suj-1ms
vous le/lui CL_obj-2mp
prierai prieravoir V-indicatifpresent1s 
avant avant P
tout tout PRO_ind-3ms
de de P
...

In the 3rd line of the output (prierai prieravoir V-indicatifpresent1s), 
Morfette produces "prieravoir" instead of "prier"; "prieravoir" doesn't exist 
in French. The same error occurs almost every time a future or conditional 
tense is involved. Other non-word lemmas include "feravoir", "solèveravoir", 
"avoueravoir", and "avoiri".

I'm using morfette-0.3.4-i10x3.model on linux.

I don't understand exactly what the source of this problem is, but can it be 
fixed?

Original issue reported on code.google.com by sharid.l...@gmail.com on 1 Oct 2013 at 2:08

GoogleCodeExporter commented 9 years ago

Thanks for the report. I have been able to reproduce this issue and will look 
into it shortly.

Original comment by pitekus on 1 Oct 2013 at 9:51

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

Original comment by pitekus on 1 Oct 2013 at 9:51

Added labels: Priority-High
Removed labels: Priority-Medium

GoogleCodeExporter commented 9 years ago

This is not a bug in morfette but rather a limitation of the French data that 
the model was trained on. Since morfette is used quite a bit for French, it 
would be nice to solve this in some way. 
It is possible that redesigning the lemmatization feature set could help to 
boost the influence of the lexicon features on the predicted label a bit.
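
To illustrate the idea, here is a rough sketch, not morfette's actual code. It assumes that a lemma is predicted as a suffix-rewrite class, so that the class learned from "ai" -> "avoir" turns "prierai" into "prieravoir" when it is applied to the wrong word. The names `apply_suffix_rule`, `pick_lemma`, `RANKED_RULES` and `LEXICON` are all hypothetical, and the lexicon here acts as a hard filter rather than as a weighted feature.

```python
def apply_suffix_rule(form, strip, add):
    """Replace the suffix `strip` of `form` with `add`.

    Returns None when `form` does not end with `strip`.
    """
    if not form.endswith(strip):
        return None
    return form[: len(form) - len(strip)] + add


# Hypothetical rewrite classes, ordered as a model might rank them.
# ("ai", "avoir") is the class for "ai" -> "avoir";
# ("ai", "") is the class that strips a future 1st-singular ending.
RANKED_RULES = [("ai", "avoir"), ("ai", "")]

# Toy lexicon of known lemmas (an assumption for this example).
LEXICON = {"avoir", "prier", "faire"}


def pick_lemma(form, ranked_rules, lexicon):
    """Return the first candidate lemma found in the lexicon.

    Falls back to the top-ranked candidate when none is known,
    and to the form itself when no rule applies.
    """
    candidates = []
    for strip, add in ranked_rules:
        candidate = apply_suffix_rule(form, strip, add)
        if candidate is not None:
            candidates.append(candidate)
    for candidate in candidates:
        if candidate in lexicon:
            return candidate
    return candidates[0] if candidates else form


# Without the lexicon check, the top-ranked class gives the non-word.
print(apply_suffix_rule("prierai", "ai", "avoir"))
# With the lexicon check, the known lemma wins.
print(pick_lemma("prierai", RANKED_RULES, LEXICON))
```

In this toy setup the lexicon overrides the top-ranked class only when a lower-ranked candidate is a known lemma, so out-of-lexicon verbs would still get the model's first choice.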

Original comment by pitekus on 2 Oct 2013 at 7:44

Added labels: Type-Enhancement, Priority-Medium
Removed labels: Type-Defect, Priority-High

gchrupala / morfette

French Lemma Problem #25