gchrupala / morfette

Supervised learning of morphology
BSD 2-Clause "Simplified" License

French Lemma Problem #25

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
When I run Morfette as in the following example, it often fails to produce 
the correct lemma for verbs in the future or conditional tense.

Input: Monsieur le Président , je vous prierai avant tout de me pardonner si 
mon intervention n' est pas aussi dramatique que celle de M. Elles

Output: 

je il CL_suj-1ms
vous le/lui CL_obj-2mp
prierai prieravoir V-indicatifpresent1s 
avant avant P
tout tout PRO_ind-3ms
de de P
...

In the 3rd line of the output (prierai prieravoir V-indicatifpresent1s), 
Morfette produces "prieravoir" instead of "prier"; "prieravoir" does not exist 
in French. The same error occurs almost every time a future or conditional 
form is involved. Other non-word lemmas include "feravoir", "solèveravoir", 
"avoueravoir", and "avoiri".

I'm using morfette-0.3.4-i10x3.model on Linux.
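
To gauge how often this happens in a larger output file, a rough check along the following lines might help. It is only a sketch: it assumes every output line has the three whitespace-separated columns seen in the sample above (token, lemma, tag), that verb tags start with "V", and that the allowlist `KNOWN_AVOIR_VERBS` (a hypothetical name, not part of Morfette) covers the real French verbs ending in "avoir". It would not catch other non-words such as "avoiri".

```python
# Assumption: each line is "token lemma tag", as in the sample output.
# KNOWN_AVOIR_VERBS is a hypothetical allowlist, not something Morfette provides.
KNOWN_AVOIR_VERBS = {"avoir", "savoir", "ravoir"}


def suspect_lemmas(lines):
    """Return (token, lemma) pairs for verb rows whose lemma looks malformed."""
    flagged = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines, "...", or anything without three columns
        token, lemma, tag = fields
        if (
            tag.startswith("V")
            and lemma.endswith("avoir")
            and lemma not in KNOWN_AVOIR_VERBS
        ):
            flagged.append((token, lemma))
    return flagged


sample = [
    "je il CL_suj-1ms",
    "vous le/lui CL_obj-2mp",
    "prierai prieravoir V-indicatifpresent1s ",
    "avant avant P",
    "tout tout PRO_ind-3ms",
    "de de P",
    "...",
]

print(suspect_lemmas(sample))
```

On the sample, only the "prierai" row is flagged.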

I don't understand exactly what the source of this problem is, but can it be 
fixed? 

Original issue reported on code.google.com by sharid.l...@gmail.com on 1 Oct 2013 at 2:08

GoogleCodeExporter commented 9 years ago
Thanks for the report. I have been able to reproduce this issue and will look 
into it shortly.

Original comment by pitekus on 1 Oct 2013 at 9:51

GoogleCodeExporter commented 9 years ago
This is not a bug in morfette but rather a limitation of the French data that 
the model was trained on. Since morfette is used quite a bit for French, it 
would be nice to solve this in some way. 
Re-designing the lemmatization feature set might help to boost the influence 
of the lexicon features on the predicted label a bit. 
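
One reading that is consistent with the reported non-words, offered here as an assumption rather than a confirmed diagnosis: if the predicted lemma label acts like a suffix rewrite, then the rewrite that correctly maps the form "ai" to "avoir" would, when applied to a future form ending in "-ai", give exactly the outputs in the report. The sketch below only illustrates that arithmetic; `apply_suffix_rewrite` is a hypothetical helper and does not reflect morfette's actual code.

```python
def apply_suffix_rewrite(form, strip, append):
    """Remove the suffix `strip` from `form` and add `append` in its place.

    Hypothetical helper for illustration only; not part of morfette.
    """
    if not form.endswith(strip):
        raise ValueError(f"{form!r} does not end with {strip!r}")
    return form[: len(form) - len(strip)] + append


# The rewrite is correct for the auxiliary form "ai" (lemma "avoir") ...
print(apply_suffix_rewrite("ai", "ai", "avoir"))

# ... but gives the reported non-words when applied to future forms in "-ai".
for form in ["prierai", "ferai", "avouerai"]:
    print(form, apply_suffix_rewrite(form, "ai", "avoir"))
```

Under this assumption, "prierai", "ferai", and "avouerai" come out as "prieravoir", "feravoir", and "avoueravoir", matching the lemmas listed in the report.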

Original comment by pitekus on 2 Oct 2013 at 7:44
