chrplr / openlexicon

Access to lexical databases
Creative Commons Attribution Share Alike 4.0 International

Incompatible `freqlemlivres` and `freqlivres` #18

Closed alephpi closed 1 year ago

alephpi commented 1 year ago

Hi, thanks for your awesome work! However, when I use Lexique383.tsv, I observe the following: [image] From the manual, I understand that freqlemlivres should be the frequency of the word's lemma and freqlivres the frequency of the word itself, right? But as the table shows, danse (35155), danser (35158) and danseur (35172) are their own lemmas, yet the two fields are not equal. Why?

chrplr commented 1 year ago

Hello,

"danse - Noun", "danse - Verb" and "danseur - Noun" are different lemmas according to the parser we used.

92.57 is the sum of frequencies of all the derivations of the verb "danser".

25.68 is the sum of frequencies of all the derivations of the noun "danseur" (danseur singular + plural + feminine sing. + feminine plur.).

35.27 is the sum of frequencies of the derivatives of "danse" (danse singular + danse plural).

(and as far as I can see, freqlemlivres is indeed the sum of the relevant freqlivres)
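To make the relationship concrete, here is a minimal sketch of how such a lemma frequency could be recomputed from the per-word frequencies. It assumes the table has columns named `ortho`, `lemme`, `cgram` and `freqlivres`, and that a lemma is identified by the pair (lemma, grammatical category). The rows and frequency values below are invented toy data, not actual Lexique383 entries, and `lemma_frequencies` is a hypothetical helper, not part of openlexicon.

```python
import csv
import io
from collections import defaultdict

# Toy rows shaped like Lexique383.tsv (tab-separated).
# The frequency values are invented for illustration only.
SAMPLE_TSV = """ortho\tlemme\tcgram\tfreqlivres
danse\tdanse\tNOM\t3.0
danses\tdanse\tNOM\t1.0
danse\tdanser\tVER\t2.0
danser\tdanser\tVER\t4.0
dansait\tdanser\tVER\t1.5
"""


def lemma_frequencies(tsv_text):
    """Sum freqlivres over all word forms sharing a (lemma, category) pair."""
    totals = defaultdict(float)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        key = (row["lemme"], row["cgram"])
        totals[key] += float(row["freqlivres"])
    return dict(totals)


totals = lemma_frequencies(SAMPLE_TSV)
for (lemma, category), freq in sorted(totals.items()):
    print(lemma, category, freq)
```

In this toy table the form "danse" appears twice, once as a noun and once as a form of the verb "danser", so its own frequency differs from the frequency of either lemma it belongs to.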

I am not sure what you expected (?)

-- Christophe Pallier (http://www.pallier.org) INSERM Cognitive Neuroimaging Lab (http://www.unicog.org)


alephpi commented 1 year ago

Yeah, you're right. I thought the freqlemlivres and freqlivres of danse should be equal, but it turns out that freqlemlivres is actually the sum of the frequencies of all the words whose lemma is danse.