chenditc / semanticSimilarity

EECS 499 project.
Apache License 2.0

Add stop word filtering for definition of a sense/word. #2

Closed chenditc closed 10 years ago

chenditc commented 10 years ago

Most frequent words in the definition glosses of the phrase-to-word data, as (word, count) pairs in ascending order of count:

('something', 839), ('at', 864), ('be', 875), ('', 985), ('s', 993), ('The', 1061), ('used', 1064), ('from', 1179), ('his', 1374), ('was', 1456), ('one', 1538), ('by', 1665), ('as', 1727), ('that', 1901), ('he', 1948), ('on', 1977), ('with', 2257), ('is', 2364), ('an', 2438), ('for', 2506), ('and', 3765), ('to', 6285), ('in', 6928), ('or', 9138), ('of', 12622), ('a', 16203), ('the', 18216)]

The stop word list: the, a, of, or, to, and, an, is, that, was, The, be
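A minimal sketch of how such statistics and filtering could be produced. The whitespace tokenization, the function names, and the example glosses are assumptions made for illustration, not the project's actual code; only the stop word list is taken from this comment. Matching is case-sensitive because the list contains both "the" and "The".

```python
from collections import Counter

# Stop word list quoted from the comment above.
# Case-sensitive: "the" and "The" are listed separately.
STOP_WORDS = {"the", "a", "of", "or", "to", "and", "an", "is",
              "that", "was", "The", "be"}


def word_frequencies(glosses):
    """Count whitespace-separated tokens across all glosses (assumed tokenizer)."""
    counts = Counter()
    for gloss in glosses:
        counts.update(gloss.split())
    return counts


def filter_stop_words(gloss, stop_words=STOP_WORDS):
    """Return the gloss tokens with stop words removed, order preserved."""
    return [token for token in gloss.split() if token not in stop_words]


# Hypothetical glosses, for illustration only.
example_glosses = [
    "the act of moving something from one place to another",
    "The state of being a member of a group",
]

counts = word_frequencies(example_glosses)
# Sort ascending by count, the same order as the statistics above.
ranked = sorted(counts.items(), key=lambda item: item[1])
filtered = filter_stop_words(example_glosses[0])
```

Here `ranked` ends with the most frequent token, and `filtered` keeps only the content-bearing tokens of the first gloss.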

chenditc commented 10 years ago

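Test result: very subtle impact. Every correlation below changes by less than 0.01 after filtering: JIANG improves slightly, while LIN and RESNIK drop slightly.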

For word to sense comparison, using word-description approach:

LIN
BEFORE: Pearson's correlation 0.313729, Spearman's rho 0.287190
AFTER: Pearson's correlation 0.309410, Spearman's rho 0.278558

JIANG
BEFORE: Pearson's correlation 0.274312, Spearman's rho 0.275250
AFTER: Pearson's correlation 0.275405, Spearman's rho 0.278552

RESNIK
BEFORE: Pearson's correlation 0.312775, Spearman's rho 0.278951
AFTER: Pearson's correlation 0.311337, Spearman's rho 0.271657

For word to sense comparison, using description-description approach:

LIN
BEFORE: Pearson's correlation 0.229020, Spearman's rho 0.248013
AFTER: Pearson's correlation 0.227227, Spearman's rho 0.247618

JIANG
BEFORE: Pearson's correlation 0.208513, Spearman's rho 0.230301
AFTER: Pearson's correlation 0.214347, Spearman's rho 0.235749

RESNIK
BEFORE: Pearson's correlation 0.231720, Spearman's rho 0.254891
AFTER: Pearson's correlation 0.227724, Spearman's rho 0.253649
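For reference, a dependency-free sketch of the two metrics reported above: Pearson's correlation on the raw scores, and Spearman's rho as Pearson's correlation on average ranks. The function names and the example scores are hypothetical; the actual evaluation script used for these numbers is not shown in this thread.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson's correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical gold ratings and system similarity scores.
gold = [4.0, 3.0, 2.0, 1.0, 0.0]
predicted = [0.9, 0.7, 0.8, 0.2, 0.1]
rho = spearman(gold, predicted)
r = pearson(gold, predicted)
```

Running both metrics on the scores produced with and without stop word filtering gives BEFORE/AFTER pairs like the ones listed above.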