daniter-cu / dex-chiter

Automatically exported from code.google.com/p/dex-chiter

incorrect order of results #14

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Enter a search string
where can i get good pho
Content [I heard the Tofu house is a good restaurant], tfidf [3.135494]
Content [Checkout the Pho restaurant on Broadway], tfidf [3.135494]
Enter a search string

Analysis:
Both "good" and "pho" appear only once in the corpus, so they have the same 
idf weight and the two listings tie. On a tie we just show the default order, 
which puts the Tofu house listing first. Adding more listings with the word 
"good" is a workaround. A possible fix is to use some kind of word-probability 
or n-gram service to down-weight words that are common.
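
The tie can be reproduced with a small sketch. This is not the dex-chiter scoring code: the tokenizer, the stop-word list, and the smoothed idf formula are all assumptions made for illustration, so the absolute scores will not match the 3.135494 shown above. Only the tie itself is the point.

```python
import math

# The two listings from the report above.
corpus = [
    "I heard the Tofu house is a good restaurant",
    "Checkout the Pho restaurant on Broadway",
]

# Assumed stop-word list (hypothetical, not taken from the project).
STOP_WORDS = {"where", "can", "i", "get", "the", "is", "a", "on"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def idf(term, docs):
    """Smoothed idf; an assumed formula, not necessarily the project's."""
    df = sum(1 for doc in docs if term in doc)
    return math.log((1 + len(docs)) / (1 + df)) + 1

def score(query_terms, doc, docs):
    """Sum of tf * idf over the distinct query terms."""
    return sum(doc.count(t) * idf(t, docs) for t in set(query_terms))

docs = [tokenize(text) for text in corpus]
query = tokenize("where can i get good pho")
scores = [score(query, doc, docs) for doc in docs]

# Python's sort is stable, so tied documents keep their corpus order.
ranked = sorted(range(len(docs)), key=lambda i: -scores[i])
print(scores, ranked)
```

Because "good" and "pho" each occur in exactly one document, both get the same idf, both listings score the same, and the stable sort leaves the Tofu house listing first.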

NOTE:
We should discuss whether we want our idf to be based on:
1. the user's corpus
2. the collection of all users' corpora
3. general word usage in English (not sure where to get this data from yet)
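
As a rough sketch of option 3, a term weight could come from a background word-frequency table rather than from the corpus. The frequencies below are placeholder values invented for illustration, not real English statistics, and `BACKGROUND_FREQ`, `DEFAULT_FREQ`, and `rank` are hypothetical names.

```python
import math

# Placeholder relative frequencies, NOT real data. A real table would come
# from a general-English word list or an n-gram service.
BACKGROUND_FREQ = {"good": 1e-3, "pho": 1e-6}
DEFAULT_FREQ = 1e-5  # assumed fallback for words missing from the table

def background_weight(term):
    """Rarer words in general English get a larger weight."""
    return -math.log(BACKGROUND_FREQ.get(term, DEFAULT_FREQ))

def rank(query_terms, docs):
    """Return document indices ordered by descending score."""
    scored = []
    for index, doc in enumerate(docs):
        total = sum(doc.count(t) * background_weight(t) for t in set(query_terms))
        scored.append((total, index))
    # Highest score first; ties fall back to corpus order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored]

docs = [
    "i heard the tofu house is a good restaurant".split(),
    "checkout the pho restaurant on broadway".split(),
]
ranking = rank(["good", "pho"], docs)
print(ranking)
```

With these placeholder numbers the Pho listing ranks first, because "pho" is treated as much rarer than "good". Options 1 and 2 would keep the usual idf formula and only change which set of documents the document frequencies are counted over.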

Original issue reported on code.google.com by Iter....@gmail.com on 23 Jun 2012 at 11:57

GoogleCodeExporter commented 9 years ago
More weight on nouns than adjectives

Original comment by Chang.Ke...@gmail.com on 24 Jun 2012 at 1:04
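
One way to picture the noun-over-adjective suggestion is a part-of-speech multiplier applied on top of the existing score. The lookup table and multipliers below are hypothetical and hand-written for this query; a real version would need a part-of-speech tagger. The base weight is the tied tfidf value from the report.

```python
# Hypothetical part-of-speech lookup, hand-written for illustration only.
POS = {"good": "ADJ", "pho": "NOUN"}

# Assumed multipliers: nouns count twice as much as adjectives.
POS_WEIGHT = {"NOUN": 2.0, "ADJ": 1.0}

def pos_weight(term):
    """Multiplier for a term; unknown words get a neutral 1.0."""
    return POS_WEIGHT.get(POS.get(term), 1.0)

def weighted_score(query_terms, doc_tokens, base_weight):
    """Sum of tf * base weight * part-of-speech multiplier."""
    return sum(
        doc_tokens.count(t) * base_weight * pos_weight(t)
        for t in set(query_terms)
    )

TIED_WEIGHT = 3.135494  # the tied tfidf value shown in the report

tofu = "i heard the tofu house is a good restaurant".split()
pho = "checkout the pho restaurant on broadway".split()
query = ["good", "pho"]

tofu_score = weighted_score(query, tofu, TIED_WEIGHT)
pho_score = weighted_score(query, pho, TIED_WEIGHT)
print(tofu_score, pho_score)
```

Under these assumptions the Pho listing outscores the Tofu house listing, since its only matching term is a noun.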