Roshdy23 / Playmaker

Playmaker is a crawler-based search engine that demonstrates the main features of a search engine (web crawling, indexing, and ranking) and lets users interact with it through a friendly user interface.

Add calculateIDF, Score of each word, and Rank the pages #7

Closed AbdallahSalah003 closed 3 months ago

AbdallahSalah003 commented 3 months ago

CALCULATING IDF
IDF is the Inverse Document Frequency. Its computation relies on global knowledge: in how many documents, in the Web as a whole, a word appears.
IDF = log(total number of documents / number of documents containing the term)
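The IDF formula above can be sketched in a few lines. This is only an illustration, not the code from this PR: the function name `calculate_idf` is borrowed from the PR title, while the parameter names, the natural-log base, and the handling of a term that appears in no document are assumptions.

```python
import math


def calculate_idf(total_docs: int, docs_with_term: int) -> float:
    """IDF = log(total number of documents / number of documents containing the term)."""
    if docs_with_term <= 0:
        # Assumption: a term found in no document gets an IDF of 0
        # instead of triggering a division by zero.
        return 0.0
    # Assumption: natural logarithm; the description does not state the base.
    return math.log(total_docs / docs_with_term)
```

A word that appears in every document gets an IDF of 0, so it contributes nothing to a page's score, while rarer words get a larger IDF.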

SCORE OF A WORD
It is based on the normalized TF, the IDF, and tf0 .. tf5, where tf0 is the frequency of the word in headings, tf1 its frequency in sub-headings, and so on. The score is computed as follows:
score = (0.001 * tf0 + 0.0005 * tf1 + 0.00025 * tf2 + 0.000125 * tf3 + 0.0000625 * tf4 + 0.00003125 * tf5) * IDF + (normalTF * IDF)
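As a rough sketch of the word-score equation, assuming the six positional frequencies arrive as a sequence ordered tf0 .. tf5. The names `POSITION_WEIGHTS` and `word_score` are hypothetical and are not taken from the PR's code.

```python
# Weights for tf0 .. tf5 as listed in the description; each is half the previous one.
POSITION_WEIGHTS = (0.001, 0.0005, 0.00025, 0.000125, 0.0000625, 0.00003125)


def word_score(tf_by_position, normal_tf, idf):
    """score = (sum of weight_i * tf_i) * IDF + (normalTF * IDF)."""
    if len(tf_by_position) != len(POSITION_WEIGHTS):
        raise ValueError("expected six frequencies, tf0 .. tf5")
    # Weighted sum of the per-position frequencies (headings first).
    weighted_tf = sum(w * tf for w, tf in zip(POSITION_WEIGHTS, tf_by_position))
    return weighted_tf * idf + normal_tf * idf
```

For example, a word with tf0 = 2, tf1 = 1, normalTF = 0.1, and IDF = 2 scores (0.002 + 0.0005) * 2 + 0.1 * 2 = 0.205.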

RANK PAGES
A page's score is the sum of the scores of its words. After phrase matching is performed, pages that match the phrase receive extra points.
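The ranking step might look roughly like this, assuming the per-word scores of each page are already available as a dictionary and the phrase-matching step yields a set of matching pages. The function name `rank_pages`, the data shapes, and the `phrase_bonus` value are hypothetical; the description does not say how many extra points a phrase match earns.

```python
def rank_pages(word_scores_by_page, phrase_match_pages, phrase_bonus=1.0):
    """Return (page, score) pairs sorted from highest to lowest score."""
    page_scores = {}
    for page, word_scores in word_scores_by_page.items():
        # Page score is the sum of the scores of its words.
        total = sum(word_scores.values())
        # Pages found by phrase matching receive extra points.
        if page in phrase_match_pages:
            total += phrase_bonus  # assumed value, not taken from the PR
        page_scores[page] = total
    return sorted(page_scores.items(), key=lambda item: item[1], reverse=True)
```

With this shape, a page with a lower word-score sum can still outrank another page when it contains the exact phrase, depending on how large the bonus is.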

FUTURE WORK
We can add extra points to a page if the query, or part of the query, appears in the page URL.
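One possible shape for this URL bonus, purely as an illustration of the idea: the function name `url_bonus`, the whitespace split of the query, the case-insensitive substring match, and the `bonus_per_term` value are all assumptions, since nothing like this is implemented in the PR.

```python
def url_bonus(query, url, bonus_per_term=0.5):
    """Extra points for each query term that appears in the page URL."""
    url_lower = url.lower()
    # Assumption: query terms are separated by whitespace and matched
    # case-insensitively as plain substrings of the URL.
    terms = query.lower().split()
    matches = sum(1 for term in terms if term in url_lower)
    return matches * bonus_per_term  # assumed bonus size
```

The result would be added to the page score alongside the phrase-matching points.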