Adding a new feature! In order to create better suggestions, the use of probabilistic computations for high much a user may like a suggested Wikipedia pages is required. This will require creating a cost function that takes in a given document and will return how much a user may like it on a scale of -1 to 1. Before this cost function exists, we need to have a training set (the documents the user has already seen). Currently, the scales that we can use are the following:
Vote: On a scale of 1-5, how much someone enjoyed the page they visited
Focus time: How long they remained on that page.
Vote time: How long into their time on the page that they voted on it.
Using these scales we can calculate the real value of the training set documents. Another consideration for what we can rank on would be related to the content of the pages, but that is not part of these initial feature specifications.
Once we have the real values on a scale of -1 (the user really disliked it) to 1 (the user loved it), then we can compute a weight vector. So if we have three documents, the following example could be a possibility:
Document Vectors
Weight Vector
=
Estimated Value
vote1 focus-time1 vote-time1
w1
-1
vote2 focus-time2 vote-time2
w2
0.1
vote3 focus-time3 vote-time3
w1
1
After we compute the weight value for w1, w2, w3 then we have a cost function we can use to check for the user estimated preference on a new page.
While this theoretically works, more enhancements are required for this to be fully functional. More comments forthcoming with some errors that have no been mentioned.
Adding a new feature! In order to create better suggestions, the use of probabilistic computations for high much a user may like a suggested Wikipedia pages is required. This will require creating a cost function that takes in a given document and will return how much a user may like it on a scale of
-1
to1
. Before this cost function exists, we need to have a training set (the documents the user has already seen). Currently, the scales that we can use are the following:Using these scales we can calculate the real value of the training set documents. Another consideration for what we can rank on would be related to the content of the pages, but that is not part of these initial feature specifications.
Once we have the real values on a scale of
-1
(the user really disliked it) to1
(the user loved it), then we can compute a weight vector. So if we have three documents, the following example could be a possibility:After we compute the weight value for
w1
,w2
,w3
then we have a cost function we can use to check for the user estimated preference on a new page.