charlie-map / wiki-suggestor-service

A C backend that makes suggestions for the Wikiread extension

Update to K-D Tree Search #6

Open charlie-map opened 2 years ago

charlie-map commented 2 years ago

This issue has two motivations. First, the unique recommendation system (#5) needs a way to return multiple documents from a search. Second, the k-d tree may be able to return better results. The second part is still up in the air, but it may be worth testing whether comparing more information about the documents while traversing the k-d tree improves accuracy. One possible route is to compute an edit distance between page titles. This issue will be updated as changes are made to kdtree_search().
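For reference, here is a minimal sketch of what an edit distance on titles could look like, using the standard Levenshtein distance with a single rolling row. The function name `title_edit_distance` is hypothetical and nothing here is taken from the repository; how the result would be weighed against the k-d tree distance is still an open question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the repository): Levenshtein distance
 * between two NUL-terminated titles, compared byte by byte.
 * Returns -1 if the working row cannot be allocated. */
long title_edit_distance(const char *a, const char *b) {
    assert(a != NULL && b != NULL);

    size_t la = strlen(a), lb = strlen(b);
    size_t *row = malloc((lb + 1) * sizeof *row);
    if (row == NULL)
        return -1;

    /* Distance from the empty prefix of a to each prefix of b. */
    for (size_t j = 0; j <= lb; j++)
        row[j] = j;

    for (size_t i = 1; i <= la; i++) {
        size_t diag = row[0]; /* D[i-1][j-1] for the first column */
        row[0] = i;
        for (size_t j = 1; j <= lb; j++) {
            size_t up = row[j]; /* D[i-1][j], saved before overwrite */
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            size_t best = diag + cost;  /* substitution or match */
            if (up + 1 < best)
                best = up + 1;          /* deletion */
            if (row[j - 1] + 1 < best)
                best = row[j - 1] + 1;  /* insertion */
            row[j] = best;
            diag = up;
        }
    }

    long result = (long)row[lb];
    free(row);
    return result;
}
```

This compares raw bytes, so multi-byte UTF-8 characters in titles would count as several edits; that may or may not matter for Wikipedia titles.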

charlie-map commented 2 years ago

As a test, I'm going to try using cosine similarity to compute distances at each level of the k-d tree. This corresponds more directly to what happens when computing clusters, so perhaps it will ensure that the nearest-neighbor search, which runs after narrowing to a cluster, is working toward the same goal as the clustering step.

charlie-map commented 2 years ago

In the recent commit 7c41983, I made some small adjustments to try a linked list as the main structure for returning results from the search. This is still not ideal for returning n documents from the search, so an alternative to a linked list may be necessary.

charlie-map commented 2 years ago

See #8 for the fix. Returning n documents is now relatively easy: the search uses a linked list to gather documents as it moves through the tree and drops documents when better ones are found. Before this issue can be closed, one more small feature is needed: caching some of the results during traversal. Currently the distance between a given document and the search payload may be computed multiple times in a row, so caching should speed up the program.
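To illustrate the idea of gathering documents and dropping worse ones, here is a minimal sketch of a bounded linked list that keeps the n closest candidates. It is not the code from #8; the types `result_node` and `result_list`, the `doc_id` field, and the function names are all assumptions made for the example. The list is kept sorted with the worst candidate at the head, so dropping it is a constant-time pop.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types, not the repository's actual structures. */
typedef struct result_node {
    int doc_id;               /* placeholder for a document handle */
    double dist;              /* distance to the search payload */
    struct result_node *next;
} result_node;

typedef struct {
    result_node *head;        /* worst (largest distance) kept first */
    size_t count;
    size_t capacity;          /* n: how many documents to return */
} result_list;

void result_list_init(result_list *list, size_t capacity) {
    list->head = NULL;
    list->count = 0;
    list->capacity = capacity;
}

/* Offer a candidate. Returns 1 if it was stored, 0 if it was rejected
 * (list full and candidate no better than the worst) or malloc failed. */
int result_list_offer(result_list *list, int doc_id, double dist) {
    assert(list != NULL);
    if (list->capacity == 0)
        return 0;
    if (list->count == list->capacity && dist >= list->head->dist)
        return 0;

    result_node *node = malloc(sizeof *node);
    if (node == NULL)
        return 0;
    node->doc_id = doc_id;
    node->dist = dist;

    /* Walk past every node that is worse than the new candidate. */
    result_node **link = &list->head;
    while (*link != NULL && (*link)->dist > dist)
        link = &(*link)->next;
    node->next = *link;
    *link = node;
    list->count++;

    /* Over capacity: drop the worst candidate, which sits at the head. */
    if (list->count > list->capacity) {
        result_node *worst = list->head;
        list->head = worst->next;
        free(worst);
        list->count--;
    }
    return 1;
}

void result_list_free(result_list *list) {
    while (list->head != NULL) {
        result_node *next = list->head->next;
        free(list->head);
        list->head = next;
    }
    list->count = 0;
}
```

Because the caller passes in `dist`, each document's distance only has to be computed once per offer. Once the list is full, `list->head->dist` also serves as the current worst accepted distance, which a search could use as its pruning bound.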