Cluster users by movie taste

georg-wolflein / CS5052-Spark

Cluster users by movie taste #6

Closed georg-wolflein closed 3 years ago

georg-wolflein commented 3 years ago

Cluster users by movie taste. See this example for ideas

komodo108 commented 3 years ago

See #5 for more on how this, along with other visualisations, should be displayed

komodo108 commented 3 years ago

Examples of how this could be implemented:

Force-Directed Tree
Disjoint Force-Directed Tree
Twitch Example
Force-Directed Graph (Currently looks the best)

georg-wolflein commented 3 years ago

In the commit above, I implemented an endpoint that returns a graph in which the nodes are users and each edge is weighted by the number of movies that both of its users have watched. @komodo108, is that the info you need for the force-directed graph visualisation?

The response is as follows:

{
  "nodes": [1, 2, 5, 6],
  "edges": [
    { "from": 1, "to": 6, "weight": 3},
    { "from": 2, "to": 5, "weight": 2},
    ...,
  ]
}
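For illustration, here is a minimal sketch of how a response of this shape could be assembled. It uses plain Python rather than Spark and is not the code from the commit. The input format (an iterable of `(user_id, movie_id)` pairs) and the function name `build_co_watch_graph` are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def build_co_watch_graph(ratings):
    """Build a user graph whose edge weights count co-watched movies.

    `ratings` is an iterable of (user_id, movie_id) pairs. This input
    format is an assumption made for the sketch.
    """
    nodes = set()
    users_by_movie = defaultdict(set)
    # Single pass: record every user and group users by the movie watched.
    for user_id, movie_id in ratings:
        nodes.add(user_id)
        users_by_movie[movie_id].add(user_id)

    # Each movie adds 1 to the weight of every pair of users who watched it.
    weights = defaultdict(int)
    for users in users_by_movie.values():
        for a, b in combinations(sorted(users), 2):
            weights[(a, b)] += 1

    return {
        "nodes": sorted(nodes),
        "edges": [
            {"from": a, "to": b, "weight": w}
            for (a, b), w in sorted(weights.items())
        ],
    }
```

Note that the number of pairs generated per movie grows quadratically with the number of users who watched it, so popular movies dominate the cost of this approach.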

georg-wolflein commented 3 years ago

This endpoint is currently very slow, even on the small dataset. Possible solutions:

precompute this graph on startup
let the user of the app supply a list of user IDs, and only use those users to create the graph

Currently, I favour the latter: computing the graph takes a few minutes even on the small dataset, and the JSON response for that dataset is already 11 MB. Returning the graph for only a subset of users would likely be more efficient and, in my opinion, more useful for visualisation because the result is less cluttered. What are your thoughts, @komodo108?

georg-wolflein commented 3 years ago

@komodo108 I have now changed it so that you need to specify the users for which you want to build the graph. This is significantly faster (only around 20 seconds to build the graph and serve the request).
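As a rough sketch of the idea in plain Python (not the Spark code in the repository): the ratings are filtered down to the requested users first, so only pairs among those users are ever compared. The function name `co_watch_graph_for_users` and the `(user_id, movie_id)` input format are assumptions made for illustration.

```python
from itertools import combinations


def co_watch_graph_for_users(ratings, user_ids):
    """Build the co-watch graph restricted to the given users.

    `ratings` is an iterable of (user_id, movie_id) pairs (assumed format);
    `user_ids` is the list of users supplied by the caller.
    """
    wanted = set(user_ids)

    # Keep only the ratings of the requested users, grouped per user.
    movies_by_user = {user_id: set() for user_id in wanted}
    for user_id, movie_id in ratings:
        if user_id in wanted:
            movies_by_user[user_id].add(movie_id)

    # Compare each pair of requested users; the weight is the overlap size.
    edges = []
    for a, b in combinations(sorted(wanted), 2):
        shared = len(movies_by_user[a] & movies_by_user[b])
        if shared > 0:
            edges.append({"from": a, "to": b, "weight": shared})

    return {"nodes": sorted(wanted), "edges": edges}
```

With k requested users this compares k(k-1)/2 pairs regardless of how many users are in the dataset, which is why a small selection keeps both the computation and the response small.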

komodo108 commented 3 years ago

Alright! I'll incorporate this into the search then! Thanks again.