georg-wolflein / CS5052-Spark


Cluster users by movie taste #6

Closed georg-wolflein closed 3 years ago

georg-wolflein commented 3 years ago

Cluster users by movie taste. See this example for ideas

komodo108 commented 3 years ago

See #5 for more on how this, along with other visualisations, should be displayed

komodo108 commented 3 years ago

Examples of how this could be implemented:

georg-wolflein commented 3 years ago

In the commit above, I implemented an endpoint that supplies the data for a graph in which the nodes are users and each edge is weighted by the number of movies that both of its users watched. @komodo108, is that the info you need for the force-directed graph visualisation?

The response is as follows:

{
  "nodes": [1, 2, 5, 6],
  "edges": [
    { "from": 1, "to": 6, "weight": 3},
    { "from": 2, "to": 5, "weight": 2},
    ...,
  ]
}
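For reference, here is a minimal plain-Python sketch of how a response of this shape could be computed. It is not the code from the commit (which presumably uses Spark); the function name `build_cowatch_graph` and the toy `ratings` list are assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cowatch_graph(ratings):
    """Build a user graph from a list of (user_id, movie_id) pairs.

    Two users are connected if they watched at least one movie in common;
    the edge weight is the number of movies both of them watched.
    """
    # Group the users by the movie they watched.
    users_by_movie = defaultdict(set)
    for user_id, movie_id in ratings:
        users_by_movie[movie_id].add(user_id)

    # Every pair of users sharing a movie adds 1 to that pair's weight.
    weights = defaultdict(int)
    for users in users_by_movie.values():
        for a, b in combinations(sorted(users), 2):
            weights[(a, b)] += 1

    nodes = sorted({user_id for user_id, _ in ratings})
    edges = [
        {"from": a, "to": b, "weight": w}
        for (a, b), w in sorted(weights.items())
    ]
    return {"nodes": nodes, "edges": edges}


# Hypothetical toy data: users 1 and 6 share three movies, users 2 and 5 share two.
ratings = [
    (1, 10), (1, 11), (1, 12),
    (6, 10), (6, 11), (6, 12),
    (2, 20), (2, 21),
    (5, 20), (5, 21),
]
graph = build_cowatch_graph(ratings)
```

Note that the number of candidate pairs grows quadratically with the number of users who watched the same movie, so a popular movie can dominate the cost of this approach.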

georg-wolflein commented 3 years ago

This endpoint is currently very slow, even on the small dataset. Possible solutions:

My favoured solution is currently the latter: computing the graph takes a few minutes even on the small dataset, and the JSON response for that dataset is already 11 MB. Returning the graph for only a subset of users should be more efficient and, in my opinion, more useful for visualisation, since the result is less cluttered. What are your thoughts, @komodo108?

georg-wolflein commented 3 years ago

@komodo108 I have now changed it so that you need to specify the users for which you want to build the graph. This is significantly faster (only around 20 seconds to build the graph and serve the request).
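To illustrate the idea of building the graph only for requested users, here is a rough plain-Python sketch. It is an assumption about the general approach, not the actual endpoint code; `build_subgraph`, its parameters and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_subgraph(ratings, selected_users):
    """Build the co-watch graph restricted to the selected users.

    `ratings` is an iterable of (user_id, movie_id) pairs.
    """
    selected = set(selected_users)

    # Drop everyone else's ratings up front, so that pair counting
    # only ever touches the selected users.
    movies_by_user = defaultdict(set)
    for user_id, movie_id in ratings:
        if user_id in selected:
            movies_by_user[user_id].add(movie_id)

    edges = []
    for a, b in combinations(sorted(movies_by_user), 2):
        shared = len(movies_by_user[a] & movies_by_user[b])
        if shared > 0:  # only connect users with at least one common movie
            edges.append({"from": a, "to": b, "weight": shared})

    return {"nodes": sorted(movies_by_user), "edges": edges}


# Hypothetical toy data: user 2 is not requested and is left out entirely.
ratings = [(1, 10), (1, 11), (2, 10), (3, 10), (3, 11), (4, 12)]
subgraph = build_subgraph(ratings, selected_users=[1, 3, 4])
```

With k selected users, at most k(k-1)/2 pairs are compared, regardless of how many users the full dataset contains.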

komodo108 commented 3 years ago

Alright! I'll incorporate this into the search then! Thanks again