CartoDB / camshaft

Analysis library to create data views from queries
BSD 3-Clause "New" or "Revised" License

Perfect Sales Territory - K-means with Weights Analysis #38

stuartlynn closed this issue 8 years ago

stuartlynn commented 8 years ago

Create a Camshaft/Crankshaft analysis that takes a list of points and weights and calculates n weighted cluster centers for those points.

Use scikit-learn with a modified distance formula so that we find the weighted centroid rather than the regular one, something like:

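$d = \frac{(x_i - x_c)^2 + (y_i - y_c)^2}{w_i}$

where $(x_i, y_i)$ is a point with weight $w_i$ and $(x_c, y_c)$ is a cluster center.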
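The thread leaves the exact formula open ("something like"), so what follows is only a minimal sketch under stated assumptions. One point worth noting: because $w_i$ scales all of point $i$'s distances by the same factor, dividing by it does not change which center is nearest; the weight only has an effect when the centers are recomputed. The sketch therefore swaps in weighted k-means (Lloyd's algorithm where each center is the weighted mean $\sum w_i x_i / \sum w_i$ of its members) in pure Python instead of modifying scikit-learn's distance. The name `weighted_kmeans` and the farthest-point initialisation are hypothetical choices, not part of camshaft or crankshaft. Newer scikit-learn releases also accept a `sample_weight` argument in `KMeans.fit`, which optimises the same weighted objective.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def weighted_kmeans(
    points: Sequence[Point],
    weights: Sequence[float],
    k: int,
    iterations: int = 100,
) -> Tuple[List[int], List[Point]]:
    """Hypothetical helper (not camshaft/crankshaft code): weighted k-means.

    Returns (labels, centers); labels[i] is the 0-based cluster of points[i],
    and each center is the weighted mean of the points assigned to it.
    """
    if len(points) != len(weights):
        raise ValueError("points and weights must have the same length")
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Deterministic farthest-point initialisation (an assumption for this sketch).
    centers: List[Point] = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_sq_dist(p, c) for c in centers)))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: dividing by w_i would not change this argmin.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centers[j])) for p in points]

        # Update step: weighted centroid sum(w*x)/sum(w) for each cluster.
        new_centers: List[Point] = []
        for j in range(k):
            members = [i for i, lab in enumerate(new_labels) if lab == j]
            total = sum(weights[i] for i in members)
            if total == 0:
                new_centers.append(centers[j])  # keep empty/zero-weight clusters in place
                continue
            cx = sum(weights[i] * points[i][0] for i in members) / total
            cy = sum(weights[i] * points[i][1] for i in members) / total
            new_centers.append((cx, cy))

        converged = new_labels == labels and new_centers == centers
        labels, centers = new_labels, new_centers
        if converged:
            break
    return labels, centers
```

For example, `weighted_kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [1, 3, 1, 1], 2)` puts the first two points in one cluster, and that cluster's center is pulled toward the heavier point, to `(0.0, 0.75)` rather than the unweighted `(0.0, 0.5)`.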

rochoa commented 8 years ago

What would you expect from this analysis as output? A new polygon geometry or a column with categories?

cc @javisantana

javisantana commented 8 years ago

Confirmed with @andrewxhill: a new column with the cluster number (or something like that).

andrewxhill commented 8 years ago

Inputs:

Output

javisantana commented 8 years ago

what is the ETA?

stuartlynn commented 8 years ago

Should have both the crankshaft and camshaft PRs ready today.

javisantana commented 8 years ago

ok, nice

andrewxhill commented 8 years ago

Sorry, after talking with Stuart, my answer above is actually wrong for the k-means part.

Inputs:

A table of many points, plus N, the desired number of clusters

Output

The same points table with an added column of cluster IDs from 1 to N

So a table with 100 points will return a table with the same 100 points plus a cluster ID column.
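The input/output contract above (N rows in, the same N rows out, plus a 1-based cluster ID) can be sketched as follows. This is a minimal sketch, not how camshaft or crankshaft implement it: it assumes rows are dicts with hypothetical `x` and `y` keys, names the new column `cluster_id` (also hypothetical), and clusters with plain unweighted k-means.

```python
from typing import Dict, List


def add_cluster_ids(rows: List[Dict], n: int, iterations: int = 100) -> List[Dict]:
    """Hypothetical helper: copy every row and append a 'cluster_id' in 1..n."""
    pts = [(float(r["x"]), float(r["y"])) for r in rows]
    if not 0 < n <= len(pts):
        raise ValueError("n must be between 1 and the number of rows")

    def sq(a, b):
        # Squared Euclidean distance.
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    # Farthest-point seeding keeps the result deterministic (an assumption).
    centers = [pts[0]]
    while len(centers) < n:
        centers.append(max(pts, key=lambda p: min(sq(p, c) for c in centers)))

    labels = None
    for _ in range(iterations):
        new_labels = [min(range(n), key=lambda j: sq(p, centers[j])) for p in pts]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        for j in range(n):
            members = [p for p, lab in zip(pts, labels) if lab == j]
            if members:  # leave empty clusters where they are
                centers[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )

    # One output row per input row; the original columns are kept unchanged.
    return [dict(row, cluster_id=label + 1) for row, label in zip(rows, labels)]
```

Returning new dicts rather than mutating the input mirrors the "same table plus one column" shape described in the thread.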

stuartlynn commented 8 years ago

We also have another helper function that takes those clusters and computes the weighted center of each of the N clusters, using some other column as the weight.
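A helper like the one described amounts to a grouped weighted mean: for each cluster, $\bar{x} = \sum w x / \sum w$ and $\bar{y} = \sum w y / \sum w$, with $w$ taken from another column. The sketch below assumes rows are dicts with hypothetical `x`, `y`, and `cluster_id` keys; the function name `weighted_cluster_centers` and the example `sales` weight column are also hypothetical, not the actual crankshaft function.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def weighted_cluster_centers(
    rows: Iterable[Dict], weight_col: str
) -> Dict[int, Tuple[float, float]]:
    """Hypothetical helper: weighted center of each cluster_id.

    The center is (sum(w*x)/sum(w), sum(w*y)/sum(w)), with w read from weight_col.
    """
    # cluster_id -> [sum_w, sum_wx, sum_wy]
    sums: Dict[int, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    for r in rows:
        w = float(r[weight_col])
        acc = sums[r["cluster_id"]]
        acc[0] += w
        acc[1] += w * r["x"]
        acc[2] += w * r["y"]
    return {
        cid: (sx / sw, sy / sw)
        for cid, (sw, sx, sy) in sums.items()
        if sw != 0  # skip clusters whose weights sum to zero
    }
```

With a `sales` column as the weight, a cluster containing `(0, 0)` with sales 1 and `(0, 4)` with sales 3 gets its center at `(0.0, 3.0)`, closer to the higher-sales point.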

A video of the process is here: https://files.slack.com/files-pri/T024FUQQ8-F1ES9V6BT/download/store_locator_analysis.mov

stuartlynn commented 8 years ago

This should be in place on both the crankshaft and camshaft sides.