dssg / givinggraph

An API tool to help understand the relationships between non-profits, for-profits, and the causes they support.
https://github.com/dssg/givinggraph/wiki/API
MIT License

Auto-tune similarity scores #21

Open aronwc opened 10 years ago

aronwc commented 10 years ago

We compute similarity from several sources: mission statement overlap, Twitter overlap, etc. It may be worth merging these into a single score. Here are two ideas:

  1. The simple way: take a weighted average of the scores, where the weights are either uniform or tuned by hand based on a purely subjective look at the results.
  2. The more complex way: tune the weights automatically using known NTEE codes or causes. This can be framed as the following classification problem:
    • For each pair of nonprofits, create a classification instance whose label is 1 if they share a cause or NTEE code, and 0 otherwise.
    • Train a classifier on this data, using the scores from each source as features.
    • The resulting weights should tell us "how important is this score for reproducing cause/NTEE classifications?"
    • Of course, we don't want to reproduce NTEE codes exactly, but this may nudge the weights in the right direction.
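
A rough sketch of both ideas, to make the proposal concrete. Everything here is an assumption for illustration: the function names (`combined_score`, `build_instances`, `train_logistic`), the two feature columns (mission overlap, Twitter overlap), and the toy labels and scores are all made up and are not taken from the givinggraph code or data. The classifier is a plain logistic regression trained by gradient descent; any linear classifier with inspectable weights could stand in for it.

```python
import math
from itertools import combinations


def combined_score(scores, weights=None):
    """Idea 1: weighted average of per-source similarity scores."""
    if weights is None:
        weights = [1.0] * len(scores)  # uniform weights by default
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def build_instances(labels, pair_scores):
    """Idea 2, step 1: one instance per nonprofit pair.

    labels: dict mapping nonprofit id -> cause or NTEE code.
    pair_scores: dict mapping (id_a, id_b) with id_a < id_b -> list of scores.
    Returns a list of (features, label), label 1 if the pair shares a code.
    """
    instances = []
    for a, b in combinations(sorted(labels), 2):
        same = 1 if labels[a] == labels[b] else 0
        instances.append((pair_scores[(a, b)], same))
    return instances


def train_logistic(instances, lr=0.5, epochs=2000):
    """Idea 2, step 2: logistic regression via batch gradient descent."""
    n_features = len(instances[0][0])
    w = [0.0] * n_features
    b = 0.0
    n = len(instances)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in instances:
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted prob minus label
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy data (hypothetical): features are [mission_overlap, twitter_overlap].
labels = {"A": "arts", "B": "arts", "C": "health", "D": "health"}
pair_scores = {
    ("A", "B"): [0.9, 0.5], ("A", "C"): [0.1, 0.6], ("A", "D"): [0.2, 0.4],
    ("B", "C"): [0.1, 0.5], ("B", "D"): [0.2, 0.5], ("C", "D"): [0.8, 0.5],
}

instances = build_instances(labels, pair_scores)
weights, bias = train_logistic(instances)
print("learned weights:", weights, "bias:", bias)
```

In this toy setup the mission overlap separates same-cause pairs from the rest while the Twitter overlap does not, so the learned weight on the first feature should come out larger. On real data the learned weights could be negative or on very different scales, so they would probably need clipping or normalizing before being plugged back into something like `combined_score`.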