cornelltech / company-projects-matcher

The matching algorithm for Company Projects: CS 5999.

Fix distance calculation between 12-vectors. #1

Open ameyaacharya opened 10 years ago

ameyaacharya commented 10 years ago

Currently, the distance calculation between our 12-vectors in do_mahal_distance in covariance.py is incorrect. In the sorted list of pairwise distances that we return, some "dissimilar" vectors are reported as "more similar than" or "the same as" a vector compared against itself. For example:

[3 1 0 4] [3 1 0 4] 79512674.1057

[3 1 0 4] [0 3 0 4] 79512674.1057
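A quick way to catch this class of bug is to check two basic properties that any sensible distance should satisfy: a vector is at distance 0 from itself, and two different vectors are at a distance greater than 0. The sketch below is only an illustration. `find_sanity_violations` and `constant_distance` are hypothetical names, not functions from covariance.py, and `math.dist` (Euclidean distance, Python 3.8+) stands in for a known-good distance.

```python
import math
from itertools import combinations


def find_sanity_violations(dist, vectors, tol=1e-9):
    """Return (kind, u, v, value) tuples for pairs that break basic distance rules."""
    violations = []
    # Rule 1: every vector must be at distance 0 from itself.
    for u in vectors:
        d = dist(u, u)
        if abs(d) > tol:
            violations.append(("nonzero self-distance", u, u, d))
    # Rule 2: two different vectors must not be at distance 0.
    for u, v in combinations(vectors, 2):
        if u == v:
            continue
        d = dist(u, v)
        if d <= tol:
            violations.append(("distinct vectors at zero distance", u, v, d))
    return violations


def constant_distance(u, v):
    """Hypothetical broken distance that mimics the symptom reported above."""
    return 79512674.1057


vectors = [(3, 1, 0, 4), (0, 3, 0, 4)]
broken = find_sanity_violations(constant_distance, vectors)
healthy = find_sanity_violations(math.dist, vectors)
print(len(broken), "violations for the constant distance")
print(len(healthy), "violations for Euclidean distance")
```

Passing the real distance function as `dist` would turn this into a small regression check, assuming it accepts two vectors and returns a number.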

ameyaacharya commented 10 years ago

In b3de91, most of the output makes sense: a vector compared against itself has a Mahalanobis distance of 0, as expected. But we still have anomalies like this:

[1 1 1 3] [1 0 0 4] Diff at 1 Diff at 2 Diff at 3 1.97020835811

[1 0 0 4] [1 2 0 4] Diff at 1 1.97563556343

The first pair of vectors differs in 3 positions, yet it is "closer" than the second pair, which differs in only 1 position, because the second pair has a "distance" of 2 in the second position.

Is this what we want? Maybe.
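To see that this ordering is not specific to Mahalanobis distance, here is a minimal sketch comparing the number of differing positions with plain Euclidean distance for the same two pairs. `differing_positions` is a hypothetical helper that reproduces the 0-indexed "Diff at" output, and `math.dist` needs Python 3.8+.

```python
import math


def differing_positions(u, v):
    """Return the 0-indexed positions where u and v disagree."""
    return [i for i, (a, b) in enumerate(zip(u, v)) if a != b]


pair_a = ([1, 1, 1, 3], [1, 0, 0, 4])  # differs at positions 1, 2, 3
pair_b = ([1, 0, 0, 4], [1, 2, 0, 4])  # differs only at position 1, but by 2

for u, v in (pair_a, pair_b):
    # Show the differing positions next to the Euclidean distance.
    print(u, v, differing_positions(u, v), math.dist(u, v))
```

Euclidean distance is sqrt(3) for the first pair and 2 for the second, so any distance that takes the size of each difference into account can rank the first pair as closer. Counting positions alone (a Hamming-style distance) would give the opposite order.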

ameyaacharya commented 10 years ago

Actually, this is a better example of what's wrong:

[1 1 1 3] [1 0 0 4] Diff at 1 Diff at 2 Diff at 3 1.97020835811

[1 1 1 3] [1 0 0 3] Diff at 1 Diff at 2 1.98913558085

Why is the second pair of vectors farther apart than the first, even though it differs in fewer positions?
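One possible explanation, not confirmed for our data, is that Mahalanobis distance weights differences by the inverse covariance matrix, so a difference along a direction the data is correlated in costs less. The sketch below uses a made-up covariance matrix (an assumption for illustration, not the one computed in covariance.py) in which the last coordinate is negatively correlated with the second and third. The `solve` and `mahalanobis` helpers are hypothetical stand-ins written in pure Python.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [[float(x) for x in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mahalanobis(u, v, cov):
    """sqrt((u - v)^T cov^-1 (u - v)), without forming the inverse explicitly."""
    delta = [ui - vi for ui, vi in zip(u, v)]
    z = solve(cov, delta)
    return math.sqrt(max(0.0, sum(d * zi for d, zi in zip(delta, z))))


# Hypothetical covariance: coordinate 3 is negatively correlated with 1 and 2.
cov = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, -0.5],
    [0.0, 0.0, 1.0, -0.5],
    [0.0, -0.5, -0.5, 1.0],
]

d_three_diffs = mahalanobis([1, 1, 1, 3], [1, 0, 0, 4], cov)
d_two_diffs = mahalanobis([1, 1, 1, 3], [1, 0, 0, 3], cov)
print(d_three_diffs, d_two_diffs)
```

With this matrix the pair that differs in 3 positions comes out at about 1.414 and the pair that differs in 2 positions at 2.0, the same ordering as in the output above. Whether our actual covariance matrix has this kind of structure would need to be checked.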

ameyaacharya commented 10 years ago

This is dependent on #13.

ameyaacharya commented 10 years ago

Note: use Python distance for now.