eriq-augustine opened 7 years ago
@dhawaljoh These initial findings may be interesting to you.
A rough weight learning run with just the subset of the ground truth gives these weights:

```
STARS                  = 1.5
TOTAL_REVIEW_COUNT     = 0.0
AVAILABLE_REVIEW_COUNT = 1.0
MEAN_REVIEW_LEN        = 2.0
MEAN_WORD_LEN          = 2.0
NUM_WORDS              = 0.0
MEAN_WORD_COUNT        = 0.0
TOTAL_HOURS            = 0.5
ATTRIBUTES             = 2.0
CATEGORIES             = 2.0
TOP_WORDS              = 1.0
KEY_WORDS              = 1.0
OPEN_HOURS             = 0.0
```
Rand Index = 0.966339
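For reference, the Rand Index used here is just the fraction of item pairs on which two clusterings agree. A minimal sketch (the function name and toy labels are my own, not from this repo):

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree:
    both put the pair in the same cluster, or both put it in
    different clusters."""
    agree = 0
    pairs = list(combinations(range(len(labels_a)), 2))
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)

# Toy example (hypothetical labels, not the actual ground truth):
truth = [0, 0, 1, 1]
predicted = [1, 1, 0, 0]
print(rand_index(truth, predicted))  # 1.0 -- same partition, just relabeled
```

Note that the index is invariant to cluster label names, which is why a fully relabeled but identical partition still scores 1.0.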
Recall that the weight is just multiplied by the normalized distance score, so a high weight does not necessarily mean the feature is important.
Maybe we need non-linear weights, or maybe we should just invert the weights.
It just seems a little strange, since a weight of 0 means "don't use the feature", but the closer the distance is to 0, the more similar two items are.
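To make the oddity concrete, here is a sketch of the current scheme as I understand it (function name and the assumption that per-feature distances are already normalized to [0, 1] are mine):

```python
def weighted_distance(features_a, features_b, weights):
    """Sum of per-feature distances, each scaled by its weight.
    A weight of 0 drops the feature entirely, while a LARGE weight
    inflates the distance on that feature -- pushing the items apart
    rather than marking the feature as 'important'."""
    total = 0.0
    for a, b, w in zip(features_a, features_b, weights):
        total += w * abs(a - b)  # assumes features already normalized to [0, 1]
    return total

# A zero weight ignores the first feature; the second dominates:
print(weighted_distance([0.5, 0.2], [0.1, 0.9], [0.0, 2.0]))
```

So a learned weight of 2.0 on a feature could mean "differences here matter a lot", or it could just be the learner stretching that dimension, which is why high does not cleanly map to important.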
@dhawaljoh
Oh shit, inverse weights do much better.
```
STARS                  = 1.0
TOTAL_REVIEW_COUNT     = 0.0
AVAILABLE_REVIEW_COUNT = 2.0
MEAN_REVIEW_LEN        = 0.0
MEAN_WORD_LEN          = 0.5
NUM_WORDS              = 0.0
MEAN_WORD_COUNT        = 0.0
TOTAL_HOURS            = 0.0
ATTRIBUTES             = 1.0
CATEGORIES             = 0.5
TOP_WORDS              = 1.0
KEY_WORDS              = 1.5
OPEN_HOURS             = 0.0
```
Rand Index = 0.995792
Keep in mind that these numbers are "weights" in the new sense: (1/w) * distance is what is now computed. Also keep in mind that this is just on a small, arbitrary (but not random) subset of the ground truth.
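The inverse scheme can be sketched the same way (again my own function name; I'm assuming a weight of 0 still means "ignore the feature", since 1/0 is undefined):

```python
def inverse_weighted_distance(features_a, features_b, weights):
    """Inverse scheme: each per-feature distance is scaled by 1/w,
    so a LARGER weight now means the feature contributes LESS to the
    total distance. Under this scheme, big weights make a feature more
    forgiving instead of pushing items apart."""
    total = 0.0
    for a, b, w in zip(features_a, features_b, weights):
        if w == 0.0:
            continue  # assumption: zero still means "drop this feature"
        total += abs(a - b) / w  # assumes distances normalized to [0, 1]
    return total

# Doubling a weight halves that feature's contribution:
print(inverse_weighted_distance([0.5], [0.1], [2.0]))
```

This at least resolves the earlier oddity: weights now move in the same direction as similarity, since a large weight shrinks the distance on that feature.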
Learn some feature weights.