Feature sim clustering: Unused data_type job param in Loopy query?

lukewendling commented 7 years ago

See https://github.com/Sotera/watchman/blob/master/services/feature-similarity-clustering/main.py#L38

Sanity check: should we be using job['data_type'] in the Loopy query params?

I noticed during a system run that the clustering step was taking a really long time. It is scrolling through more records than I'd expect (more than the featurizer step) because it doesn't filter by data_type.

What am I missing? @drJAGartner @justinlueders

drJAGartner commented 7 years ago

It certainly should only be getting smp of a single type.
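
For illustration, here is a minimal sketch of the kind of filter being discussed: restricting the records a query returns to the job's data_type. The parameter shape (`property_name`, `query_value`) and the helper names `build_query_params` and `matches` are hypothetical and are not taken from the actual Loopy API or from main.py; the records are made-up sample data.

```python
# Hypothetical sketch: the parameter shape and helper names below are
# assumptions for illustration, not the real Loopy API or main.py code.

def build_query_params(job):
    """Build equality filters restricting a query to the job's data_type."""
    return [
        {"property_name": "data_type", "query_value": job["data_type"]},
    ]


def matches(record, query_params):
    """Return True if the record satisfies every equality filter."""
    return all(
        record.get(p["property_name"]) == p["query_value"]
        for p in query_params
    )


# Made-up sample data standing in for the records the query would scroll.
job = {"data_type": "text"}
records = [
    {"id": 1, "data_type": "text"},
    {"id": 2, "data_type": "image"},
    {"id": 3, "data_type": "text"},
]

params = build_query_params(job)
filtered = [r for r in records if matches(r, params)]
print([r["id"] for r in filtered])
```

Without the data_type filter, all three sample records would be scanned; with it, only the records matching the job's type are.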

lukewendling commented 7 years ago

Fixed by f8dcc7f

Sotera / watchman

Feature sim clustering: Unused data_type job param in Loopy query? #75