amzn / pecos

PECOS - Prediction for Enormous and Correlated Spaces
https://libpecos.org/
Apache License 2.0

Implement C/C++ PairwiseANN w/ Python API #268

OctoberChang closed 11 months ago

OctoberChang commented 11 months ago

Issue #, if available:

Description of changes: Implement C/C++ PairwiseANN function with Python API.

Given an (input, label) pair, PairwiseANN finds the top-K nearest queries in the indexed input-to-label graph. The searcher returns four arrays of size K, bound to It, Mt, Dt, and Vt in the prediction examples below.
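For intuition, the search described above can be approximated by an exact brute-force reference. This is only a sketch under assumptions, not the C/C++ implementation in this PR: it assumes X_trn is a dense (n_inputs, dim) array, Y_csr is a SciPy CSR matrix of shape (n_inputs, n_labels), and inner product is the similarity. The helper name brute_force_pairwise_topk is hypothetical, and it returns only indices and scores rather than the four arrays of the real searcher.

```python
import numpy as np
import scipy.sparse as sp


def brute_force_pairwise_topk(X_trn, Y_csr, query_vec, label_key, topk):
    """Exact top-K search restricted to the training inputs linked to one label.

    Hypothetical reference helper, not part of PECOS.
    """
    # Column `label_key` of Y lists the training inputs connected to that label.
    Y_csc = sp.csc_matrix(Y_csr)
    start, end = Y_csc.indptr[label_key], Y_csc.indptr[label_key + 1]
    candidates = Y_csc.indices[start:end]
    if candidates.size == 0:
        return np.empty(0, dtype=np.int64), np.empty(0, dtype=np.float32)
    # Inner-product similarity (assumed here to mirror metric_type="ip").
    scores = X_trn[candidates] @ query_vec
    # Sort by descending score and keep at most `topk` candidates.
    order = np.argsort(-scores, kind="stable")[:topk]
    return candidates[order], scores[order]


# Toy data: 4 training inputs in 2 dimensions, 3 labels.
X_trn = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]], dtype=np.float32)
Y_csr = sp.csr_matrix(
    np.array([[1, 0, 0], [1, 1, 0], [0, 1, 0], [1, 0, 0]], dtype=np.float32)
)
query_vec = np.array([1.0, 0.0], dtype=np.float32)

# Label 0 is linked to inputs 0, 1, and 3; only those are scored.
idx, sim = brute_force_pairwise_topk(X_trn, Y_csr, query_vec, label_key=0, topk=2)
# Label 2 has no linked inputs, so the result is empty.
empty_idx, empty_sim = brute_force_pairwise_topk(X_trn, Y_csr, query_vec, label_key=2, topk=2)
```

A reference like this is mainly useful for checking results on small data; it rescans the label's neighbors on every call.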

Indexing Usage

train_params = PairwiseANN.TrainParams(metric_type="ip")
model = PairwiseANN.train(X_trn, Y_csr, train_params=train_params)

where

Save/load Usage

model.save(model_folder)
del model
model = PairwiseANN.load(model_folder)

Searchers Usage

searchers = model.searchers_create(max_batch_size=100, max_only_topk=10, num_searcher=1)

where

Prediction Usage

With same input (suitable for real-time inference):

It, Mt, Dt, Vt = model.predict(query_vec, label_keys, searchers, pred_params=pred_params, is_same_input=True)

With different inputs (suitable for batch prediction):

It, Mt, Dt, Vt = model.predict(Query_mat, label_keys, searchers, pred_params=pred_params, is_same_input=False)
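The two prediction modes differ in how queries are paired with label keys. The sketch below makes that pairing explicit in plain NumPy. It rests on an assumed reading of is_same_input that the PR text does not spell out: with True, a single query vector is paired with every label key; with False, row i of the query matrix is paired with label_keys[i]. The helper expand_queries is hypothetical and is not part of the PECOS API.

```python
import numpy as np


def expand_queries(queries, label_keys, is_same_input):
    """Return one query row per label key, making the (input, label) pairing explicit.

    Hypothetical helper illustrating an assumed reading of `is_same_input`.
    """
    label_keys = np.asarray(label_keys)
    queries = np.atleast_2d(queries)
    if is_same_input:
        # One query scored against every label key: repeat the single row.
        if queries.shape[0] != 1:
            raise ValueError("expected a single query vector")
        return np.repeat(queries, label_keys.size, axis=0)
    # Batch mode: row i is paired with label_keys[i], so the counts must match.
    if queries.shape[0] != label_keys.size:
        raise ValueError("need one query row per label key")
    return queries


label_keys = [3, 7, 7]

# Same input: one vector, three label keys, hence three (input, label) pairs.
query_vec = np.array([0.1, 0.9], dtype=np.float32)
paired_same = expand_queries(query_vec, label_keys, is_same_input=True)

# Different inputs: three rows, each paired with its own label key.
Query_mat = np.array([[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]], dtype=np.float32)
paired_batch = expand_queries(Query_mat, label_keys, is_same_input=False)
```

Under this reading, the same-input mode avoids materializing the repeated rows, which would fit the real-time use case mentioned above.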

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.