lecardozo opened this issue 1 year ago
@lecardozo you can check out the "Generate top-K recommendations" section in this example notebook, which shows how to generate top-K recommendations for a given batch. You can then loop over the batches and concatenate the outputs (see the sketch below).
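A rough sketch of that loop-and-concatenate pattern, assuming the model has already been wrapped as a top-k encoder the way the notebook does (topk_model, eval_ds, and the exact structure of the model output are assumptions here and may vary across merlin-models versions):

import numpy as np
import merlin.models.tf as mm

loader = mm.Loader(eval_ds, batch_size=1024, shuffle=False)

all_scores, all_ids = [], []
for inputs, _ in loader:
    # expected to yield top-k scores and candidate ids for each user in the batch
    scores, ids = topk_model(inputs)
    all_scores.append(scores.numpy())
    all_ids.append(ids.numpy())

topk_scores = np.concatenate(all_scores, axis=0)
topk_ids = np.concatenate(all_ids, axis=0)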
Thanks for the answer @rnyak!
Sorry, I think I wasn't clear before. I'm looking specifically for a way of generating embeddings for queries and candidates independently, instead of generating recommendations. The idea is to have the candidate embeddings indexed in an external vector search engine and use ANN for retrieval later.
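To make the downstream part concrete, this is roughly what I have in mind once the embeddings are exported (faiss here is just one example of a vector index; item_embs_df and query_embs_df are assumed to be data frames with the id in the first column and the embedding dimensions in the remaining columns):

import faiss
import numpy as np

item_ids = item_embs_df.iloc[:, 0].to_numpy()
item_vecs = np.ascontiguousarray(item_embs_df.iloc[:, 1:].to_numpy(dtype="float32"))

# exact inner-product index for illustration; an IVF or HNSW index would give true ANN behaviour
index = faiss.IndexFlatIP(item_vecs.shape[1])
index.add(item_vecs)

query_vecs = np.ascontiguousarray(query_embs_df.iloc[:, 1:].to_numpy(dtype="float32"))
scores, positions = index.search(query_vecs, 10)   # top-10 candidates per query
recommended_ids = item_ids[positions]              # map faiss row positions back to item ids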
@lecardozo the same notebook shows how to generate candidate and query embeddings.
# user_features is built analogously to item_features below, from the rows tagged Tags.USER / Tags.USER_ID
queries = model.query_embeddings(
    Dataset(user_features, schema=schema.select_by_tag(Tags.USER)),
    batch_size=1024,
    index=Tags.USER_ID,
)
query_embs_df = queries.compute(scheduler="synchronous").reset_index()

item_features = (
    unique_rows_by_features(train, Tags.ITEM, Tags.ITEM_ID).compute().reset_index(drop=True)
)
item_embs = model.candidate_embeddings(
    Dataset(item_features, schema=schema.select_by_tag(Tags.ITEM)),
    batch_size=1024,
    index=Tags.ITEM_ID,
)
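To materialize the candidate side the same way as the query side and hand it off to an external index, something like this should work (the parquet path is just a placeholder):

item_embs_df = item_embs.compute(scheduler="synchronous")
item_embs_df.to_parquet("item_embeddings.parquet")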
hope that helps.
That was my first try, as I followed along the whole notebook. Since these methods are just thin wrappers around Encoder.encode(), we end up with the same performance issues I mentioned before (which is what made me look at the source code of these methods in the first place).
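Roughly speaking, from my reading of the code (paraphrasing, not the actual merlin-models source), my understanding is that they reduce to something like the following, which is why they inherit encode()'s performance characteristics:

# paraphrase of my understanding, not the real implementation
def query_embeddings(model, dataset, batch_size, index):
    return model.query_encoder.encode(dataset, index=index, batch_size=batch_size)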
❓ Questions & Help
What is the preferred way of generating predictions from a trained Encoder of a TwoTowerModelV2? There seem to be at least two ways of doing it, with apparently huge performance differences.

Details
After training a TwoTowerModelV2, I noticed a huge difference in performance between calling the encode() method of each tower (e.g. model.query_encoder.encode()) and calling the tower directly (model.query_encoder()) on a single node with CPU.

Setup
Calling encode()

This takes more than 1 hour for 434,457 rows. Resource usage metrics show that the CPU is idle most of the time, which is quite unexpected. I tried increasing the number of partitions of the transformed dataset and setting .compute(scheduler='processes') to benefit from Dask's parallelization, but that didn't work (it failed with serialization issues).
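Roughly what this slow path looks like in my case (reusing the names from the snippet above; treat this as a reconstruction rather than my exact code, and the encode() signature may differ slightly between versions):

user_ds = Dataset(user_features, schema=schema.select_by_tag(Tags.USER))
queries = model.query_encoder.encode(user_ds, index=Tags.USER_ID, batch_size=1024)

# the result is lazy; most of the wall-clock time is spent in this compute()
query_embs_df = queries.compute(scheduler="synchronous").reset_index()

# attempted workaround: queries.compute(scheduler="processes") -> fails with serialization errors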
Calling __call__() with a Loader

This takes ~30 seconds for the same 434,457 rows. As my data fits into memory, this ended up being the clear winner.
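Roughly what the fast path looks like (again a reconstruction; the Loader arguments and the output handling are my own simplification):

import numpy as np
import merlin.models.tf as mm

loader = mm.Loader(user_ds, batch_size=4096, shuffle=False)

parts = []
for inputs, _ in loader:
    # plain forward pass through the query tower, no Dask graph involved
    parts.append(model.query_encoder(inputs).numpy())

query_embs = np.concatenate(parts, axis=0)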
Is this difference expected or am I doing something wrong?