GoogleCloudPlatform / analytics-componentized-patterns

Apache License 2.0
174 stars 98 forks

Add a code sample for item-item recommendation using BigQuery ML matrix factorization and ScaNN #7

Closed ksalama closed 3 years ago

ksalama commented 3 years ago

This PR contains sample code for training and serving embeddings for real-time similarity matching. The system uses a BigQuery ML matrix factorization model to train the embeddings and the open-source ScaNN framework to build an approximate nearest neighbor index. The sample covers the following steps:

  1. Compute pointwise mutual information (PMI) between items based on their co-occurrences.
  2. Train item embeddings using BigQuery ML Matrix Factorization, using item PMI as implicit feedback.
  3. Export and post-process the embeddings from BigQuery ML model to Cloud Storage as CSV files using Cloud Dataflow.
  4. Implement an embedding lookup model using Keras and deploy it to AI Platform Prediction.
  5. Serve the embeddings as an approximate nearest neighbor index using ScaNN.
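
For reviewers unfamiliar with step 1, here is a minimal sketch of PMI over item co-occurrences. It is illustrative only and is not the code in this PR, which computes PMI in BigQuery. It assumes items are grouped into baskets (for example, sessions or playlists) and estimates probabilities as the fraction of baskets containing an item or a pair. The function name `compute_pmi` and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def compute_pmi(baskets):
    """Return {(item_a, item_b): pmi} for unordered item pairs that co-occur.

    Assumption: p(a) is the fraction of baskets containing a, and p(a, b)
    is the fraction of baskets containing both a and b.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    n = len(baskets)
    pmi = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = item_counts[a] / n
        p_b = item_counts[b] / n
        # PMI(a, b) = log2( p(a, b) / (p(a) * p(b)) )
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi


# Hypothetical toy data: each inner list is one basket of item IDs.
baskets = [["a", "b"], ["a", "b"], ["c"], ["a", "c"]]
pmi = compute_pmi(baskets)
```

A positive score means the pair co-occurs more often than it would if the items were independent, and a negative score means less often. In the pipeline described above, these scores play the role of the implicit feedback passed to matrix factorization in step 2.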

ksalama commented 3 years ago

@polong-lin - Kindly review the PR