ankane / disco

Recommendations for Ruby and Rails using collaborative filtering
MIT License

Disco

🔥 Recommendations for Ruby and Rails using collaborative filtering


Installation

Add this line to your application’s Gemfile:

gem "disco"

Getting Started

Create a recommender

recommender = Disco::Recommender.new

If users rate items directly, this is known as explicit feedback. Fit the recommender with:

recommender.fit([
  {user_id: 1, item_id: 1, rating: 5},
  {user_id: 2, item_id: 1, rating: 3}
])

IDs can be integers, strings, or any other data type

If users don’t rate items directly (for instance, they’re purchasing items or reading posts), this is known as implicit feedback. Leave out the rating.

recommender.fit([
  {user_id: 1, item_id: 1},
  {user_id: 2, item_id: 1}
])

Each user_id/item_id combination should only appear once
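
Raw event logs often contain repeated pairs, so it can help to deduplicate before fitting. The following is a minimal sketch in plain Ruby, assuming the most recent rating should win for explicit feedback; the events array and the variable names are hypothetical, not part of Disco.

```ruby
# Hypothetical raw events; user 1 rated item 1 twice.
events = [
  {user_id: 1, item_id: 1, rating: 3},
  {user_id: 2, item_id: 1, rating: 4},
  {user_id: 1, item_id: 1, rating: 5}
]

# Explicit feedback: keep the last rating seen for each pair.
# Reassigning an existing hash key keeps its original position.
explicit = events.each_with_object({}) { |row, seen|
  seen[[row[:user_id], row[:item_id]]] = row
}.values

# Implicit feedback: drop the rating, then drop duplicate pairs.
implicit = events.map { |row| row.slice(:user_id, :item_id) }.uniq
```

Either array can then be passed to recommender.fit. Averaging repeated ratings instead of keeping the last one is an equally reasonable choice.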

Get user-based recommendations - “users like you also liked”

recommender.user_recs(user_id)

Get item-based recommendations - “users who liked this item also liked”

recommender.item_recs(item_id)

Use the count option to specify the number of recommendations (default is 5)

recommender.user_recs(user_id, count: 3)

Get predicted ratings for specific users and items

recommender.predict([{user_id: 1, item_id: 2}, {user_id: 2, item_id: 4}])

Get similar users

recommender.similar_users(user_id)

Examples

MovieLens

Load the data

data = Disco.load_movielens

Create a recommender and get similar movies

recommender = Disco::Recommender.new(factors: 20)
recommender.fit(data)
recommender.item_recs("Star Wars (1977)")

Ahoy

Ahoy is a great source for implicit feedback

views = Ahoy::Event.where(name: "Viewed post").group(:user_id).group_prop(:post_id).count

data =
  views.map do |(user_id, post_id), _|
    {
      user_id: user_id,
      item_id: post_id
    }
  end
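
The block above destructures each key, which implies the grouped count is a hash keyed by [user_id, post_id] pairs. A small sketch with a hand-written hash standing in for the query result (the IDs and counts are made up) shows the transformation in isolation:

```ruby
# Stand-in for the grouped count: {[user_id, post_id] => number_of_views}
views = {
  [1, 10] => 3,
  [1, 11] => 1,
  [2, 10] => 2
}

# The view count is discarded, so each user/post pair appears exactly once.
data = views.map do |(user_id, post_id), _count|
  {user_id: user_id, item_id: post_id}
end
```

Because the hash keys are unique, the resulting rows contain no repeated user/item pairs.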

Create a recommender and get recommended posts for a user

recommender = Disco::Recommender.new
recommender.fit(data)
recommender.user_recs(current_user.id)

Storing Recommendations

Disco makes it easy to store recommendations in Rails.

rails generate disco:recommendation
rails db:migrate

For user-based recommendations, use:

class User < ApplicationRecord
  has_recommended :products
end

Change :products to match the model you’re recommending

Save recommendations

User.find_each do |user|
  recs = recommender.user_recs(user.id)
  user.update_recommended_products(recs)
end

Get recommendations

user.recommended_products

For item-based recommendations, use:

class Product < ApplicationRecord
  has_recommended :products
end

Specify multiple types of recommendations for a model with:

class User < ApplicationRecord
  has_recommended :products
  has_recommended :products_v2, class_name: "Product"
end

And use the appropriate methods:

user.update_recommended_products_v2(recs)
user.recommended_products_v2

Storing Recommenders

If you’d prefer to perform recommendations on-the-fly, store the recommender

json = recommender.to_json
File.write("recommender.json", json)

The serialized recommender includes user activity from the training data (to avoid recommending previously rated items), so be sure to protect it. You can save it to a file, database, or any other storage system, or use a tool like Trove. Also, user and item IDs should be integers or strings for this.
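
As background on the ID restriction: JSON only has strings and numbers, so other Ruby types do not survive a round trip unchanged. A quick sketch using the standard json library (the sample IDs are illustrative, and this shows plain JSON behavior rather than Disco's internals):

```ruby
require "json"

# Integers and strings round-trip as-is; a symbol comes back as a string.
ids = [1, "abc", :sym]
round_tripped = JSON.parse(JSON.generate(ids))
```

Here round_tripped no longer equals ids, because :sym was written out as "sym".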

Load a recommender

json = File.read("recommender.json")
recommender = Disco::Recommender.load_json(json)

Alternatively, you can store only the factors and use a library like Neighbor. See the examples.

Algorithms

Disco uses high-performance matrix factorization.

Specify the number of factors and epochs

Disco::Recommender.new(factors: 8, epochs: 20)

If recommendations look off, try changing factors. The default is 8, but 3 could be good for some applications and 300 good for others.
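
For intuition about what factors controls: matrix factorization represents every user and every item as a vector of that length, and a predicted score is commonly the dot product of the two vectors, offset by the global mean for explicit feedback. The sketch below uses made-up vectors with 3 factors; whether Disco's internals match this formula exactly is an assumption.

```ruby
# Hypothetical factor vectors with 3 latent factors each.
user_vec = [0.5, -0.2, 0.1]
item_vec = [0.4, 0.3, -0.6]
global_mean = 3.5 # assumed average rating in the training data

# Dot product: sum of element-wise products.
dot = user_vec.zip(item_vec).sum { |u, i| u * i }
predicted = global_mean + dot
```

More factors give the model more room to capture distinct tastes, at the cost of needing more data to fit them well.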

Validation

Pass a validation set with:

recommender.fit(data, validation_set: validation_set)
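
The validation set has the same shape as the training data. One simple way to produce it is a random hold-out split; the sketch below assumes an 80/20 split with a fixed seed, and the sample rows, ratio, and seed are illustrative choices rather than Disco defaults.

```ruby
# Hypothetical ratings: 10 users, 3 items.
data = (1..10).map { |i| {user_id: i, item_id: i % 3, rating: 1 + i % 5} }

# Shuffle with a fixed seed so the split is reproducible.
shuffled = data.shuffle(random: Random.new(42))

# Hold out the last 20% for validation.
cutoff = (shuffled.size * 0.8).floor
train_set = shuffled.first(cutoff)
validation_set = shuffled.drop(cutoff)

# These would then be passed as:
#   recommender.fit(train_set, validation_set: validation_set)
```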

Cold Start

Collaborative filtering suffers from the cold start problem. It’s unable to make good recommendations without data on a user or item, which is problematic for new users and items.

recommender.user_recs(new_user_id) # returns empty array

There are a number of ways to deal with this. A common one is to recommend top items instead.

Get top items with:

recommender = Disco::Recommender.new(top_items: true)
recommender.fit(data)
recommender.top_items

This uses Wilson score for explicit feedback and item frequency for implicit feedback.
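
For reference, the Wilson score is usually quoted as the lower bound of a confidence interval on a positive/negative proportion, which ranks items with few ratings more cautiously than a raw average would. A minimal sketch of that textbook formula follows. How Disco maps ratings onto it is not stated here, so wilson_lower_bound and its z: parameter are hypothetical names, not part of Disco's API.

```ruby
# Wilson score lower bound for `positive` successes out of `total` trials.
# z = 1.96 corresponds to a 95% confidence level.
def wilson_lower_bound(positive, total, z: 1.96)
  return 0.0 if total.zero?

  phat = positive.to_f / total
  denom = 1 + z**2 / total
  centre = phat + z**2 / (2 * total)
  margin = z * Math.sqrt((phat * (1 - phat) + z**2 / (4 * total)) / total)
  (centre - margin) / denom
end

# Same 90% positive rate, but different amounts of evidence.
many = wilson_lower_bound(90, 100)
few = wilson_lower_bound(9, 10)
```

The item with 100 ratings scores higher than the one with 10, even though both have the same share of positive ratings.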

Data

Data can be an array of hashes

[{user_id: 1, item_id: 1, rating: 5}, {user_id: 2, item_id: 1, rating: 3}]

Or a Rover data frame

Rover.read_csv("ratings.csv")

Or a Daru data frame

Daru::DataFrame.from_csv("ratings.csv")
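
If neither data frame library is in use, the array-of-hashes form can also be built from CSV with Ruby's csv library. This is a sketch that assumes the file has user_id, item_id, and rating columns holding integers; an inline string stands in for ratings.csv.

```ruby
require "csv"

# Inline stand-in for the contents of ratings.csv (assumed column names).
csv_text = <<~CSV
  user_id,item_id,rating
  1,1,5
  2,1,3
CSV

# Convert each row to the hash shape used for fitting.
data = CSV.parse(csv_text, headers: true).map do |row|
  {
    user_id: row["user_id"].to_i,
    item_id: row["item_id"].to_i,
    rating: row["rating"].to_i
  }
end
```

For a real file, CSV.foreach("ratings.csv", headers: true) yields the same kind of rows without loading the whole file into a string first.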

Performance

If you have a large number of users or items, you can use an approximate nearest neighbors library like Faiss to improve the performance of certain methods.

Add this line to your application’s Gemfile:

gem "faiss"

Speed up the user_recs method with:

recommender.optimize_user_recs

Speed up the item_recs method with:

recommender.optimize_item_recs

Speed up the similar_users method with:

recommender.optimize_similar_users

Call these methods after fitting or loading the recommender.

Reference

Get ids

recommender.user_ids
recommender.item_ids

Get the global mean

recommender.global_mean

Get factors

recommender.user_factors
recommender.item_factors

Get factors for specific users and items

recommender.user_factors(user_id)
recommender.item_factors(item_id)
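
Factor vectors can be compared directly to find similar users or items outside of Disco. The sketch below ranks items by cosine similarity, assuming the vectors have already been converted to plain Ruby arrays of floats; the factors hash, its values, and cosine_similarity are hypothetical, and the metric Disco itself uses is not stated here.

```ruby
# Cosine similarity between two equal-length vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |y| y * y })
  norm.zero? ? 0.0 : dot / norm
end

# Hypothetical item factors with 2 latent factors each.
factors = {"A" => [1.0, 0.0], "B" => [0.9, 0.1], "C" => [0.0, 1.0]}

# Rank every other item by similarity to item "A", best first.
target = factors["A"]
ranked = factors.reject { |id, _| id == "A" }
                .map { |id, vec| {item_id: id, score: cosine_similarity(target, vec)} }
                .sort_by { |rec| -rec[:score] }
```

This brute-force scan is fine for small catalogs; for large ones, a nearest neighbors index is the usual replacement.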

Credits

Thanks to:

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/disco.git
cd disco
bundle install
bundle exec rake test