DataSystemsLab / recdb-postgresql

RecDB is a recommendation engine built entirely inside PostgreSQL
363 stars 43 forks source link
database machine-learning recommendation recommendation-algorithms recommendation-engine recommendation-system recommender-system

# Welcome to RecDB RecDB is an Open Source Recommendation Engine Built Entirely Inside PostgreSQL 9.2. RecDB allows application developers to build recommendation applications in a heartbeat through a wide variety of built-in recommendation algorithms like user-user collaborative filtering, item-item collaborative filtering, singular value decomposition. Applications powered by RecDB can produce online and flexible personalized recommendations to end-users.

RecDB Logo current version: v1.1

How to Get Source Code

You can check out the code, as follows:

$ git clone https://github.com/Sarwat/recdb-postgresql.git

Recommended Machine Specifications

RecDB is designed to be run on a Unix operating system. At least 1GB of RAM is recommended for most queries, though when working with very large data sets more RAM may be desirable, especially when you are not working with apriori (materialized) recommenders.

Building and Installation

Once you've synced with GitHub, the folder should contain the source code for PostgreSQL, as well as several Perl scripts in a directory named "./PostgreSQL/scripts/". If you are familiar with installing PostgreSQL from source, RecDB is installed the exact same way; for the sake of simplicity, however, we have included these scripts that will simplify the process. Note that the installation and remake-related scripts MUST be run from within the PostgreSQL folder in order for them to work correctly.

1) Run the installation script install.pl.

perl scripts/install.pl [abs_path]

[abs_path] is the absolute path to the directory where you want PostgreSQL installed. The directory should exist before running this script. This will also create a folder "data" in the PostgreSQL folder; this is where the database will be located.

2) Run the database server script pgbackend.pl.

perl scripts/pgbackend.pl

The install.pl script stores the install path in a separate file, so there shouldn't be any need to specify it.

3) In a second terminal, run the database interaction script pgfrontend.pl.

perl scripts/pgfrontend.pl [db_name] [server_host]

[db_name] is the name of the database that you intend to use. [server_host] is the address of the host server running the PostgreSQL backend. If this option is not specified, the script assumes it to be "localhost".


If you need to rebuild PostgreSQL, there are two options.

If you have not modified the grammar, you can do a quick rebuild with remake.pl.

perl scripts/remake.pl

If you have modified the grammar, you will need to do a longer rebuild with remakefull.pl.

perl scripts/remakefull.pl [abs_path]

[abs_path] is the absolute path to the directory where you want PostgreSQL installed. The directory should exist before running this script.

If you ever want to eliminate the current database , use the clean.pl script.

perl scripts/clean.pl [db_name] [server_host]

How RecDB Works

Loading Data

We provide the MovieLens data to build a "Hello-World" movie recommendation application using RecDB. You can load the data using the sql script called "initmovielens1mdatabase.sql" stored in "./PostgreSQL" directory. We provide the dataset at "./PostgreSQL/moviedata / MovieLens1M/" directory. For instance, the ratings (i.e., ml_ratings) table may have a schema as follows:

+-----------------------------+
| userid | itemid | ratingval |
+-----------------------------+

Creating Recommenders

Users may create recommenders apriori so that when a recommendation query is issued may be answer with less latency. The user needs to specify the ratings table in the ON clause and also specify where the user, item, and rating value columns are in that table. Moreover, the user has to designate the recommendation algorithm to be used to predict item ratings in the USING clause.

CREATE RECOMMENDER MovieRec ON ml_ratings
USERS FROM userid
ITEMS FROM itemid
EVENTS FROM ratingval
USING ItemCosCF

Currently, the available recommendation algorithms that could be passed to the USING clause are the following:

Similarly, materialized recommenders can be removed with the following command:

DROP RECOMMENDER MovieRec

Note that if you query a materialized recommender, the three columns listed above will be the only ones returned, and attempting to reference any additional columns will result in an error.

Recommendation Query

In the recommendation query, the user needs to specify the ratings table and also specify where the user, item, and rating value columns are in that table. Moreover, the user has to designate the recommendation algorithm to be used to predict item ratings. For example, if ml_ratings(userid,itemid,ratingval) represents the ratings table in a movie recommendation application, then to recommend top-10 movies based on the rating predicted using Item-Item Collaborative filtering (applying cosine similarity measure) algorithm to user 1, the user writes the following SQL:

SELECT * FROM ml_ratings R
RECOMMEND R.itemid TO R.userid ON R.ratingval USING ItemCosCF
WHERE R.userid = 1
ORDER BY R.ratingval
LIMIT 10

When you issue a query such as this, the only interesting data will come from the three columns specified in the RECOMMEND clause. Any other columns that exist in the specified ratings tables will be set to 0.

Note that if you do not specify which user(s) you want recommendations for, it will generate recommendations for all users, which can take an extremely long time to finish.

More Complex Queries

The main benefit of implementing the recommendation functionality inside a database engine (PostgreSQL) is to allow for integration with traditional database operations, e.g., selection, projection, join. For example, the following query recommends the top 10 Comedy movies to user 1. In order to do that, the query joins the recommendation with the Movies table and apply a filter on the movies genre column (genre LIKE '%Comedy%').

SELECT * FROM ml_ratings R, Movies M
RECOMMEND R.itemid TO R.userid ON R.ratingval USING ItemCosCF
WHERE R.userid = 1 AND M.movieid = R.itemid AND M.genre LIKE '%Comedy%'
ORDER BY R.ratingval
LIMIT 10

Publications

Authors

Support or Contact

Having trouble with RecDB? contact msarwat@asu.edu and we’ll help you sort it out.

Follow @Rec_DB on Twitter for updates