D-Andreev / recommender-addon

A node addon for recommendations
MIT License
collaborative-filtering node-addon rating recommender-system tf-idf

Recommender

recommender is a Node.js addon with utility functions that help when building a recommender system. It contains implementations of TF-IDF, collaborative filtering and the global baseline approach, which are commonly used in recommender systems. Each API method has a sync and an async variant. The async variants are recommended, because they run in separate threads and do not block the event loop.
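The callbacks shown in this README receive the result as their only argument rather than Node's usual `(err, result)` pair, so `util.promisify` would treat the result as an error and reject with it. A minimal sketch of a wrapper that turns such a call into a Promise, assuming the callback is always passed last and called exactly once with the result; `promisifyResult` is a hypothetical helper, not part of the addon:

```javascript
// Hypothetical helper, not part of the addon: wraps a function whose
// callback (passed last) receives the result as its only argument.
function promisifyResult(fn) {
  return (...args) =>
    new Promise((resolve) => {
      fn(...args, resolve);
    });
}

// Stand-in for an addon call such as recommender.tfidf(query, documents, cb):
// it reports its result asynchronously, without an error argument.
function fakeAsyncSquare(n, callback) {
  setImmediate(() => callback(n * n));
}

const squareAsync = promisifyResult(fakeAsyncSquare);
squareAsync(3).then((result) => console.log(result)); // prints 9
```

With the addon installed, `promisifyResult(recommender.tfidf.bind(recommender))(query, documents)` would then resolve with the sorted documents and could be used with `await`.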


Installation

npm install recommender

Usage

TF-IDF

The input of TF-IDF (term frequency-inverse document frequency) is a search query and a collection of documents. TF-IDF measures how important a word is to a document within a collection. Using cosine similarity between the weighted query and each weighted document, we can then rank the documents by how similar they are to the query and make recommendations.
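To make the scoring concrete, here is a minimal plain-JavaScript sketch of TF-IDF weighting followed by cosine-similarity ranking. It is not the addon's native implementation: it assumes lowercase whitespace tokenization, raw term counts, and `idf = ln(N / df)` with the query counted as one of the N texts, so its scores will likely differ from the addon's. `rankByTfidf` is a hypothetical name.

```javascript
// Illustrative TF-IDF + cosine-similarity ranking (not the addon's internals).
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

function termFrequencies(tokens) {
  const tf = new Map();
  for (const t of tokens) tf.set(t, (tf.get(t) || 0) + 1);
  return tf;
}

function rankByTfidf(query, documents) {
  const texts = [query, ...documents].map(tokenize);
  const n = texts.length;
  // Document frequency: number of texts containing each term.
  const df = new Map();
  for (const tokens of texts) {
    for (const t of new Set(tokens)) df.set(t, (df.get(t) || 0) + 1);
  }
  // Weight every term by tf * idf, with idf = ln(n / df).
  const vectors = texts.map((tokens) => {
    const weights = new Map();
    for (const [t, f] of termFrequencies(tokens)) {
      weights.set(t, f * Math.log(n / df.get(t)));
    }
    return weights;
  });
  const norm = (v) => Math.sqrt([...v.values()].reduce((s, w) => s + w * w, 0));
  const cosine = (a, b) => {
    let dot = 0;
    for (const [t, w] of a) dot += w * (b.get(t) || 0);
    const denom = norm(a) * norm(b);
    return denom === 0 ? 0 : dot / denom;
  };
  const [queryVec, ...docVecs] = vectors;
  // Sort documents by similarity to the query, most similar first.
  return documents
    .map((doc, i) => ({ doc, score: cosine(queryVec, docVecs[i]) }))
    .sort((x, y) => y.score - x.score)
    .map((entry) => entry.doc);
}

const exampleDocs = [
  'get the current date and time in javascript',
  'get the current date and time in python',
  'something very different',
  'what is the time now',
];
console.log(rankByTfidf('get current date time javascript', exampleDocs));
```

On these four documents the sketch produces the same ordering as the addon output shown below, although its intermediate scores are its own.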

var recommender = require('recommender');

var query = 'get current date time javascript';
var documents = [
    'get the current date and time in javascript',
    'get the current date and time in python',
    'something very different',
    'what is the time now'
];
recommender.tfidf(query, documents, (sortedDocs) => {
    console.log(sortedDocs);
    // Output:
    /**
        get the current date and time in javascript
        get the current date and time in python
        what is the time now
        something very different
    */     
});

The tfidf method also accepts paths to files containing the query and the documents. For example, create a query file, search_query.txt, with the content:

get current date time javascript

Then put the documents into documents.txt. Each document goes on its own line, and words are separated by spaces.

get the current date and time in javascript
get the current date and time in python
something very different
what is the time now

var recommender = require('recommender');

var queryPath = './search_query.txt';
var documentsPath = './documents.txt';

recommender.tfidf(queryPath, documentsPath, (sortedDocs) => {
    console.log(sortedDocs);
    // Output:
    /**
        get the current date and time in javascript
        get the current date and time in python
        what is the time now
        something very different
    */   
});
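If the query and documents come from your own data, the two files can be written in the layout described above with Node's built-in `fs` module. A minimal sketch; `writeTfidfInputs` and `readDocuments` are hypothetical helpers, and omitting the trailing newline is only a precaution in case the addon would read an empty last line as a document (not verified):

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: writes search_query.txt and documents.txt into `dir`
// using the layout above (one document per line, words separated by spaces).
function writeTfidfInputs(dir, query, documents) {
  const queryPath = path.join(dir, 'search_query.txt');
  const documentsPath = path.join(dir, 'documents.txt');
  // Collapse all whitespace so every document stays on a single line.
  const oneLine = (text) => text.trim().split(/\s+/).join(' ');
  // No trailing newline: precaution against an empty extra document (unverified).
  fs.writeFileSync(queryPath, oneLine(query));
  fs.writeFileSync(documentsPath, documents.map(oneLine).join('\n'));
  return { queryPath, documentsPath };
}

// Hypothetical helper: reads the documents file back, one document per line.
function readDocuments(documentsPath) {
  return fs
    .readFileSync(documentsPath, 'utf8')
    .split('\n')
    .filter((line) => line.trim() !== '');
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'tfidf-'));
const { queryPath, documentsPath } = writeTfidfInputs(dir, 'get current date time javascript', [
  'get the current date and time in javascript',
  'what   is the time\nnow',
]);
console.log(readDocuments(documentsPath));
// queryPath and documentsPath can then be passed to recommender.tfidf.
```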

We can also pass filterStopWords, which is optional and false by default. If filterStopWords is true, stop words are filtered out and not considered when calculating similarity. Stop words are words that appear so commonly across documents that they lose their representativeness and contribute little to the meaning of the text, e.g. a, about, the, if, it, is. The full stop words list can be viewed here.

var filterStopWords = true;
recommender.tfidf(queryPath, documentsPath, filterStopWords, (sortedDocs) => {
    console.log(sortedDocs);
});
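Stop-word filtering amounts to dropping those tokens before the TF-IDF weights are computed. A minimal sketch; the `STOP_WORDS` set below holds only the examples named above and is a hypothetical stand-in for the addon's full list:

```javascript
// Hypothetical subset of a stop-word list (only the examples named above).
const STOP_WORDS = new Set(['a', 'about', 'the', 'if', 'it', 'is']);

// Lowercase, split on whitespace, and drop stop words before scoring.
function tokenizeWithoutStopWords(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((token) => token !== '' && !STOP_WORDS.has(token));
}

console.log(tokenizeWithoutStopWords('what is the time now')); // [ 'what', 'time', 'now' ]
```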

Collaborative filtering

The input for collaborative filtering is a table with user ratings. Consider the following example.

       HP1   HP2   HP3   TW   SW1   SW2   SW3
   A   4     0     0     1     1     0     0
   B   5     5     4     0     0     0     0
   C   0     0     0     2     4     5     0
   D   3     0     0     0     0     0     3

A, B, C and D are users. HP1 (Harry Potter 1), TW (Twilight), SW1 (Star Wars 1) are movies. A rating of 0 means that the user has not rated the movie. In this example ratings range from 1 to 5, but any rating scale can be used (e.g. 1-10). Using collaborative filtering, the predicted rating of user A for HP2 is 4. Besides the ratings table, we pass the row index and the column index of the rating we want to predict.

var recommender = require('recommender');
var ratings = [
    [ 4, 0, 0, 1, 1, 0, 0 ],
    [ 5, 5, 4, 0, 0, 0, 0 ],
    [ 0, 0, 0, 2, 4, 5, 0 ],
    [ 3, 0, 0, 0, 0, 0, 3 ]
];
var userIndex = 0;  // row of user A
var movieIndex = 1; // column of HP2
// We are predicting the rating of user A for HP2.
recommender.getRatingPrediction(ratings, userIndex, movieIndex, (predictedRating) => {
    console.log(predictedRating);
    // Output: 4
});
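The README does not spell out which collaborative-filtering variant the addon uses. As an illustration only, here is a textbook item-based sketch: it centers each user's ratings on that user's mean, measures item-item similarity with cosine similarity over the centered columns, and predicts a similarity-weighted average of the user's own ratings on positively similar items. It assumes rows are users, columns are items and 0 means "not rated"; `predictItemBased` is a hypothetical name.

```javascript
// Textbook item-based collaborative filtering (illustrative, not the addon's code).
// Assumptions: rows are users, columns are items, 0 means "not rated".

// Subtract each user's mean rating from the cells that user actually rated.
function centerRows(ratings) {
  return ratings.map((row) => {
    const rated = row.filter((r) => r !== 0);
    const mean = rated.length ? rated.reduce((s, r) => s + r, 0) / rated.length : 0;
    return row.map((r) => (r === 0 ? 0 : r - mean));
  });
}

// Cosine similarity between two item columns of the centered matrix.
function columnCosine(matrix, a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const row of matrix) {
    dot += row[a] * row[b];
    normA += row[a] * row[a];
    normB += row[b] * row[b];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Weighted average of the user's ratings on items positively similar to the
// target item. Returns 0 when no such neighbour exists (no prediction).
function predictItemBased(ratings, userIndex, itemIndex) {
  const centered = centerRows(ratings);
  let weightedSum = 0;
  let similaritySum = 0;
  ratings[userIndex].forEach((rating, j) => {
    if (j === itemIndex || rating === 0) return; // only items this user rated
    const sim = columnCosine(centered, itemIndex, j);
    if (sim > 0) {
      weightedSum += sim * rating;
      similaritySum += sim;
    }
  });
  return similaritySum === 0 ? 0 : weightedSum / similaritySum;
}

const exampleRatings = [
  [4, 0, 0, 1, 1, 0, 0],
  [5, 5, 4, 0, 0, 0, 0],
  [0, 0, 0, 2, 4, 5, 0],
  [3, 0, 0, 0, 0, 0, 3],
];
console.log(predictItemBased(exampleRatings, 0, 1)); // user A, HP2 -> 4
```

For user A and HP2 this sketch also yields 4, because HP1 is the only movie A rated that is positively similar to HP2, so the prediction is A's own rating of HP1.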

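Using only collaborative filtering to predict ratings has trade-offs: it needs no information about the items themselves, but it depends on other users having rated the same items, so it struggles with sparse ratings tables and with new users or items (the cold-start problem).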

Global Baseline Approach

This approach is useful when the ratings table is sparse and no other users rated the same item. Collaborative filtering, by contrast, typically needs other users who rated the same item. Consider the following utility matrix with ratings:

       HP1   HP2   HP3   TW   SW1   SW2   SW3
   A   4     0     0     1     1     0     0
   B   5     5     4     0     0     0     0
   C   0     0     0     2     4     5     0
   D   3     0     0     0     0     0     3

A, B, C and D are users. HP1 (Harry Potter 1), TW (Twilight), SW1 (Star Wars 1) are movies. Using the global baseline approach, the predicted rating of user A for HP2 is 3.6363636363636362.
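The global baseline estimate adds a user deviation and an item deviation to the overall mean rating, where each deviation is that user's (or item's) mean rating minus the overall mean and unrated cells (0) are ignored. Assuming that standard definition, the arithmetic below reproduces the value above for user A and HP2:

```math
\begin{aligned}
\hat{r}_{xi} &= \mu + b_x + b_i, \qquad b_x = \bar{r}_x - \mu, \qquad b_i = \bar{r}_i - \mu \\
\mu &= \frac{(4+1+1) + (5+5+4) + (2+4+5) + (3+3)}{11} = \frac{37}{11} \approx 3.3636 \\
\bar{r}_A &= \frac{4+1+1}{3} = 2, \qquad \bar{r}_{\mathrm{HP2}} = \frac{5}{1} = 5 \\
\hat{r}_{A,\mathrm{HP2}} &= \frac{37}{11} + \left(2 - \frac{37}{11}\right) + \left(5 - \frac{37}{11}\right) = 7 - \frac{37}{11} = \frac{40}{11} \approx 3.6364
\end{aligned}
```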

var recommender = require('recommender');
var ratings = [
    [ 4, 0, 0, 1, 1, 0, 0 ],
    [ 5, 5, 4, 0, 0, 0, 0 ],
    [ 0, 0, 0, 2, 4, 5, 0 ],
    [ 3, 0, 0, 0, 0, 0, 3 ]
];
var userIndex = 0;
var movieIndex = 1;
// We are predicting the rating of A for HP2.
recommender.getGlobalBaselineRatingPrediction(ratings, userIndex, movieIndex, (predictedRating) => {
    console.log(predictedRating);
    // Output: 3.6363636363636362
});
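A minimal JavaScript sketch of the global baseline estimate (overall mean plus user deviation plus item deviation), assuming 0 marks an unrated cell and that every mean is taken over rated cells only; `globalBaseline` is a hypothetical name and this is not the addon's code:

```javascript
// Mean over rated cells only (0 = not rated). NaN if nothing is rated.
function mean(values) {
  const rated = values.filter((v) => v !== 0);
  return rated.length === 0 ? NaN : rated.reduce((s, v) => s + v, 0) / rated.length;
}

// Global baseline: mu + (userMean - mu) + (itemMean - mu).
function globalBaseline(ratings, userIndex, itemIndex) {
  const mu = mean(ratings.flat()); // overall mean rating
  const userDeviation = mean(ratings[userIndex]) - mu;
  const itemDeviation = mean(ratings.map((row) => row[itemIndex])) - mu;
  return mu + userDeviation + itemDeviation;
}

const exampleRatings = [
  [4, 0, 0, 1, 1, 0, 0],
  [5, 5, 4, 0, 0, 0, 0],
  [0, 0, 0, 2, 4, 5, 0],
  [3, 0, 0, 0, 0, 0, 3],
];
console.log(globalBaseline(exampleRatings, 0, 1)); // approximately 3.636 (40 / 11)
```

Because the estimate needs only the user's and the item's own averages, it exists even when no other user shares a rated item with the target user, as long as the item has at least one rating.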

API

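##### recommender.tfidf(`query`, `documents`, `filterStopWords`, [`callback`])
###### Arguments
* `query` - A string with the search query. *(Required)*
* `documents` - An array of strings with the documents. *(Required)*
* `filterStopWords` - A boolean to filter out the stop words or not. *(Optional)* *(Default: `false`)*
* `callback` - A callback function that receives the sorted documents. *(Optional)*
###### Examples
```js
var recommender = require('recommender');

var query = 'get current date time javascript';
var documents = [
    'get the current date and time in javascript',
    'get the current date and time in python',
    'something very different',
    'what is the time now'
];
var filterStopWords = true;
recommender.tfidf(query, documents, filterStopWords, (sortedDocs) => {
    // use sorted docs here...
});
```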

<a name="tfidf-files"></a>
##### recommender.tfidf(`queryFilePath`, `documentsFilePath`, `filterStopWords`, [`callback`])
###### Arguments
* `queryFilePath` - A string with the path to the search query text file. *(Required)*
* `documentsFilePath` - A string with the path to the documents text file. *(Required)*
* `filterStopWords` - A boolean to filter out the stop words or not. *(Optional)* *(Default: `false`)*
* `callback` - A callback function that receives the sorted documents. *(Optional)*
###### Returns
An array of strings with the documents sorted by similarity to the query, most similar first.
```js
[
    'get the current date and time in javascript',
    'get the current date and time in python',
    'what is the time now',
    'something very different'
]
```
###### Examples
```js
var recommender = require('recommender');

var queryFilePath = './search_query.txt';
var documentsFilePath = './documents.txt';
var filterStopWords = true;
recommender.tfidf(queryFilePath, documentsFilePath, filterStopWords, (sortedDocs) => {
    // use sorted docs here...
});
```

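##### recommender.getRatingPrediction(`ratings`, `rowIndex`, `colIndex`, [`callback`])
###### Arguments
* `ratings` - A two dimensional array with numbers representing the ratings. *(Required)*
* `rowIndex` - An integer with the index of the target row for prediction. *(Required)*
* `colIndex` - An integer with the index of the target column for prediction. *(Required)*
* `callback` - A callback function that receives the predicted rating. *(Optional)*
###### Returns
A float number with the predicted rating.
###### Examples
```js
var recommender = require('recommender');
var ratings = [
    [ 1, 0, 3, 0, 0, 5, 0, 0, 5, 0, 4, 0 ],
    [ 0, 0, 5, 4, 0, 0, 4, 0, 0, 2, 1, 3 ],
    [ 2, 4, 0, 1, 2, 0, 3, 0, 4, 3, 5, 0 ],
    [ 0, 2, 4, 0, 5, 0, 0, 4, 0, 0, 2, 0 ],
    [ 0, 0, 4, 3, 4, 2, 0, 0, 0, 0, 2, 5 ],
    [ 1, 0, 3, 0, 3, 0, 0, 2, 0, 0, 4, 0 ]
];
var rowIndex = 0;
var colIndex = 4;
recommender.getRatingPrediction(ratings, rowIndex, colIndex, (predictedRating) => {
    // predictedRating is 3.329569404588411
});
```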

<a name="get-g-b"></a>
##### recommender.getGlobalBaselineRatingPrediction(`ratings`, `rowIndex`, `colIndex`, [`callback`])
###### Arguments
* `ratings` - A two dimensional array with numbers representing the ratings. *(Required)*
* `rowIndex` - An integer with the index of the target row for prediction. *(Required)*
* `colIndex` - An integer with the index of the target column for prediction. *(Required)*
* `callback` - A callback function that receives the predicted rating. *(Optional)*
###### Returns
A float number with the predicted rating.
###### Examples
```js
var recommender = require('recommender');
var ratings = [
    [ 4, 0, 0, 1, 1, 0, 0 ],
    [ 5, 5, 4, 0, 0, 0, 0 ],
    [ 0, 0, 0, 2, 4, 5, 0 ],
    [ 3, 0, 0, 0, 0, 0, 3 ]
];
var userIndex = 0;
var movieIndex = 1;
recommender.getGlobalBaselineRatingPrediction(ratings, userIndex, movieIndex, (predictedRating) => {
    // predictedRating is 3.6363636363636362
});
```

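##### recommender.getTopCFRecommendations(`ratings`, `rowIndex`, [`options`], [`callback`])
###### Arguments
* `ratings` - A two dimensional array with numbers representing the ratings. *(Required)*
* `rowIndex` - An integer with the index of the target row. *(Required)*
* `options` - An object with options; `limit` sets the maximum number of recommendations returned. *(Optional)*
* `callback` - A callback function that receives the recommendations. *(Optional)*
###### Returns
An array of `{ itemId, rating }` objects, highest predicted rating first.
###### Examples
```js
// `ratings` is a ratings matrix as in the examples above.
// With an options object:
recommender.getTopCFRecommendations(ratings, 0, {limit: 3}, (recommendations) => {
    console.log(recommendations);
    /*
    [
        { itemId: 1, rating: 4.4907920453550085 },
        { itemId: 2, rating: 3.5926336362840074 },
        { itemId: 5, rating: 0.5092079546449908 }
    ]
    */
});
```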
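A top-N recommender of this kind can be sketched as: predict a rating for every item the user has not rated, sort by predicted rating in descending order, and keep the first `limit` entries. In the sketch below the predictor is passed in as a function; for brevity the demo plugs in a global-baseline predictor instead of collaborative filtering, so its numbers will not match the addon's output. `topRecommendations` and `baselinePredictor` are hypothetical names, and the `{ itemId, rating }` shape simply mirrors the output shown above.

```javascript
// Hypothetical sketch: rank a user's unrated items (0 = not rated) by a
// pluggable predictor and keep the top `limit` as { itemId, rating } objects.
function topRecommendations(ratings, userIndex, predict, limit = Infinity) {
  return ratings[userIndex]
    .map((rating, itemId) => ({ itemId, rating }))
    .filter((entry) => entry.rating === 0)
    .map(({ itemId }) => ({ itemId, rating: predict(ratings, userIndex, itemId) }))
    .sort((a, b) => b.rating - a.rating) // stable sort keeps ties in item order
    .slice(0, limit);
}

// Demo predictor (global baseline), used here only to keep the sketch short.
function baselinePredictor(ratings, userIndex, itemIndex) {
  const mean = (values) => {
    const rated = values.filter((v) => v !== 0);
    return rated.reduce((s, v) => s + v, 0) / rated.length;
  };
  const mu = mean(ratings.flat());
  const userMean = mean(ratings[userIndex]);
  const itemMean = mean(ratings.map((row) => row[itemIndex]));
  return mu + (userMean - mu) + (itemMean - mu);
}

const exampleRatings = [
  [4, 0, 0, 1, 1, 0, 0],
  [5, 5, 4, 0, 0, 0, 0],
  [0, 0, 0, 2, 4, 5, 0],
  [3, 0, 0, 0, 0, 0, 3],
];
console.log(topRecommendations(exampleRatings, 0, baselinePredictor, 3));
```

Keeping the predictor as a parameter separates the ranking step from the rating model, so a collaborative-filtering predictor could be swapped in without changing `topRecommendations`.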

<a name="Run-examples"></a>
### Run examples and benchmarks
- Clone the repo.
- `npm i` in the main folder.
- `npm i` in `/demo` folder.
- `node index.js` to run the examples.
- `node benchmarks.js` to run the benchmarks.

The benchmark script can be viewed [here](https://github.com/D-Andreev/recommender-addon/blob/master/demo/benchmarks.js).

```
tfidf100000: 14471.830ms
ratingPrediction100000: 3782.905ms
getGlobalBaselineRatingPrediction100000: 3235.675ms
getTopCFRecommendations100000: 5171.741ms

tfidf100000: 14506.219ms
ratingPrediction100000: 3761.865ms
getGlobalBaselineRatingPrediction100000: 3279.035ms
getTopCFRecommendations100000: 5130.438ms
```



<a name="Contributing"></a>
### Contributing
Pull requests are welcome.

<a name="Changelog"></a>
### Changelog
For the complete changelog, [click here](https://github.com/D-Andreev/recommender-addon/blob/master/CHANGELOG.md).

<a name="License"></a>
### License
MIT License

Copyright (c) 2017 Dimitar Andreev

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

[![forthebadge](http://forthebadge.com/images/badges/contains-technical-debt.svg)](http://forthebadge.com) [![forthebadge](http://forthebadge.com/images/badges/check-it-out.svg)](http://forthebadge.com) [![forthebadge](http://forthebadge.com/images/badges/no-ragrets.svg)](http://forthebadge.com) [![forthebadge](http://forthebadge.com/images/badges/winter-is-coming.svg)](http://forthebadge.com) [![forthebadge](http://forthebadge.com/images/badges/powered-by-jeffs-keyboard.svg)](http://forthebadge.com)
[![forthebadge](http://forthebadge.com/images/badges/compatibility-ie-6.svg)](http://forthebadge.com)