The Universal Recommender (UR) is a new type of collaborative filtering recommender based on an algorithm that can use data from a wide variety of user preference indicators: the Correlated Cross-Occurrence (CCO) algorithm. Unlike the matrix factorization embodied in, for example, MLlib's ALS, CCO can ingest any number of user actions, events, profile data, and contextual information, and it serves results in a fast and scalable way. It also supports item properties for building flexible business rules for filtering and boosting recommendations, and can therefore be considered a hybrid collaborative-filtering and content-based recommender.
Most recommenders can only use conversion events, like buy or rate. Using everything we know about a user and their context allows us to predict their preferences much more accurately.
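To make the cross-occurrence idea concrete, here is a toy sketch (not the Mahout implementation the UR actually uses): count how often a secondary action like "view" co-occurs with the primary conversion action like "buy", then keep only statistically significant pairs using the log-likelihood ratio (LLR) test. Action names, data, and the threshold are all made up for illustration.

```python
import math
from collections import defaultdict

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 co-occurrence contingency table."""
    def h(*counts):  # unnormalized Shannon entropy term
        total = sum(counts)
        return sum(k * math.log(k / total) for k in counts if k > 0)
    # 2 * (H(matrix) - H(row sums) - H(column sums))
    return 2 * (h(k11, k12, k21, k22)
                - h(k11 + k12, k21 + k22)
                - h(k11 + k21, k12 + k22))

def cross_occurrence(buys, views, min_llr=1.0):
    """buys/views: dicts of user -> set of items.
    Returns {bought_item: {viewed_item: llr_score}} for significant pairs."""
    users = set(buys) | set(views)
    n = len(users)
    buy_count = defaultdict(int)
    view_count = defaultdict(int)
    pair_count = defaultdict(int)
    for u in users:
        for b in buys.get(u, set()):
            buy_count[b] += 1
            for v in views.get(u, set()):
                pair_count[(b, v)] += 1  # user both bought b and viewed v
        for v in views.get(u, set()):
            view_count[v] += 1
    model = defaultdict(dict)
    for (b, v), k11 in pair_count.items():
        k12 = buy_count[b] - k11   # bought b, did not view v
        k21 = view_count[v] - k11  # viewed v, did not buy b
        k22 = n - k11 - k12 - k21  # neither
        score = llr(k11, k12, k21, k22)
        if score >= min_llr:       # keep only significant correlations
            model[b][v] = score
    return model
```

The key point is that the secondary action is tested for correlation with the conversion, rather than simply counted, which is what lets CCO safely mix many event types.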
UR 0.8.0+ requires the Harness Machine Learning Server; 0.7.3 and earlier run in PredictionIO 0.12.1. This repo is now built into Harness as a pre-packaged Engine. See Upgrading from PIO to Harness.
All docs for the Universal Recommender are here and are hosted at https://github.com/actionml/docs.actionml.com. If you wish to change or edit the docs, make a PR to that repo.
Contributions are encouraged and appreciated. Create a pull request (PR) against the develop branch of the git repo. We like to keep new features general, so users will not be required to change the UR's code to make use of them. We will be happy to provide guidance or help via the GitHub PR review mechanism.
The Universal Recommender has moved. Future versions will be included as a built-in Engine for the new Harness Machine Learning Server. UR v0.8.0+ is data-compatible with previous versions that are integrated with PredictionIO, which means you can export from UR+PIO and import into UR+Harness. See Upgrading from PIO to Harness.
Adds:

* Uses `python3` wherever Python is invoked. Before this branch it was assumed that the environment mapped `python` to `python3`, which is required for PIO 0.12+ and UR 0.7+. Since many distros have `python` invoke Python 2.7 while `python3` is needed to invoke Python 3.6, the UR now calls `python3` explicitly.

Adds:

* Paged queries: `"from": 0, "num": 2` will return 2 recs starting from the first available; `"from": 2, "num": 2` will return 2 starting at the 3rd, since `"from"` is 0-based.

This tag takes precedence over 0.7.0, which should not be used.

Changes:
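The paging semantics above amount to a simple 0-based slice over the ranked result list. This is an illustrative sketch (item names are made up), not the server's implementation:

```python
def page(ranked_items, frm, num):
    """Return `num` recommendations starting at 0-based offset `frm`."""
    return ranked_items[frm:frm + num]

recs = ["item-a", "item-b", "item-c", "item-d"]
page(recs, 0, 2)  # -> ["item-a", "item-b"]  (first two)
page(recs, 2, 2)  # -> ["item-c", "item-d"]  (starting at the 3rd)
```

Note that a `"from"` past the end of the available results simply returns fewer (or zero) items rather than an error.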
This README Has Special Build Instructions!
This tag is for the UR integrated with PredictionIO 0.12.0 using Scala 2.11, Spark 2.1.x, and most importantly Elasticsearch 5.x. Primary differences from 0.6.0:
* Fixed a bug in exclusion rules based on item properties
WARNING: Upgrading Elasticsearch or HBase will wipe existing data if any, so follow the special instructions below before installing any service upgrades.
You must build PredictionIO with the default parameters, so just run `./make-distribution`. This requires Scala 2.11 and Python 3 installed as the default Scala and Python. You can run up to Spark 2.1.x (but not 2.2.x), Elasticsearch 5.5.2 or greater (6.x has not been tested), and Hadoop 2.6 or greater; you can get away with using older versions of services, except that ES must be 5.x. If you have issues getting pio to build and run, send questions to the PIO mailing list.
Back up your data: moving from ES 1 to ES 5 will delete all data! Actually, even worse, it is still in HBase but you can't get at it, so to upgrade do the following:

1. **Before upgrade:** `pio export` with pio < 0.12.0.
2. **Before upgrade:** `pio app data-delete` all your old apps.
3. After upgrade: `pio app new …` and `pio import …` any needed datasets.

Once PIO is running, test with `pio status` and `pio app list`. To test your setup and UR integration, run `./examples/integration-test` from the UR's home directory.
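As a rough sketch, the upgrade cycle above looks like the following. App name, app id, and paths are placeholders; confirm flags with `pio --help` for your PIO version:

```bash
# BEFORE upgrading (pio < 0.12.0): export each app's events
pio export --appid 1 --output /tmp/myapp-events

# BEFORE upgrading: delete the old app data so nothing is stranded in HBase
pio app data-delete myapp

# AFTER upgrading: recreate the app and re-import the events
pio app new myapp
pio import --appid 1 --input /tmp/myapp-events

# sanity checks
pio status
pio app list
```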
A sample of `pio-env.sh` that works with one type of setup is below, but you'll have to change the paths to match yours. This example shows the new way to configure for Elasticsearch 5.x, which uses a new port number:
#!/usr/bin/env bash
# SPARK_HOME: Apache Spark is a hard dependency and must be configured.
# using Spark 2.2.1 here
SPARK_HOME=/usr/local/spark
# ES_CONF_DIR: You must configure this if you have advanced configuration for your Elasticsearch setup.
# using ES 5.6.3
ES_CONF_DIR=/usr/local/elasticsearch/config
# HADOOP_CONF_DIR: You must configure this if you intend to run PredictionIO with Hadoop.
# using hadoop 2.8 here
HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
# HBASE_CONF_DIR: You must configure this if you intend to run PredictionIO with HBase.
# using HBase 1.2.x here or whatever the highest numbered stable release is
HBASE_CONF_DIR=/usr/local/hbase/conf
# Filesystem paths where PredictionIO uses as block storage.
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp
# Storage Repositories
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
PIO_STORAGE_REPOSITORIES_APPDATA_NAME=pio_appdata
PIO_STORAGE_REPOSITORIES_APPDATA_SOURCE=ELASTICSEARCH
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_eventdata
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
# ES config
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200 # <===== notice 9200 now
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=elasticsearch_xyz # <===== should match what you have in your ES config file
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/usr/local/elasticsearch
PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_HOSTS=$PIO_FS_BASEDIR/models
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=/usr/local/hbase
Mahout has speedups for the Universal Recommender's use that have not been released yet, so you will have to build it from source. To make this easy we have a fork hosted here, with special build instructions. Make sure you are on the "sparse-speedup" branch and follow the instructions in the README.md for the 0.7.0 tag.

Replace `resolvers += "Local Repository" at "file:///Users/pat/.custom-scala-m2/repo"` with your path to the local Mahout build. The UR will not build unless this line is changed; this is expected.

Then run `pio build`, or run the integration test to get sample data put into PIO: `./examples/integration-test`.
This is a major upgrade release with several new features. Backward compatibility with 0.5.0 is maintained. Note: we no longer ship a default `engine.json` file, so you will need to copy `engine.json.template` to `engine.json` and edit it to fit your data. See the Universal Recommender Configuration docs.
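For orientation, a minimal `engine.json` might look like the sketch below. This is illustrative only: `appName` and `eventNames` are placeholders you must set for your data, and `engine.json.template` plus the Configuration docs are authoritative for the full field list.

```json
{
  "comment": "Illustrative sketch only; copy engine.json.template for the real defaults",
  "id": "default",
  "description": "Default settings",
  "engineFactory": "com.actionml.RecommendationEngine",
  "datasource": {
    "params": {
      "appName": "my_app",
      "eventNames": ["buy", "view"]
    }
  },
  "algorithms": [{
    "name": "ur",
    "params": {
      "appName": "my_app",
      "indexName": "urindex",
      "typeName": "items",
      "eventNames": ["buy", "view"]
    }
  }]
}
```

The first event in `eventNames` is the primary (conversion) indicator; the rest are secondary indicators used by CCO.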
* Adds `minEventsPerUser`; see the UR configuration docs.
* Fixes `blackListEvents` as defined in `engine.json`, which was not working for an empty list; an empty list should (and now does) disable all blacklisting except explicit item blacklists contained in the query.
* Fixes a `pio build` failure triggered by the release of Apache PIO. If you have problems building v0.4.0, use this version. It is meant to be used with PredictionIO-0.9.7-aml.
* Adds the `indicators` parameter.
* Adds the `rankings` parameter.
* Adds the `SelfCleanedDataSource` trait. Adding params to the `DataSource` part of `engine.json` allows control of de-duplication, property event compaction, and a time window of events. The time window is used to age out the oldest events. Note: this only works with the ActionML fork of PredictionIO found in the repo mentioned above.
* Changes `backfillField: duration` to accept Scala Duration strings. This will require changes to all `engine.json` files that were using the older number-of-seconds duration.
* Previously `typeName` in `engine.json` was required to be `"items"`; with this release the type can be any string.
* Much of `pio train` time is taken up by writing to Elasticsearch. This can be optimized by creating an ES cluster or giving ES lots of memory.
* Run `pio deploy` to make the new model active.

This software is licensed under the Apache Software Foundation version 2 license found here: http://www.apache.org/licenses/LICENSE-2.0