actionml / universal-recommender

Highly configurable recommender based on PredictionIO and Mahout's Correlated Cross-Occurrence algorithm
http://actionml.com/universal-recommender
Apache License 2.0

Feature Request: Return Expanded Array #28

Open codyckimball opened 7 years ago

codyckimball commented 7 years ago

The hope is to make it easy to configure the recommendation engine to return not only the TargetEntityID and score, as it does currently, but also extra item properties. Users could pass an array of names that map to properties already set on the items.

example: extraParams = ['title', 'description','image']

Results after:

{
  "itemScores": [
    {"item": "22", "score": 4.072304374729956, "title": "title1", "description": "helpful meta description1", "image": "imageurl1"},
    {"item": "62", "score": 4.058482414005789, "title": "title2", "description": "helpful meta description2", "image": "imageurl2"},
    {"item": "75", "score": 4.046063009943821, "title": "title3", "description": "helpful meta description3", "image": "imageurl3"},
    {"item": "68", "score": 3.8153661512945325, "title": "title4", "description": "helpful meta description4", "image": "imageurl4"}
  ]
}

pferrel commented 7 years ago

We already allow properties in $set events that are stored with the model in Elasticsearch so nothing is needed there. See Property Change Events here: http://actionml.com/docs/ur_input

What is needed is an array of "resultProps" in the query for return with each result. So these should be sent in the query but "set" with a $set event.

An optional way is to provide a default set of properties by name in the engine.json, something like:

"resultProps": ["prop1", "prop2", ...]

This could be provided in the query and/or engine.json. It means to not strip those but return them for each result item
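As a rough sketch of the "query and/or engine.json" idea, the list of properties to return could be resolved as below. The object name, the method name, and the precedence rule (a list in the query overrides the engine.json default) are assumptions made for illustration; none of them come from the UR code base.

```scala
// Hypothetical helper, not part of the Universal Recommender source.
object ResultProps {

  /** Names of the properties to return with each result item.
    *
    * Assumption: a list supplied in the query takes precedence over the
    * default list from engine.json; with neither, nothing extra is returned.
    */
  def effective(
      queryProps: Option[Seq[String]],
      engineProps: Option[Seq[String]]): Seq[String] =
    queryProps.orElse(engineProps).getOrElse(Seq.empty).distinct
}
```

A union of both lists would be an equally plausible rule; the override is just the simpler one to reason about.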

Note one restriction: properties are textual, and they MUST be encoded in the results as JSON arrays of strings, even if there is only one value.
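One way to satisfy the "always an array of strings" restriction is to normalize each raw property value before it goes into the result. This is only a sketch: `PropertyEncoding` and `asStrings` are hypothetical names, and the set of value shapes handled (a single string, a Scala or Java collection, or some other scalar) is an assumption about what a stored property might look like.

```scala
// Hypothetical helper, not part of the Universal Recommender source.
object PropertyEncoding {

  /** Normalize a raw property value to a sequence of strings, so a single
    * value and a list of values are both returned as an array.
    * Assumes collections contain no null elements.
    */
  def asStrings(value: Any): Seq[String] = value match {
    case null      => Seq.empty
    case s: String => Seq(s)
    case xs: Iterable[_] =>
      xs.iterator.map(_.toString).toList
    case c: java.util.Collection[_] =>
      // Copy a Java collection element by element.
      val out = List.newBuilder[String]
      val it = c.iterator()
      while (it.hasNext) out += it.next().toString
      out.result()
    case other => Seq(other.toString)
  }
}
```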

codyckimball commented 7 years ago

I had assumed this is where resultProps could be set in the engine, but even after building, training, and deploying I am not seeing any new fields in the response. Other changes to engine.json are coming through, e.g. changing the num parameter in algorithms.

Engine.json:

"algorithms": [
    {
      "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames",
      "name": "ur",
      "params": {
        "appName": "appName",
        "indexName": "urindex",
        "typeName": "items",
        "comment": "must have data for the first event or the model will not build, other events are optional",
        "resultProps": ["mDescription","mTitle","mImage"],
        "indicators": [
          {

I also tried adding it in the datasource params object, but no luck. Is there any documentation on using resultProps, or does http://actionml.com/docs/ur_config just need to be updated to account for this parameter?

pferrel commented 7 years ago

Sure, that looks fine.

hits.source is where you will find all properties known by ES. As you can see not all are being returned. https://github.com/actionml/universal-recommender/blob/master/src/main/scala/URAlgorithm.scala#L484

codyckimball commented 7 years ago

I know it is generally bad practice to edit an API class in a source library such as this, but do you have any recommendations against editing the ItemScore() API class (https://github.com/actionml/universal-recommender/blob/master/src/main/scala/Engine.scala#L71) to include 3 more optional parameters?

case class ItemScore(
  item: ItemID, // item id
  score: Double, // used to rank, original score returned from the search engine
  ranks: Option[Map[String, Double]] = None,
  // default to None so the three new fields are optional at call sites
  mDescription: Option[String] = None,
  mTitle: Option[String] = None,
  mImage: Option[String] = None) extends Serializable

If not, it should be easy to pass in the 3 properties I set, which I do see in the hit.getSource Map, and I should be good to go.

pferrel commented 7 years ago

This would not be accepted since the values are hard coded.

To make this general you should read the property names from the params in engine.json. These are passed in to URAlgorithm.predict and other methods.

Maybe add itemAttributes: Option[Map[String, Seq[String]]] in Engine.scala and fill in the right values in URAlgorithm by reading the engine.json params. Encoding the attribute as a Seq[String] accounts for properties which are JSON named arrays of strings, even if containing only one string. In the Map, the String key is the property name you set in engine.json, and the Seq[String] holds the values taken from the _source part of the result for that property name.
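To make the suggestion concrete, here is a minimal, self-contained sketch of what the generalized result type might look like. It is not code from the repository: `ItemAttributesSketch` and `selectAttributes` are hypothetical names, `ItemID` is assumed to be an alias for `String`, and the hit's `_source` is assumed to have already been converted to a `Map[String, Seq[String]]`.

```scala
// Hypothetical sketch, not the actual Engine.scala or URAlgorithm.scala.
object ItemAttributesSketch {

  // Assumption: stand-in for the ItemID alias used in Engine.scala.
  type ItemID = String

  case class ItemScore(
    item: ItemID, // item id
    score: Double, // used to rank
    ranks: Option[Map[String, Double]] = None,
    // property name from engine.json -> values found in the hit's _source
    itemAttributes: Option[Map[String, Seq[String]]] = None) extends Serializable

  /** Keep only the requested properties that the hit actually carries.
    * Returns None when nothing was requested or nothing matched.
    */
  def selectAttributes(
      source: Map[String, Seq[String]],
      resultProps: Seq[String]): Option[Map[String, Seq[String]]] = {
    val selected =
      resultProps.flatMap(name => source.get(name).map(name -> _)).toMap
    if (selected.isEmpty) None else Some(selected)
  }
}
```

Because the property names arrive as data rather than as fields, a change to resultProps in engine.json needs no change to the case class, which is the point of avoiding hard-coded fields like mTitle.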

Haven't had time to look deeply so hope I'm not misleading. Take these as ideas, not code that will definitely work.