actionml / template-scala-parallel-universal-recommendation


Default rank #21

Closed laser13 closed 8 years ago

laser13 commented 8 years ago
  1. Adds the model fields "defaultRank" and "uniqueRank" to all items.
  2. "defaultRank" is a user-defined item feature that is saved through the event server. (It may need to be renamed, or made user-definable in engine.json.)
  3. "uniqueRank" is a random float value.
  4. At query time, the results are additionally sorted on these fields.
  5. For the values of these fields to appear in the response, the query must specify the optional field "addRank" = true.
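
A minimal sketch of item 5, assuming the response is a list of item/score pairs. The function `build_response` and the shape of `hits` are hypothetical and are not taken from the PR; only the field names "defaultRank", "uniqueRank", and "addRank" come from the description above.

```python
from typing import Any, Dict, List

# Field names from the PR description; everything else here is illustrative.
RANK_FIELDS = ("defaultRank", "uniqueRank")


def build_response(hits: List[Dict[str, Any]], add_rank: bool = False) -> List[Dict[str, Any]]:
    """Return item/score pairs, copying the rank fields only when add_rank is true."""
    results = []
    for hit in hits:
        entry = {"item": hit["item"], "score": hit["score"]}
        if add_rank:
            # Copy only the rank fields that the item actually carries.
            for field in RANK_FIELDS:
                if field in hit:
                    entry[field] = hit[field]
        results.append(entry)
    return results


hits = [{"item": "i1", "score": 0.5, "defaultRank": 2.0, "uniqueRank": 0.37}]
plain = build_response(hits)                  # rank fields omitted
ranked = build_response(hits, add_rank=True)  # rank fields included
```
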
pferrel commented 8 years ago

Description of how this works? @alexice or @laser13 ?

pferrel commented 8 years ago

Quick skim: Looks like it adds possibly 2 new rankings, is this true? I think it would be better to add only 1.

The user chooses the field name as they do with "popRank", then the property type is chosen as "random" or "userDefined". If "random", we put a unique-ish random number in the property; if "userDefined", we assume they will set it with $set.
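
A rough sketch of that choice, assuming items are plain dictionaries of properties. The function `apply_fallback_rank` is hypothetical and only illustrates the two types discussed here; it is not the template's code.

```python
import random
from typing import Any, Dict, List, Optional


def apply_fallback_rank(
    items: List[Dict[str, Any]],
    name: str,
    rank_type: str,
    rng: Optional[random.Random] = None,
) -> List[Dict[str, Any]]:
    """Fill the fallback rank property `name` according to `rank_type`."""
    rng = rng or random.Random()
    if rank_type == "random":
        # A unique-ish random float per item; collisions are possible but unlikely.
        for props in items:
            props[name] = rng.random()
    elif rank_type == "userDefined":
        # Nothing to do: the value is assumed to arrive through a $set event.
        pass
    else:
        raise ValueError(f"unknown fallback rank type: {rank_type}")
    return items


random_items = apply_fallback_rank([{"id": "a"}, {"id": "b"}], "defaultRank", "random")
user_items = apply_fallback_rank([{"id": "a", "defaultRank": 5.0}], "defaultRank", "userDefined")
```
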

I believe that every ranking method adds an Elasticsearch performance penalty.

pferrel commented 8 years ago

Is there a test? I guess it's ok to add to the integration test if it's possible

alexice commented 8 years ago

@pferrel The idea is to have all the fields, with sorting done in the order "popRank", then "userDefined", then "random". The score can be the same for many items, and the same can happen with popRank; a template user can also easily add a userDefined rank with collisions, or just a constant. So in the end there should always be a "random" ranking. In principle, "popRank" and "userDefined" could be switched in priority if needed. defaultRank has been added to sample-handmade-data.txt.
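
The tie-breaking order described here can be sketched as a multi-key sort. This is only an illustration: `rank_items` is a hypothetical helper, it assumes higher values rank first for every field, and it treats a missing rank as 0.0.

```python
from typing import Any, Dict, List, Sequence, Tuple


def rank_items(
    items: List[Dict[str, Any]],
    rank_fields: Sequence[str] = ("popRank", "userDefined", "random"),
) -> List[Dict[str, Any]]:
    """Sort by score first, then by each rank field in order, all descending."""

    def key(item: Dict[str, Any]) -> Tuple[float, ...]:
        # Negate values so that Python's ascending sort yields descending order.
        return (-item["score"],) + tuple(-item.get(f, 0.0) for f in rank_fields)

    return sorted(items, key=key)


items = [
    {"id": "a", "score": 1.0, "popRank": 2.0, "userDefined": 1.0, "random": 0.1},
    {"id": "b", "score": 1.0, "popRank": 2.0, "userDefined": 1.0, "random": 0.9},
    {"id": "c", "score": 1.0, "popRank": 3.0, "userDefined": 0.0, "random": 0.5},
    {"id": "d", "score": 2.0, "popRank": 0.0, "userDefined": 0.0, "random": 0.2},
]
ordered_ids = [item["id"] for item in rank_items(items)]
```

Here "a" and "b" tie on score, popRank, and userDefined, so only the random field separates them.
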

pferrel commented 8 years ago

OK, understood but it would be simpler and more performant to have one fallback ranking that is either userDefined or random.

When in doubt @laser13 or @alexice please ask. I wasn't expecting a finished PR without some discussion. This will avoid misunderstandings in the future.

Thanks @laser13

I don't see how this behavior is modified in engine.json; there is no example in the PR.

Here is something like what I'd imagine:

"backfills":[
     {
         "name": "popRank", // defines the property name in Elasticsearch
         "type": "popular",
         "eventNames": ["buy", "view"],
         "duration": "3 days", // note that this has changed from v0.2.3
         "endDate": "ISO8601-date" //most recent date to end the duration
    },{
         "name": "defaultRank", // defines the property name in Elasticsearch
         "type": "random"
   }
]

This turns the single field into a list. We could encode as many fallbacks as we want but 1 is enough because a user-defined $set fallback would be:

"backfills":[
     {
         "name": "popRank", // defines the property name in Elasticsearch
         "type": "popular",
         "eventNames": ["buy", "view"],
         "duration": "3 days", // note that this has changed from v0.2.3
         "endDate": "ISO8601-date" //most recent date to end the duration
    },{
         "name": "defaultRank", // defines the property name in Elasticsearch
         "type": "userDefined"
   }
]

We can use the order of definition: 1st = 1, 2nd = 2, with recs always the first, non-backfill ranking, based on score. This type of param setting could work for 1 or 2 methods of ranking (userDefined or random, or userDefined + random), as long as 2 methods are not forced to be encoded in the Elasticsearch index to get userDefined or random alone.
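
One way to read the order-of-definition idea, sketched in Python under assumptions: the "backfills" list is parsed from strict JSON (comments removed), and the result is a sort clause in the style of Elasticsearch's `sort` parameter, with `_score` always first. The helper `sort_clause` and the `KNOWN_TYPES` set are hypothetical, and whether the template would sort this way or boost instead is not settled in this thread.

```python
import json
from typing import Any, Dict, List

# Types mentioned in this discussion; an assumption, not a definitive list.
KNOWN_TYPES = {"popular", "random", "userDefined"}


def sort_clause(backfills: List[Dict[str, Any]]) -> List[Dict[str, Dict[str, str]]]:
    """Build an ordered sort clause: score first, then backfills as defined."""
    clause = [{"_score": {"order": "desc"}}]
    seen = set()
    for spec in backfills:
        name, kind = spec["name"], spec["type"]
        if kind not in KNOWN_TYPES:
            raise ValueError(f"unknown backfill type: {kind}")
        if name in seen:
            raise ValueError(f"duplicate backfill name: {name}")
        seen.add(name)
        # Position in the list decides priority: 1st = 1, 2nd = 2, and so on.
        clause.append({name: {"order": "desc"}})
    return clause


config = json.loads("""
{
  "backfills": [
    {"name": "popRank", "type": "popular", "eventNames": ["buy", "view"], "duration": "3 days"},
    {"name": "userDefined", "type": "userDefined"},
    {"name": "defaultRank", "type": "random"}
  ]
}
""")
clause = sort_clause(config["backfills"])
```

With a single fallback entry the clause simply has one fewer element, so nothing forces two rank fields into the index.
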

To get 3 types:

"backfills":[
     {
         "name": "popRank", // defines the property name in Elasticsearch
         "type": "popular",
         "eventNames": ["buy", "view"],
         "duration": "3 days", // note that this has changed from v0.2.3
         "endDate": "ISO8601-date" //most recent date to end the duration
    },{
         "name": "userDefined", // defines the property name in Elasticsearch
         "type": "userDefined"
    },{
         "name": "defaultRank",  // defines the property name in Elasticsearch
         "type": "random"
   }
]

This makes the order important, as it is in eventNames or in the new parameter coming for downsampling, which will also be an array/list of "indicators".

How about this type of encoding? If @laser13 can do it, go ahead, otherwise let me know and I'll add the json parsing.

laser13 commented 8 years ago

@pferrel I like your suggestion. I think it would be correct. I will implement it.

laser13 commented 8 years ago

@pferrel I'm sorry, this is not a new PR; I just deleted a line in the code. The new PR will likely come tomorrow.