PredictionIO / template-scala-parallel-universal-recommendation

PredictionIO Template for Universal Recommender

About user profile attributes #44

Open softwareklinic opened 8 years ago

softwareklinic commented 8 years ago

Let us say we have users whose profiles have attributes such as DeviceName, Brandname, PhonePlan, Tenure, Birthdate, etc., and we have ITEMS that are URLs (links on the page) with properties such as Category (e.g. phones, tablets, etc.).

How do you propose we construct our data ingestion and engine.json? Can the data look like the example below?

    10000001, view, http://www.xxxxxx.com
    http://www.xxxxxxxx.com, $set, Categories: phone
    10000001, $set, Tenure: 3 years
    10000001, $set, DeviceName: iPhone
    10000001, $set, DeviceBrandName: Apple

Will the UR algorithm consider the attributes we set using $set in the recommendation?

softwareklinic commented 8 years ago

OR

Can we do something like

    10000001, view, http://www.xxxxxxxx.com
    10000001, Tenure, 3+ years
    10000001, DeviceName, iPhone
    10000001, DeviceBrandName, Apple
    http://www.xxxxxxxx.com, Category, Phones

Is this a valid way to formulate events, or are these not attributes?

pferrel commented 8 years ago

Check the readme.md for the formulation of "usage events", which are indicators of user taste such as purchases, likes, pageviews, category preferences, etc., and of $set events, which are used to attach properties to items.

softwareklinic commented 8 years ago

Thank you. So my question is: even though I have user profile attributes such as tenure and age, do I need to attach those as item properties? Is that fair? How would the engine use the user profile attributes in decision making?

Maybe I'm still confused. Can I attach a sample of my engine.json and data text file?

softwareklinic commented 8 years ago

Someone on Google Groups posted a similar question. Here is the snippet; I hope it helps with my question as well.

Hello,

Here is my problem :

A user has multiple attributes (other than id) such as:

Of course prediction.io can give me predictions for this user, but I'd like to find which items other users could be interested in, considering their age, country, etc.

Does prediction.io do the trick? I'm not really sure.

softwareklinic commented 8 years ago

The fields collection: is it only for items, or for users as well? If I have users with properties, could I pass those in fields during the query, or is the algorithm written for item properties only?

    "user": "xyz",
    "fields": [
      {
        "name": "categories",
        "values": ["series", "mini-series"],
        "bias": -1 // filter out all except 'series' or 'mini-series'
      },
      {
        "name": "genre",
        "values": ["sci-fi", "detective"],
        "bias": 1.02 // boost/favor recommendations with the genre 'sci-fi' or 'detective'
      }
    ]

softwareklinic commented 8 years ago

ANOTHER TRY ----

We have data for both ITEMs and USERs as below...

Items - are URLs viewed

Users - have attributes such as tenure, birthyear, brandname, devicename, plan

If we ingest events such as below

    u1, view, http://www.url1.com
    u2, view, http://www.url1.com
    u3, view, http://www.url2.com
    http://www.url1.com, $set, categories:billing
    http://www.url2.com, $set, categories:payment
    u1, $set, tenure:3+years
    u1, $set, birthyear, 1972
    u2, $set, tenure:1+year
    u2, $set, birthyear: 1980

Assume we ingest 100s of such combinations...

If we ingest data this way and I want to recommend the top 3 URLs for any given user, does the Universal Recommender template consider the user attributes during decision making, or does it consider only the item attributes?

We didn't see much impact from the user attributes when trying it this way, so we tried an alternative: marking each attribute as an event by itself, as below. That seems to be working, but I want to understand how the template uses user properties/attributes.

    u1, view, http://www.url1.com
    u2, view, http://www.url1.com
    u3, view, http://www.url2.com
    u1, tenure, 3+years
    u1, birthyear, 1972
    u2, tenure, 1+year
    u2, birthyear, 1980
    http://www.url1.com, $set, categories:billing
    http://www.url2.com, $set, categories:payment

pferrel commented 8 years ago

This is not a good place for this discussion. The Google group is better, since this is not the root repo for this engine. The root repo is https://github.com/actionml/template-scala-parallel-universal-recommendation and the latest version only runs on PredictionIO from this repo: https://github.com/actionml/PredictionIO

Fields are used in queries, properties are only attached to items, and user information, which may or may not come from profiles, is always encoded as "usage events", so you never set user properties in the data. We may change this later to account for non-changing user properties, but for now use "usage events".

The way to encode user data is to create an event like (user-id, "tenure", "3+years"); the value may or may not change over time. Location may depend on where a user logs in and may not be profile data at all, so the more general way to input user data is as events, even if they only occur once, like when users fill out their "gender". You can send up to 50 events at once, but each "usage event" can contain only one event name and one item-id. In this example the event name is "tenure" and the item-id is "3+years".
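A minimal sketch of this encoding, assuming the usual PredictionIO event JSON field names (`event`, `entityType`, `entityId`, `targetEntityType`, `targetEntityId`). The helper names `usage_event` and `in_batches` are hypothetical, and the `targetEntityType` value `"item"` is an assumption to check against the README. The sketch only builds the event dictionaries and groups them into batches of at most 50; it does not send anything.

```python
import json

MAX_BATCH = 50  # limit mentioned above: up to 50 events can be sent at once


def usage_event(user_id, event_name, item_id):
    """Build one usage event: exactly one event name and one item-id."""
    return {
        "event": event_name,
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",  # assumed value, check the README
        "targetEntityId": item_id,
    }


def in_batches(events, size=MAX_BATCH):
    """Split a list of events into consecutive groups of at most `size`."""
    return [events[i:i + size] for i in range(0, len(events), size)]


# A page view and two profile attributes, all encoded as usage events.
events = [
    usage_event("u1", "view", "http://www.url1.com"),
    usage_event("u1", "tenure", "3+years"),
    usage_event("u1", "birthyear", "1972"),
]
batches = in_batches(events)
print(json.dumps(batches[0][1], indent=2))
```

Note that the attribute value ("3+years") takes the place of the item-id, so each distinct value behaves like an item the user interacted with.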

The other thing to note is that properties are always arrays of strings (except for dates), even if they contain only one value. So ("http://www.url1.com", $set, "categories": ["billing"]) would be pseudo-code for the JSON $set event.
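The pseudo-code above could be expanded roughly as follows. This is a sketch that assumes the usual PredictionIO `$set` event layout (`event`, `entityType`, `entityId`, `properties`); the helper name `set_item_properties` is hypothetical, and date-valued properties are deliberately left out.

```python
import json


def set_item_properties(item_id, properties):
    """Build a $set event whose property values are all lists of strings."""
    normalized = {}
    for name, value in properties.items():
        # Wrap a single value in a list; date properties are out of scope here.
        values = value if isinstance(value, (list, tuple)) else [value]
        normalized[name] = [str(v) for v in values]
    return {
        "event": "$set",
        "entityType": "item",
        "entityId": item_id,
        "properties": normalized,
    }


event = set_item_properties("http://www.url1.com", {"categories": "billing"})
print(json.dumps(event, indent=2))
```

Normalizing in one place avoids accidentally sending a bare string such as `"categories": "billing"` where an array is expected.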

So you are pretty close on how the events and properties are input. As the README.md says, you should first choose your "primary event", and it should be named first in the engine.json eventNames array. This event will be input with an item-id of the type to be recommended. In your case it may be a URL, and "view" may be your primary event. So you are telling the Universal Recommender that you want to recommend URLs based on "views" and any other user information that correlates to users viewing a URL. So yes, all the user information input will be used, but only events that pass the algorithm's test of correlation. This is the essence of the "Correlated Cross-Occurrence" algorithm.

As to the use of item properties, names and values are specified in fields for the query, and they can be used to boost or filter results. The README.md describes this.
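To make the ordering rule concrete, here is a small sketch that builds only the `eventNames` array with the primary event first. The helper `event_names` is hypothetical, the event names are the ones used in this thread, and where the array sits inside engine.json should be taken from the README rather than from this fragment.

```python
import json


def event_names(primary, secondary):
    """Return the event names with the primary event first and no duplicates."""
    names = [primary]
    for name in secondary:
        if name not in names:
            names.append(name)
    return names


# Hypothetical choice for this thread: recommend URLs based on "view",
# with profile attributes supplied as secondary usage events.
fragment = {"eventNames": event_names("view", ["tenure", "birthyear", "view"])}
print(json.dumps(fragment))
```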
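A sketch of such a query, using the `categories` property from the example data in this thread. The bias values follow the query snippet quoted earlier (-1 to filter, a value above 1 such as 1.02 to boost). The helpers `field` and `build_query` are hypothetical, and the `num` parameter for the number of results is an assumption to verify against the README.

```python
import json


def field(name, values, bias):
    """One entry of the query's `fields` list."""
    return {"name": name, "values": list(values), "bias": bias}


def build_query(user_id, fields, num=None):
    """Assemble a query dictionary for one user."""
    query = {"user": user_id, "fields": list(fields)}
    if num is not None:
        query["num"] = num  # assumed parameter: number of results to return
    return query


# Filter: return only items whose categories include "billing".
filter_query = build_query("u1", [field("categories", ["billing"], -1)], num=3)

# Boost: favor items in "payment" without excluding the rest.
boost_query = build_query("u1", [field("categories", ["payment"], 1.02)], num=3)

print(json.dumps(filter_query, indent=2))
```

Only item properties appear in `fields`; user attributes do not need to be passed at query time because they were already ingested as usage events.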

softwareklinic commented 8 years ago

Thank you, I think this is very helpful information. Yes, I would say some profile attributes don't change at all, some change with time, and some change with actions; some are also based on third-party data that companies purchase from DMP companies. So it would be better to have user profile attributes/properties that can be set, but for now we have used usage events for location, device, tenure, age, etc., and it works great. Appreciate your help.