Generic Popularity Query Clarification

delsol1 commented 7 years ago

I'm just starting to test the UR engine and was looking at the output in 'integration-test-expected.txt' for sample-handmade-data.txt. For the most generic query, {} (no user, no item), using a vanilla engine.json, it seems that the recommendation output should default to a list of 'popular' items with the highest number of primary events, as described below:

The "backfillField" defaults to ranking all items by the count of primary indicators/event for all time in the training data. This calculates popularity over the long term.

The results in 'integration-test-expected.txt' show: iPhone4, Galaxy, Nexus, IPad-Retina

However, counting purchase lines in the sample data, the iPhone5 (with 2 purchases like the iPhone4 and Galaxy) is missing.
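The manual count described above can be sketched in a few lines of Python. This is only an illustration: the `user,event,item` line layout is an assumption about the sample file, and the events below are made up rather than copied from sample-handmade-data.txt. The helper name `primary_event_counts` is hypothetical.

```python
from collections import Counter

# Made-up events for illustration only; NOT the contents of
# sample-handmade-data.txt. Assumed line layout: user,event,item
SAMPLE_LINES = """\
u1,purchase,item-a
u2,purchase,item-a
u2,view,item-b
u3,purchase,item-b
u3,purchase,item-c
u4,purchase,item-c
u4,view,item-a
"""


def primary_event_counts(lines, primary_event="purchase"):
    """Count how often each item appears in a primary event line."""
    counts = Counter()
    for line in lines.splitlines():
        line = line.strip()
        if not line:
            continue
        user, event, item = line.split(",")
        if event == primary_event:
            counts[item] += 1
    return counts


counts = primary_event_counts(SAMPLE_LINES)
# Rank by descending count, breaking ties by item id for a stable order.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
for item, n in ranked:
    print(item, n)
```

With no filters applied, a ranking like this is what I would expect the generic query to follow.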

This caught my attention because when I trained my very simple dataset with only one eventName it did something similar: returned the two items with the most primary event entries, but then also two items with the least entries (the most unpopular items).

Can you help clarify the behavior and what I'm seeing? Thanks.

delsol1 commented 7 years ago

Anyone in the community have any ideas/thoughts on this?

Thx.

pferrel commented 7 years ago

There is a dateRange filter in the test, included to show that date and property filters work with popular items like any other recommendation. This also tests that the date filters are working correctly. The upshot is that what you are seeing is intentional and actually part of the integration test.

BTW using the Google group will get you better responses.

delsol1 commented 7 years ago

Thanks Pat, appreciate the response and the advice on the google group forum.

I will work with the date filters.

To clarify: so in the most generic case where the query is completely unqualified, it's not unusual that the response would contain items which are not the most popular? If that's the case then I'm going to struggle with that from an intuition standpoint.

Hope I'm missing something conceptually here.

pferrel commented 7 years ago

The example in the integration test limits the returned recommendations by date range, so you get the most popular items that fall within that date range. Without date ranges it returns the most popular items, as you quoted from the docs. You will get the most popular items with or without qualifications. This allows you to set business rules on popular items (filters, boosts, date ranges, etc.) just like any other recommendation, and with popular items the results will always be the most popular ones allowed by the rules.
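A rough way to picture "most popular that are allowed given the rules" is filter first, then rank. The sketch below is plain Python, not the UR's actual implementation: the item dates, purchase list, inclusive bounds, and the `popular_items` helper are all hypothetical and only meant to show why a globally popular item can drop out of a date-limited result.

```python
from collections import Counter
from datetime import date

# Hypothetical item dates and purchase events, for illustration only.
ITEM_DATES = {
    "item-a": date(2016, 1, 10),
    "item-b": date(2016, 3, 5),
    "item-c": date(2016, 6, 20),
    "item-d": date(2016, 3, 25),
}
PURCHASES = ["item-a", "item-a", "item-a", "item-b",
             "item-c", "item-c", "item-d"]


def popular_items(purchases, item_dates, after=None, before=None, num=4):
    """Rank items by purchase count, keeping only items inside the range.

    Bounds are treated as inclusive here; that is an assumption of this
    sketch, not a statement about how the UR evaluates date ranges.
    """
    counts = Counter(purchases)

    def allowed(item):
        if after is None and before is None:
            return True
        d = item_dates.get(item)
        if d is None:
            return False
        if after is not None and d < after:
            return False
        if before is not None and d > before:
            return False
        return True

    ranked = sorted((i for i in counts if allowed(i)),
                    key=lambda i: (-counts[i], i))
    return ranked[:num]


unfiltered = popular_items(PURCHASES, ITEM_DATES)
in_march = popular_items(PURCHASES, ITEM_DATES,
                         after=date(2016, 3, 1), before=date(2016, 3, 31))
print(unfiltered)
print(in_march)
```

In this toy data the overall top seller, `item-a`, is missing from the March result because its date falls outside the range, even though the result is still the most popular set among the items that pass the filter.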

What are you struggling with intuitively?

BTW 1) this is the wrong template repo to use with Apache PIO. Follow the instructions here: http://actionml.com/docs/pio_quickstart and here: http://actionml.com/docs/ur_quickstart and 2) for UR questions please use the Google Group that is set up for support here: https://groups.google.com/forum/#!forum/actionml-user

PredictionIO / template-scala-parallel-universal-recommendation

Generic Popularity Query Clarification #55