actionml / template-scala-parallel-universal-recommendation

30 stars 21 forks source link

Random rank #23

Closed laser13 closed 8 years ago

laser13 commented 8 years ago

In this PR I made a lot of changes, but mainly they are: fix of scala style error, turn imperative code to functional style. Business logic calculating "cooccurrence" and "popular rank" not changed.

  1. Added scalastyle sbt plugin, and code is checked.

    sbt scalastyle

  2. Added scalariform sbt plugin, code is configured and formatted.

    sbt scalariformFormat

  3. "backfillField" renamed to "backfills", and now is the array.
  4. backfill types now may be one of the value ["popular" OR "hot" OR "trending"] AND/OR "userDefined" AND/OR "random"
  5. backfill names are used as names in ES, but name with type "userDefined" must match with "$set" field name.
  6. Add to Query field "withRanks", if true, show rank fields in ItemScore as map(rankName -> rankValue)
  7. Base integration test is passed success
  8. Add new integration test "integration-test-rank". It uses other fixtures data and engine.json
pferrel commented 8 years ago

Too many refactorings to compare code. To feel confident we will have to compare generated query structure and results of the 2 recommenders side by side or before and after the refactoring.

@laser13 did you compare the "integration-test" results before and after your changes?

laser13 commented 8 years ago

@pferrel Yes, I use integration tests to check that code works correctly.

pferrel commented 8 years ago

you compared the previous version to this version and the integration test produced identical results?

Did you compare the ES queries generated?

pferrel commented 8 years ago

One major concern, see the notes on PopModel concerning eventsRDD. This does not work because eventRDD does not contain all items unless $set is included in the eventNames passed to PEvents.find

laser13 commented 8 years ago

@pferrel

you compared the previous version to this version and the integration test produced identical results? Did you compare the ES queries generated?

Yes I compared result integration tests in branch master and current branch. ES queries identically.

laser13 commented 8 years ago

@pferrel

One major concern, see the notes on PopModel concerning eventsRDD. This does not work because eventRDD does not contain all items unless $set is included in the eventNames passed to PEvents.find

Fixed it, now it's correctly work for all items

pferrel commented 8 years ago

Thanks, I'll start merging this.

Questions: 1) have you updated tests to test for random ranking of items with no usage events? 2) The way you have done this will use $delete events to give random ranking, this is a contradiction but UR users are never advised to use this event so I'll note in the code for now. 3) I assume an empty eventNames is passed in to random ranking, it's hard to find where it is in the PR diff, which is getting too complicated.

BTW in the future, can we exclude style changes, they clutter the PR diff so it is very hard to tell important changes. Style changes can go in separate PR if needed.

Thanks again. Some good/needed refactoring here.

laser13 commented 8 years ago

@pferrel

1) have you updated tests to test for random ranking of items with no usage events?

I`m tested this by hand, but I can quickly add it in integration test

3) I assume an empty eventNames is passed in to random ranking, it's hard to find where it is in the PR diff, which is getting too complicated.

random rank calculate for all existing action and items which have any $set

pferrel commented 8 years ago

1) I'm in the middle of merging and understand the tests better so I'll add the test for items with no usage events

3) ok, easier to see now that I'm editing code.

laser13 commented 8 years ago

3) eventNames don't used in randomRank https://github.com/actionml/template-scala-parallel-universal-recommendation/blob/master/src/main/scala/PopModel.scala#L99 https://github.com/actionml/template-scala-parallel-universal-recommendation/blob/master/src/main/scala/URAlgorithm.scala#L454

pferrel commented 8 years ago

A PEvents query with no event name gets all events including $delete, which is a special event to delete the object. Just FYI, ideally we would use only $set + all possible eventNames, not including $delete.

This is a minor issue. As I said users are not expected to use the $delete event.