actionml / harness

Harness is a Machine Learning/AI Server with plugins for many algorithms including the Universal Recommender
Apache License 2.0
283 stars 49 forks source link

Using old indicator events #286

Open AgusPietra opened 3 years ago

AgusPietra commented 3 years ago

Description

Recommendations for user are based only on the oldest maxIndicatorsPerQuery events of an indicator, not the newest.

Details:

This is the specific part of the code that I think has the problem:

.groupBy(_.event)
          .flatMap { case (name, events) =>
            events.sortBy(_.eventTime) // implicit ordering
              .take(indicatorsMap(name)
                .maxIndicatorsPerQuery.getOrElse(DefaultURAlgoParams.MaxQueryEvents))
          }.toSeq

I propose to change it to:

.groupBy(_.event)
          .flatMap { case (name, events) =>
            events.sortBy(_.eventTime).reverse // implicit ordering
              .take(indicatorsMap(name)
                .maxIndicatorsPerQuery.getOrElse(DefaultURAlgoParams.MaxQueryEvents))
          }.toSeq

The sort by is ordering in ascending order, I'm actually debugging the code and watching this behavior.

Sorry to open another issue with this, I think the oter issue I opened wasn't clear enough, so you misunderstand me.

Thanks for reading

pferrel commented 3 years ago

this only affects the ordering of events returned from the MongoDB query, which should get the newest events. However this might cause some of the newest of those events to be dropped.

BTW I think the fix is events.sortWith(_.eventTime after _.eventTime)

AgusPietra commented 3 years ago

Well, actually events are retrieved in the correct order from mongo, but the limit of those events is set by this value:

maxQueryEvents = if (params.indicators.isEmpty) {
      params.maxQueryEvents.getOrElse(DefaultURAlgoParams.MaxQueryEvents)
    } else { // using the indicator method of setting query events
      params.indicators.get.foldLeft[Int](0) { (previous, indicator) =>
        previous + indicator.maxItemsPerUser.getOrElse(DefaultURAlgoParams.MaxQueryEvents)
      } * 10
      // this assumes one event doesn't happen more than 10 times more often than another
      // not ideal but avoids one query to the event store per event type
    }

maxQueryEvents will be almost always much higher than maxIndicatorsPerQuery value, so, as we sort again in this part of the code:

.groupBy(_.event)
          .flatMap { case (name, events) =>
            events.sortBy(_.eventTime) // implicit ordering
              .take(indicatorsMap(name)
                .maxIndicatorsPerQuery.getOrElse(DefaultURAlgoParams.MaxQueryEvents))
          }.toSeq

we end up losing recent events in cases were users has more history than the number defined by maxIndicatorsPerQuery. For us, this is the normal scenario, and we don't like the idea of having maxIndicatorsPerQuery increased to compensate for that, because we would lose reactivity.

Thanks again for reading!!

pferrel commented 3 years ago

A PR has been created for this. https://github.com/actionml/harness/pull/287

Thanks @AgusPietra for finding this! Let us know if you have a chance to try this.

AgusPietra commented 3 years ago

It was nothing, thanks to you!

I'll try it tomorrow