Open victusfate opened 3 years ago
After spending some time looking at hrnn and its implementations, I switched gears to something simpler that supports continuous learning: https://github.com/online-ml/river
If anyone's curious, I'm building an open source version here: https://github.com/victusfate/concierge (I just hooked up redis pubsub events to update the model today).
Todo: on server startup, get all events since the last model training and update each model.
There are two different things you can do here with implicit to get near-realtime updates with the ALS model:
1) You can set the recalculate_user flag on the model.recommend calls to automatically regenerate the user representation. This lets your recommendations react to changes in what the user has interacted with at inference time.
2) I've added support for incremental retraining for ALS models just now with PR #527, which will let you update the model with new items or users, as well as recalculate existing items with new interactions.
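For anyone wondering what regenerating a user representation at inference time amounts to, here is a dependency-free sketch of the textbook implicit-feedback ALS user update: with the item factors Y held fixed, solve (YᵀCY + reg·I)x = YᵀCp for the user vector x. This is only an illustration, not implicit's actual code. The names fold_in_user, solve_linear and score, the default reg value, and the choice to treat the stored interaction values directly as confidences are all assumptions made for the example.

```python
from typing import Dict, List

Vector = List[float]


def solve_linear(a: List[Vector], b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fold_in_user(item_factors: List[Vector],
                 confidences: Dict[int, float],
                 reg: float = 0.01) -> Vector:
    """Recompute one user's vector from their interactions (hypothetical helper).

    confidences maps item index -> confidence for items the user interacted
    with; every other item is assumed to have confidence 1 and preference 0.
    """
    k = len(item_factors[0])
    # Y^T Y + reg * I, the part shared by all users.
    a = [[sum(y[i] * y[j] for y in item_factors) + (reg if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [0.0] * k
    # Add the per-user correction for the items this user touched.
    for item, c in confidences.items():
        y = item_factors[item]
        for i in range(k):
            b[i] += c * y[i]
            for j in range(k):
                a[i][j] += (c - 1.0) * y[i] * y[j]
    return solve_linear(a, b)


def score(user: Vector, item: Vector) -> float:
    """Dot product used to rank items for a user."""
    return sum(u * v for u, v in zip(user, item))
```

Because only a small k-by-k system is solved per user, this kind of update is cheap enough to run per request, which is presumably why it works as a near-realtime option without retraining the whole model.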
This is great news, I'd love to compare the results to river-ml since I have more experience with implicit.
When it's ready for review, it'd be great to see a small sample program/example with live updates to the model for recommendations.
Oh, it's already ready to try out. I'll get this on my schedule.
Also worth noting I got the deployed system to work great.
I gather all user item ratings hourly for a full training (snapshot model). When new servers come up they load this model and then delta train from a redis ordered set of all user item ratings since the last model snapshot. In addition live models receive real time updates via redis pubsub.
This way at scale, I can have multiple predictor http servers all yielding similar results (can't guarantee they all receive all updates in the same order), but they are generally convergent. https://github.com/online-ml/river/discussions/803
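A minimal sketch of that snapshot + delta replay + live update flow, using in-memory stand-ins instead of Redis. Everything here is hypothetical: ToyModel, apply, train_snapshot and boot_server are made-up names, and the "model" is just additive rating totals, so the update order does not matter at all. A real online learner is usually order-sensitive, which is why the deployed servers are only approximately convergent.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# (timestamp, user_id, item_id, rating)
Event = Tuple[float, str, str, float]


@dataclass
class ToyModel:
    """Stand-in for a real recommender: per-(user, item) rating totals."""
    snapshot_ts: float = 0.0
    totals: Dict[Tuple[str, str], float] = field(
        default_factory=lambda: defaultdict(float))

    def apply(self, event: Event) -> None:
        """Incrementally update the model with one rating event."""
        _, user, item, rating = event
        self.totals[(user, item)] += rating


def train_snapshot(events: Iterable[Event], now: float) -> ToyModel:
    """Full (hourly) training over every event up to `now`."""
    model = ToyModel(snapshot_ts=now)
    for e in events:
        if e[0] <= now:
            model.apply(e)
    return model


def boot_server(snapshot: ToyModel, event_log: List[Event]) -> ToyModel:
    """Copy the snapshot, then replay logged events newer than it.

    `event_log` plays the role of the time-ordered set of ratings.
    """
    model = ToyModel(snapshot.snapshot_ts, defaultdict(float, snapshot.totals))
    for e in sorted(event_log):
        if e[0] > snapshot.snapshot_ts:
            model.apply(e)
    return model
```

Usage, under the same assumptions: build a snapshot with train_snapshot(log, now), call boot_server(snapshot, log) on each new server, then feed each live pubsub message to model.apply(event). With this toy model, two servers that receive the same live events in different orders end up with identical totals.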
In the case where a user is new, but the server can't fit them into the model yet (as @victusfate explained, a pub/sub flow that adds new users/items should preferably have some delay for performance reasons), how could I recommend to this new user?
Should I use the recommend method with a random userid and pass the new user's few interactions as user_items? If so, would it make sense to make the userid parameter optional?
(I'm assuming this without knowing how much the userid really matters in the recommend method when the recalculate_user flag is true.)
@sorenrife I ended up using popular results for new users in my current deployment using implicit (just hourly trained atm), and I think you can take the same approach with live model updates (keep an active popularity rank going as ratings come in)
Something like this (grabbing code snippets from my hourly training), where df is a pandas DataFrame:
pr = df.groupby([constants.ITEM_COLUMN])[constants.RATING_COLUMN].sum()
pr = (pr-pr.min())/(pr.max()-pr.min())
self.item_popularity_map = pr.to_dict()
self.item_popularity_map = {k: v for k, v in sorted(self.item_popularity_map.items(), key=lambda item: item[1],reverse=True)}
and in the rankings method
def rankings(self, user_id: str, selected_items):
    ranks = {}
    selected_idx = []
    for selected_item in selected_items:
        selected_idx.append(self.inv_item_map[selected_item])
    # handle novel / unknown users with popularity rank
    if user_id not in self.inv_user_map:
        try:
            # print('rankings selected_items',selected_items)
            for k in selected_idx:
                item_name = self.item_map[k]
                score = self.item_popularity_map[k]
                # print('rankings k',k,'item_name',item_name,'score',score)
                ranks[item_name] = float(score)
        except Exception as e:
            print('ImplicitPredictor.rankings popularity exception', e)
    else:
        user_idx = self.inv_user_map[user_id]
        try:
            rankings = self.model.rank_items(user_idx, self.user_items, selected_idx)
            for item_idx, prob in rankings:
                item_name = self.item_map[item_idx]
                ranks[item_name] = float(prob)
        except Exception as e:
            print('rankings exception', e)
    return ranks
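For the live-update case (keeping an active popularity rank going as ratings come in), a minimal stdlib-only sketch could look like the following. RunningPopularity and its methods are hypothetical names, not part of implicit or river. It mirrors the sum plus min-max normalisation from the hourly snippet above, and adds a guard for the case where every item has the same total, which would otherwise divide by zero.

```python
from collections import Counter
from typing import Dict, List


class RunningPopularity:
    """Keeps per-item rating totals that can be updated one event at a time."""

    def __init__(self) -> None:
        self.totals: Counter = Counter()

    def update(self, item_id: str, rating: float) -> None:
        """Fold a single incoming rating into the running total."""
        self.totals[item_id] += rating

    def scores(self) -> Dict[str, float]:
        """Min-max normalised popularity scores, most popular first."""
        if not self.totals:
            return {}
        lo, hi = min(self.totals.values()), max(self.totals.values())
        span = hi - lo
        # If all totals are equal, give every item the same score of 1.0.
        norm = {k: (v - lo) / span if span else 1.0
                for k, v in self.totals.items()}
        return dict(sorted(norm.items(), key=lambda kv: kv[1], reverse=True))

    def top(self, n: int) -> List[str]:
        """The n items with the highest raw totals."""
        return [item for item, _ in self.totals.most_common(n)]
```

Each pubsub rating event would call update(item_id, rating), and unknown users would be served from scores() or top(n) instead of the model.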
I admire the api, efficiency, and results of implicit.
I'm finding a need for real time training + prediction in some of my company's systems, and started searching around for ideas/implementations. Has anyone had experience working with this?
Realize this is off topic from implicit (totally understand if it's closed). Starting to look for ideas here: