
fit_partial function for ImplicitALSWrapperModel and LightFMWrapperModel #176

chezou opened 1 month ago

chezou commented 1 month ago

Feature Description

Implicit's ALS and LightFM provide partial fit capabilities:

- https://benfred.github.io/implicit/api/models/cpu/als.html#implicit.cpu.als.AlternatingLeastSquares.partial_fit_users
- https://benfred.github.io/implicit/api/models/cpu/als.html#implicit.cpu.als.AlternatingLeastSquares.partial_fit_items
- https://making.lyst.com/lightfm/docs/lightfm.html#lightfm.LightFM.fit_partial

It would be great if RecTools had a standardized way to use them for incremental model updates.
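
For illustration, the kind of API I have in mind (a rough sketch: fit_partial on the wrapper is exactly what doesn't exist yet, and dataset/new_dataset stand for rectools Dataset objects built from interactions):

    from implicit.als import AlternatingLeastSquares
    from rectools.models import ImplicitALSWrapperModel

    model = ImplicitALSWrapperModel(AlternatingLeastSquares(factors=64))
    model.fit(dataset)  # initial full training on the massive dataset

    # later, when new interactions arrive, update without full re-training:
    model.fit_partial(new_dataset, epochs=1)  # hypothetical method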

Why this feature?

If we can use partial_fit, we can avoid full re-training every time. This is beneficial when the original model is trained on a massive dataset.

Additional context

No response

blondered commented 1 month ago

Hi! That would be great, we discussed this feature some time ago and added it to the backlog. But right now the priority is low for our team, since we are focusing on more complicated models in the near future. We would be absolutely happy to help if someone outside our team takes it on. Just contact us in Telegram if you ever want to contribute.

chezou commented 1 week ago

Lessons learned from the first attempt:

blondered commented 1 week ago

Let's figure it out. My thoughts are:

  1. Different behaviour from ALS and LightFM is not an option; it's extremely confusing. We need to make sure each model can do fit_partial when it's not yet fitted.
  2. fit_partial should always have an epochs parameter. It's important both for epochal and for incremental training (see the signature sketch after this list).
  3. Incremental training is just a more complicated case of epochal training. We need to carefully handle epochal training first, then add support for an incremental dataset.
  4. In incremental training there might also be two options: adding only new users and items, or also adding new features. Adding new features is not really a smart idea. Say incremental training is done each day of the week and full training is done once a week: in these conditions it is a much better solution to add new features during full training and check model performance carefully, since new features might actually hurt it. I would suggest not making the addition of new features to the dataset part of incremental training support; instead, focus on support for new users and items.
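
To make point 2 concrete, here is a rough sketch of the common signature I have in mind (names and defaults are tentative, not an existing API):

    from rectools.dataset import Dataset

    class ModelBase:
        def fit_partial(self, dataset: Dataset, epochs: int = 1) -> "ModelBase":
            """Train for `epochs` more epochs.

            Must behave the same way for ALS and LightFM,
            and must also work when the model is not fitted yet.
            """
            ...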

From these points I see the following priorities:

  1. Epochal training for ALS and LightFM with a fit_partial method and an epochs parameter. Support for fit_partial from the unfitted state for both models. Tests that epochal training produces exactly the same embeddings as usual training (see the test sketch after this list).
  2. Support for the fit_partial method for ALS with features. This requires major refactoring of our ALS training methods.
  3. ALS and LightFM support for adding new users and items in the dataset that is passed to the fit_partial method.
  4. A Dataset method to add/replace/update interactions and features. An example is the rebuild_with_new_data method from the last PR (but it's very raw for now). We can discuss it in detail when we get to it. Features must be carefully checked: we must guarantee exactly the same order of features in the feature matrix, otherwise we will break the linkage between the fitted model's embeddings and the features.
  5. Support for adding new features to the fitted model during incremental training. My opinion is that we shouldn't do this in RecTools. It's extremely hard to support for all models (e.g. we are implementing SASRec with features support right now), it's not commonly useful, and it might be conceptually wrong. We wouldn't do this in any of our real production recommender systems. What do you think?
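
For priority 1, the equivalence test could look roughly like this (just a sketch: make_als_model is a hypothetical helper that builds identically initialized wrapper models, e.g. with a fixed random_state, and fit_partial is the method to be added):

    import numpy as np

    def test_epochal_training_matches_usual_training(dataset):
        model_full = make_als_model(iterations=10)
        model_full.fit(dataset)  # usual training: all epochs at once

        model_epochal = make_als_model(iterations=10)
        for _ in range(10):
            model_epochal.fit_partial(dataset, epochs=1)  # one epoch at a time

        # identical initialization is required for an exact match
        np.testing.assert_allclose(model_full.get_user_vectors(), model_epochal.get_user_vectors())
        np.testing.assert_allclose(model_full.get_item_vectors(), model_epochal.get_item_vectors())
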
chezou commented 6 days ago

Thanks for your detailed explanation. It makes things much clearer now.

I agree with not allowing new features to be added. In general supervised learning tasks we fix the input features, even for fine-tuning. If we want to use different features, then we should train the model from scratch; that's a completely different model.

Supporting new users/items is critical for recommender systems. 100%.

One concern that comes to mind is providing a fit_partial() method for ALS when the model is not yet fitted. There are two options (a sketch of the first follows the list):

  1. Fall back to the regular fit() method if the model is not fitted.
  2. Add the same initialization as in ALS.fit() to RecTools' fit_partial() method.
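
For the first option, the wrapper logic could be as simple as this (a sketch, assuming the fit_partial signature discussed above):

    def fit_partial(self, dataset, epochs=1):
        if not self.is_fitted:
            # option 1: the first call simply behaves like a regular fit
            return self.fit(dataset)
        # otherwise continue training the existing factors for `epochs` epochs
        ...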

In either option, we have to be careful that ALS.fit() can use two solvers, while ALS.partial_fit_users()/items() can use only a single one. https://github.com/benfred/implicit/blob/b33b809cb585cb8a65ad39d0f97497d37e98acaa/implicit/cpu/als.py#L420-L422

If RecTools enforces not using Conjugate Gradient ALS by passing use_cg=False, RecTools' fit() and fit_partial() methods can produce the same output. However, that means a user can't benefit from Implicit's faster matrix factorization. ref: https://github.com/benfred/implicit/commit/4139e0ae19eb227c9d3a1358fefc4dff57f627b5
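
In code that would look like this (use_cg is an existing parameter of Implicit's CPU ALS):

    from implicit.als import AlternatingLeastSquares

    # Cholesky solver everywhere: fit() and a future fit_partial()
    # stay consistent, but we lose the CG speedup
    als = AlternatingLeastSquares(factors=64, use_cg=False)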

How do you prioritize the tradeoff between consistent training results and training speed? Of course, we could give users the option of whether to use Conjugate Gradient ALS or not.

blondered commented 4 days ago

If we agree that adding support for features in epochal and incremental training of ALS is important, then we should take it into account from the start.

  1. If we are sure we need to add support for features, we can reuse our model training code. There we use exactly the same solver that was specified during ALS initialization, so the user will get the CG solver if they specified it for ALS, which is great. We don't use ALS.partial_fit_users()/items() if we choose this way of implementation.
  2. If we stick to using ALS.partial_fit_users()/items(), we have simpler code for now. But: 1) we can't support features, it's going to be a mess; 2) we can't support using exactly the same solver that the user selected, so we can't guarantee that epochal training will give the same results as usual training. My opinion is that this is a bad path to choose: it is messy and confusing. But I'm completely open to other opinions.

If we drop ALS.partial_fit_users()/items() from the implementation and think about adding support for features in epochal training, I see the following steps:

  1. First PR refactors the ALS wrapper fitting code in the following way:
     1.1 Make _fit work for any number of epochs, not just the ImplicitALSWrapperModel.model.iterations number. This will enable epochal training with an epochs argument after the fit_partial method is introduced.
     1.2 Make a clear distinction in the code between factors initialization and factors training. This will enable continuing training after the model was fitted.

The last point is the most difficult one. We have 2 methods to refactor: fit_als_with_features_separately_inplace and fit_als_with_features_together_inplace. When calling these methods for an already fitted model, we of course shouldn't make a deepcopy of an unfitted model (https://github.com/MobileTeleSystems/RecTools/blob/8a3b7165a6928507a4edfe01ecd100280b7a0a00/rectools/models/implicit_als.py#L73) but should just pass self.model instead.
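
Roughly like this (a sketch; I'm assuming the wrapper keeps the unfitted model in self._model, as the current code does):

    from copy import deepcopy

    # inside the ALS wrapper's fit logic:
    if self.is_fitted:
        model = self.model  # continue training the fitted inner model, no copy
    else:
        model = deepcopy(self._model)  # current behaviour for the first fit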

Let's dive deeper.

  1. fit_als_with_features_together_inplace: for now, for epochal training we assume the dataset hasn't changed. So for a fitted model we shouldn't initialize user_explicit_factors, user_latent_factors and user_factors_paired_to_items from scratch; we should take the learnt factors from the model's current state:

    if self.is_fitted:
        # reuse the factors learnt so far instead of re-initializing them
        user_factors = self.get_user_vectors()
        item_factors = self.get_item_vectors()

    and then call _fit_combined_factors_on_gpu_inplace with an additional epochs argument (same for CPU).

  2. _fit_combined_factors_on_gpu_inplace: we can leave the whole code of this method as is, but we need to change the ImplicitALSWrapperModel.model.iterations attribute to the required number of epochs before calling it, and then change it back afterwards (see the sketch below).
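
A small helper could make that swap safe (a sketch, not existing code):

    from contextlib import contextmanager

    @contextmanager
    def patched_iterations(model, epochs):
        """Temporarily override model.iterations, restoring it afterwards."""
        original = model.iterations
        model.iterations = epochs
        try:
            yield model
        finally:
            model.iterations = original

    # usage inside the wrapper, leaving the factors-training method untouched:
    # with patched_iterations(self.model, epochs):
    #     _fit_combined_factors_on_gpu_inplace(...)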

@feldlime please check me carefully :) and share your opinion

@chezou please tell me what you think. Do you have any questions or doubts? Are you willing to take on this part of the work?

I would suggest doing this refactoring first, then introducing the fit_partial method for both ALS and LightFM in one PR to enable epochal training, and then working on incremental training.

blondered commented 2 days ago

@chezou if you don't feel comfortable with the ALS refactoring PR, we can do it ourselves, and afterwards you can continue with the further PRs.

chezou commented 1 day ago

@blondered Sorry, I've been super busy with my work. Please go ahead with the refactoring PR, since I'm not confident enough.

As long as ALS.recalculate_user()/item() is eventually re-implemented and adding new users/items is supported in the future, I agree with dropping ALS.partial_fit_users()/items().

blondered commented 1 day ago

Ok. It will not be immediate, but we'll try to do it as soon as we can.