PreferredAI / cornac

A Comparative Framework for Multimodal Recommender Systems
https://cornac.preferred.ai
Apache License 2.0

Improve computation of ranking related metrics #530

Open

darrylong commented 9 months ago

Description

Certain ranking metrics currently take a considerably long time to compute a single output.

Other Comments

It could be due to:

  1. Some slow calculation in the ranking_eval function. https://github.com/PreferredAI/cornac/blob/57470771ede09e85df461525d33944c585b90a87/cornac/eval_methods/base_method.py#L174 More testing is required to determine which part is to be optimised.

  2. How the score function calculates during evaluation on a user-by-user basis. https://github.com/PreferredAI/cornac/blob/57470771ede09e85df461525d33944c585b90a87/cornac/models/recommender.py#L192
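To help determine which part dominates, a quick standalone timing sketch (toy data and sizes, not Cornac code) could compare the cost of a full sort over all item scores against a partial top-k via `np.argpartition`:

```python
import numpy as np
from time import perf_counter

# Hypothetical sizes; scores stand in for a model's output over all items.
num_items = 50_000
scores = np.random.default_rng(1).random(num_items)

t0 = perf_counter()
full_order = np.argsort(-scores)             # full ranking, O(n log n)
t1 = perf_counter()
topk = np.argpartition(-scores, 100)[:100]   # unordered top-100, O(n)
t2 = perf_counter()
print(f"argsort: {t1 - t0:.4f}s, argpartition: {t2 - t1:.4f}s")
```

If most ranking metrics only need the top-k items, the partial partition recovers the same top-100 set without paying for a full sort.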

Further discussion to find how we could improve performance would be great. :)

Thank you all!

tqtg commented 9 months ago

One way to speed up evaluation is to do sampling for negative items. This usually makes sense when the set of items is huge and we can't afford to rank all of them. The slowness is probably caused by argsort function: https://github.com/PreferredAI/cornac/blob/57470771ede09e85df461525d33944c585b90a87/cornac/models/recommender.py#L285

The sketch idea is to let the user define how many items they want to use as negatives (e.g., 100), then we do sampling somewhere here: https://github.com/PreferredAI/cornac/blob/57470771ede09e85df461525d33944c585b90a87/cornac/eval_methods/base_method.py#L195 After that, there are two ways to continue:

  1. Let the model score all items as usual, then only compare the true positives and the sampled negatives while computing metrics.
     • pros: utilizes matrix ops, which are quite efficient; fewer tedious changes
     • cons: still computes scores for some items that will not be used
  2. Ask the model to score only the true positives and sampled negatives, and put dummy scores on the unused items while computing metrics.
     • pros: could be faster if the number of negative items is set small
     • cons: diminishing returns when not utilizing matrix ops and the number of negative items is set large

A smarter way is adaptive: analyze both, then set a flag to pick one or the other for subsequent iterations. Just something off the top of my head at the moment. Let's discuss if anyone has more clever ideas.
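The first option could be sketched roughly as follows (standalone NumPy; `sample_eval_items` and `num_negatives` are hypothetical names, not existing Cornac API, and the random scores stand in for a model's `score()` output):

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded for reproducible sampling

def sample_eval_items(pos_items, num_items, num_negatives=100):
    """Return the true positives plus sampled negatives, instead of all items."""
    candidates = np.setdiff1d(np.arange(num_items), pos_items)
    negatives = rng.choice(candidates, size=num_negatives, replace=False)
    return np.concatenate([pos_items, negatives])

# Option 1: the model still scores all items as usual...
num_items = 10_000
all_scores = rng.random(num_items)       # stand-in for model.score(user_idx)
pos_items = np.array([3, 42, 77])
eval_items = sample_eval_items(pos_items, num_items)

# ...but the metric only ranks the reduced candidate set:
ranked = eval_items[np.argsort(-all_scores[eval_items])]
```

The argsort then runs over ~100 items per user rather than the full catalog, which is where the savings would come from.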

hieuddo commented 8 months ago

My thought is that this is because of the score() function in the Recommender class: https://github.com/PreferredAI/cornac/blob/57470771ede09e85df461525d33944c585b90a87/cornac/models/recommender.py#L192 where we only evaluate users one by one. Would batch user evaluation be faster for GPU-based models?

tqtg commented 8 months ago

Most of the GPU-based models only utilize the GPU during training, not evaluation. Their predictive functions tend to be a dot product between user and item embeddings, which is quite efficient with NumPy. It wouldn't hurt to have, let's say, a score_batch() function to do scoring for a batch of users. However, the speedup might be negligible, based on my experience. Also, to take full advantage of this, we might need to re-design the metrics to compute over a batch of users, which is not trivial in some cases.
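A minimal sketch of what such a `score_batch()` might look like, assuming dot-product scoring on toy embeddings (hypothetical function, not existing Cornac API):

```python
import numpy as np

# Toy embeddings standing in for a trained model's parameters.
rng = np.random.default_rng(0)
num_users, num_items, dim = 1_000, 5_000, 64
user_emb = rng.standard_normal((num_users, dim))
item_emb = rng.standard_normal((num_items, dim))

def score(user_idx):
    # current per-user path: one vector-matrix product per call
    return user_emb[user_idx] @ item_emb.T

def score_batch(user_indices):
    # batched path: a single (batch, dim) x (dim, num_items) matmul
    return user_emb[user_indices] @ item_emb.T

batch_scores = score_batch(np.arange(32))
```

Since both paths reduce to the same matrix product, the per-user loop overhead is the only thing saved, which matches the observation that the speedup may be negligible for NumPy-backed models.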

lthoang commented 8 months ago

Some models with a large number of parameters cannot generate scores for a large number of items (using a batch scoring function) due to insufficient memory. This is especially true when the scoring function requires computing a score for every user-item pair, which leads to a scalability issue when ranking the full set of items.

I prefer to add an optional parameter specifying the number of negative samples for ranking evaluation. This has been applied in much published research. Of course, we need to ensure reproducibility by specifying a random seed for whatever pseudo-random generator is used.

> 2. Ask the model to score only the true positives and sampled negatives, and put dummy scores on the unused items while computing metrics.
>    * pros: could be faster if the number of negative items is set small
>    * cons: diminishing returns when not utilizing matrix ops and the number of negative items is set large

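The reproducibility requirement could be sketched like this (hypothetical `make_sampler` helper, not existing Cornac API): two evaluation runs constructed with the same seed must draw identical negative samples.

```python
import numpy as np

def make_sampler(seed):
    # Each evaluation run gets its own generator seeded explicitly.
    rng = np.random.default_rng(seed)
    def sample_negatives(candidates, num_negatives=100):
        return rng.choice(candidates, size=num_negatives, replace=False)
    return sample_negatives

candidates = np.arange(10_000)
run1 = make_sampler(seed=123)(candidates)
run2 = make_sampler(seed=123)(candidates)
# Same seed => identical negative samples across runs.
```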
tqtg commented 8 months ago

@lthoang @hieuddo @darrylong it seems that we can have an option for negative sampling during evaluation. Anyone wants to take a lead on this improvement? 😃