Open darrylong opened 9 months ago
One way to speed up evaluation is to sample negative items. This usually makes sense when the set of items is huge and we can't afford to rank all of them. The slowness is probably caused by the argsort function:
https://github.com/PreferredAI/cornac/blob/57470771ede09e85df461525d33944c585b90a87/cornac/models/recommender.py#L285
The sketch idea is to let the user define how many items they want to use as negatives (e.g., 100), then we do sampling somewhere here: https://github.com/PreferredAI/cornac/blob/57470771ede09e85df461525d33944c585b90a87/cornac/eval_methods/base_method.py#L195 After that, there are two ways to continue:
A smarter way is to be adaptive: analyze both, then set a flag to pick one or the other for subsequent iterations. Just something off the top of my head at the moment. Let's discuss if anyone has more clever ideas.
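A minimal sketch of the sampling step described above, assuming we can obtain each user's true positives from the test set. The function and parameter names (sample_eval_items, pos_items, num_negatives) are illustrative, not part of the Cornac API:

```python
import numpy as np

def sample_eval_items(total_items, pos_items, num_negatives, rng):
    """Build the candidate set for one user: all true positives plus
    `num_negatives` items sampled (without replacement) from the rest."""
    candidates = np.setdiff1d(np.arange(total_items), pos_items)
    negatives = rng.choice(
        candidates, size=min(num_negatives, len(candidates)), replace=False
    )
    return np.concatenate([pos_items, negatives])

# Toy example: 10 items total, 2 positives, 3 sampled negatives.
rng = np.random.default_rng(123)  # fixed seed for reproducibility
items = sample_eval_items(total_items=10, pos_items=np.array([2, 5]),
                          num_negatives=3, rng=rng)
```

Ranking would then be computed over `items` only (5 candidates here) instead of the full catalog, which is where the speedup over a full argsort comes from.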
My thought on this is that it's because of the score() function in the Recommender class:
https://github.com/PreferredAI/cornac/blob/57470771ede09e85df461525d33944c585b90a87/cornac/models/recommender.py#L192
where we only evaluate users one by one. Would doing batch user evaluation be faster for GPU-based models?
Most of the GPU-based models only utilize the GPU during training but not evaluation. Their predictive functions tend to be a dot product between user and item embeddings, which is quite efficient with NumPy. It wouldn't hurt to have, let's say, a score_batch() function to do scoring for a batch of users. However, the speedup might be negligible based on my experience. Also, to take full advantage of this, we might need to re-design metrics to compute for a batch of users, which is not trivial in some cases.
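For dot-product models, a score_batch() along these lines would just be one matrix multiply over the selected user rows. This is a sketch under the assumption that the model exposes its factor matrices; the names U, V, and score_batch are illustrative, not the Recommender API:

```python
import numpy as np

def score_batch(user_factors, item_factors, user_ids):
    """Score all items for a batch of users in one matmul:
    (batch, dim) @ (dim, n_items) -> (batch, n_items)."""
    return user_factors[user_ids] @ item_factors.T

# Toy factors: 4 users, 6 items, 8 latent dimensions.
U = np.random.default_rng(0).normal(size=(4, 8))
V = np.random.default_rng(1).normal(size=(6, 8))
scores = score_batch(U, V, np.array([0, 2]))  # shape (2, 6)
```

Each row of `scores` matches what per-user scoring would produce, so downstream metrics could consume the batch row by row even before any batch-aware metric redesign.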
Some models with a large number of parameters cannot generate scores for a large number of items (using a batch scoring function) due to insufficient memory, especially when the scoring function requires computing the score for every pair of user and item, which leads to a scalability issue when ranking the full set of items.
I prefer to add an optional parameter to specify the number of negative samples for ranking evaluation. This has been applied in much published research. Of course, we need to ensure reproducibility by specifying a random seed for whatever pseudo-random generator is used.
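Reproducibility then comes down to threading one seeded generator through the sampler. A sketch with NumPy's Generator API (make_sampler is a hypothetical helper, not Cornac code):

```python
import numpy as np

def make_sampler(seed):
    """Return a negative sampler backed by a seeded generator,
    so repeated runs draw identical negatives."""
    rng = np.random.default_rng(seed)
    def sample_negatives(n_items, k):
        return rng.choice(n_items, size=k, replace=False)
    return sample_negatives

# Same seed -> identical negative samples across runs.
a = make_sampler(42)(100, 5)
b = make_sampler(42)(100, 5)
```

Storing only the seed in the experiment config is enough to make sampled-negative metrics repeatable.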
2. Ask the model to score only true positives and sampled negatives, and put dummy scores for the items not being used while computing metrics.
   * pros: could be faster if the number of negative items is set to a small value
   * cons: diminishing returns when not utilizing matrix ops and the number of negative items is set to a large value
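Option 2 could look like this sketch: score only the selected candidates and mask everything else with -inf dummy scores, so masked items always fall to the bottom of the ranking. The names masked_scores and eval_items are illustrative, not Cornac code:

```python
import numpy as np

def masked_scores(score_fn, n_items, eval_items):
    """Score only `eval_items`; every other item gets a -inf dummy
    score so it ranks below all evaluated candidates."""
    scores = np.full(n_items, -np.inf)
    scores[eval_items] = score_fn(eval_items)
    return scores

# Toy scorer: an item's score is its own index.
scores = masked_scores(lambda idx: idx.astype(float), 8, np.array([1, 4, 6]))
ranked = np.argsort(-scores)  # evaluated items come first
```

The existing argsort-based ranking code would then work unchanged, at the cost of still sorting the full-length score vector; avoiding that entirely would require restricting the argsort to the candidate set.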
@lthoang @hieuddo @darrylong it seems that we can have an option for negative sampling during evaluation. Anyone wants to take a lead on this improvement? 😃
Description
Certain ranking metrics currently take a considerable amount of time for a single calculation.
Other Comments
It could be due to:
* Some slow calculation in the ranking_eval function: https://github.com/PreferredAI/cornac/blob/57470771ede09e85df461525d33944c585b90a87/cornac/eval_methods/base_method.py#L174 More testing is required to determine which part is to be optimised.
* How the score function calculates during evaluation on a user-by-user basis: https://github.com/PreferredAI/cornac/blob/57470771ede09e85df461525d33944c585b90a87/cornac/models/recommender.py#L192

Further discussion to find how we could improve performance would be great. :)
Thank you all!