karlhigley commented 2 years ago

Problem statement

Merlin Models should let you train models from different libraries (TF, PT, LightFM, XGBoost, implicit, etc.) and then compare them in a consistent, apples-to-apples way. Right now each library has its own evaluation metrics, which aren't directly comparable across frameworks. During the paper experiments, for instance, we had to implement custom metric code for implicit to match the output from TF. Customers also want to be able to compare against their own metrics.

Goals

Customers are able to compare models across libraries.
Coverage of metrics: Precision, NDCG, MAP, AUC
Support for both retrieval and ranking models
Used only during the final evaluation process. Use during training is a non-goal.
Ability to slice on key features from user/item
Example
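As a rough illustration of the metric coverage listed above, here is a minimal, framework-agnostic sketch in plain Python. It assumes binary relevance and takes an ordered list of recommended item ids plus a set of relevant ids. The function names are hypothetical and not part of any Merlin API, and the average-precision normalization (dividing by min(number of relevant items, k)) is only one of several conventions in use.

```python
import math
from typing import Hashable, Sequence, Set


def precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the top-k recommendations that are relevant."""
    return sum(item in relevant for item in ranked[:k]) / k


def average_precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Sum of precision@i over the ranks i <= k holding a relevant item,
    normalized by min(len(relevant), k) (one common convention)."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


def ndcg_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Binary-relevance NDCG: DCG of the list divided by the best achievable DCG."""
    dcg = sum(
        1.0 / math.log2(i + 1)
        for i, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(i + 1) for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0


# Averaging over users gives MAP@k and mean NDCG@k for a whole evaluation set.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
print(precision_at_k(ranked, relevant, 3))
print(average_precision_at_k(ranked, relevant, 3))
print(ndcg_at_k(ranked, relevant, 3))
```

Because the functions only see item ids and their order, the same code can score output from any library once its predictions are converted to ranked lists.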

Constraints

Scoring should happen for all items, not just negative samples, so that ranking metrics can be compared correctly.
Only one implementation of these metrics, so that the calculations are consistent. This implies converting each framework's output to a common format.
We want to be able to compare across multiple frameworks.
Pre-requisites:
https://github.com/NVIDIA-Merlin/Merlin/issues/419
Support for external libraries and systems level evaluation
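To make the constraint on scoring all items concrete, the sketch below compares the rank of one held-out positive item when it is ranked against the full catalog versus against a small sample of negatives. The catalog, the scores, and the helper `rank_of` are all made up for illustration.

```python
import random


def rank_of(target, scores, candidates):
    """1-based rank of `target` among `candidates`, ordered by descending score."""
    target_score = scores[target]
    return 1 + sum(scores[c] > target_score for c in candidates if c != target)


# Hypothetical model scores for a 1000-item catalog (higher is better).
scores = {f"item_{i}": 1.0 - i / 1000 for i in range(1000)}
positive = "item_50"  # 50 catalog items score higher than this one

# Ranking against every item in the catalog.
full_rank = rank_of(positive, scores, list(scores))

# Ranking against 20 sampled negatives only.
rng = random.Random(0)
negatives = rng.sample([item for item in scores if item != positive], 20)
sampled_rank = rank_of(positive, scores, negatives + [positive])

print(full_rank, sampled_rank)
```

With sampled negatives the positive can never rank worse than 21st, so a top-k metric computed this way is inflated and depends on the sample size. That makes it hard to compare against a framework that scores the full catalog.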

Starting Point

Input is an ordered list of recommendations that gets scored/sliced
Evaluation is from a list of predictions
cuDF implementation
Integration with visualization tooling (Marc to add options)
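A minimal sketch of the "scored/sliced" idea above, in plain Python rather than cuDF so that it runs anywhere. Each row holds one user's ordered recommendations, the user's relevant items, and a feature to slice on. The row layout, the `country` feature, and the function names are assumptions made for illustration only.

```python
from collections import defaultdict


def recall_at_k(ranked, relevant, k):
    """Share of a user's relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def sliced_metric(rows, slice_key, k):
    """Average recall@k per distinct value of `slice_key`."""
    by_slice = defaultdict(list)
    for row in rows:
        by_slice[row[slice_key]].append(recall_at_k(row["ranked"], row["relevant"], k))
    return {name: sum(values) / len(values) for name, values in by_slice.items()}


# Hypothetical evaluation rows: one per user.
rows = [
    {"user": 1, "country": "US", "ranked": ["a", "b", "c"], "relevant": {"a"}},
    {"user": 2, "country": "US", "ranked": ["b", "c", "a"], "relevant": {"a"}},
    {"user": 3, "country": "DE", "ranked": ["c", "a", "b"], "relevant": {"a", "c"}},
]

result = sliced_metric(rows, "country", k=2)
print(result)
```

In a DataFrame-based implementation, the same grouping would presumably be a group-by on the slice column followed by a mean over the per-user metric.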

Notes

Gabriel has a PoC of an evaluation framework that takes a cuDF DataFrame of predictions and uses CuPy to compute popular top-k metrics (recall, precision, MRR, MAP, NDCG). Code: https://github.com/rapidsai/recsys/tree/main/benchmark_recsys/code/evaluation
Marc built a system like this at Spotify which converted to specific framework metrics

Tasks

( Ground work for cross framework evaluation )

[ ] Create a mechanism for transferring data across frameworks
- [ ] https://github.com/NVIDIA-Merlin/core/pull/102
- [ ] Design document
- [ ] PoC for Cross framework model evaluation metrics
- [ ] Prepare presentation ( TBC ) and collect feedback from team

( Enter the goal here )

[ ] Establish a standard set of metrics (to be used in conjunction with cross-framework data transfer)
[ ] Create example for cross framework evaluation metrics

viswa-nvidia commented 2 years ago

@karlhigley @jperez999, could you please update this ticket with the problem, goal, and constraints? I believe you are laying a lot of the groundwork in 22.07, so please add that information here. Let me know if you are running into any difficulties.

karlhigley commented 2 years ago

Duplicate of NVIDIA-Merlin/models#450

karlhigley commented 1 year ago

Closing https://github.com/NVIDIA-Merlin/models/issues/450 instead

viswa-nvidia commented 1 year ago

Based on discussion with Karl, the tasks are not yet fully listed.

EvenOldridge commented 1 year ago

@marcromeyn can you review and add any comments/concerns?

viswa-nvidia commented 1 year ago

@marcromeyn please review the definition and add tasks for 22.11 and 22.12, specifically for the starting point

viswa-nvidia commented 1 year ago

@marcromeyn, please add information about the PoC to the ticket

NVIDIA-Merlin / Merlin

[RMP] Cross-framework model evaluation metrics #407
