RUCAIBox / RecBole

A unified, comprehensive and efficient recommendation library
https://recbole.io/
MIT License

[🐛BUG] wandb only shows one run of gridsearch #1598

Open mmosc opened 1 year ago

mmosc commented 1 year ago

Hi!

I am training several instances of BPR with a grid search on the embedding size and the learning rate. I would like to monitor training of each run on wandb. However, wandb only displays the first run (i.e., the first point on the grid).

Am I doing something wrong with the configurations, or is wandb + RecBole not working for grid search?

Best, Marta

Wicknight commented 1 year ago

@mmosc Hello, sorry for the late reply. When you use hyper_tuning in RecBole, wandb logging does not currently work well. Instead, we provide another way to visualize the hyper-tuning process: a display output file. The path of this file is controlled by the display_file parameter in run_hyper.py.

mmosc commented 1 year ago

Hi @Wicknight and thank you for your reply!

Okay, I have two remarks:

  1. Does setting the display_file allow following the training process of each grid element, i.e., the evolution of the train and val losses and metrics, as a function of epochs? Or does it only store the final values for each model instance?
  2. Maybe it would be good to make the user aware of the incompatibility between wandb and hyperparameter tuning, with a warning or maybe even an Error.
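The warning suggested in point 2 could look roughly like the sketch below. It is only an illustration: the function name `check_wandb_with_tuning` and the `hyper_tuning` flag are hypothetical, and the sketch assumes wandb logging is switched on through a `log_wandb` entry in the configuration.

```python
import warnings


def check_wandb_with_tuning(config: dict, hyper_tuning: bool) -> bool:
    """Warn when wandb logging is requested during a hyperparameter search.

    Hypothetical helper, not part of RecBole. It assumes the config is a
    plain dict and that the wandb switch is stored under "log_wandb".
    Returns True if a warning was emitted.
    """
    if hyper_tuning and config.get("log_wandb", False):
        warnings.warn(
            "wandb logging is enabled together with hyperparameter tuning; "
            "only the first run of the search may appear in wandb.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

Raising an exception instead of warning would be stricter, but a warning still lets the search run and write its results to the display file.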

Best, Marta

Wicknight commented 1 year ago

@mmosc For the first one: currently, the display_file parameter only stores the final values for each model instance, not the per-epoch curves. You can refer to our documentation for specific examples.

For the second one, thank you for your advice! We will consider addressing this compatibility problem in a future update, so that wandb is easy to use during hyperparameter tuning.
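Until then, one generic workaround outside RecBole's own tuning code is to start and finish a separate wandb run for every grid point. The sketch below is a minimal illustration, not RecBole's implementation: `grid_points`, `run_grid`, `train_fn`, and the project name `bpr-grid-search` are hypothetical, and `train_fn` is assumed to yield one metrics dict per epoch. Only `wandb.init`, `run.log`, and `run.finish` are real wandb calls.

```python
from itertools import product


def grid_points(space):
    """Yield one dict per point of the Cartesian grid defined by `space`."""
    keys = sorted(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def _wandb_start_run(name, params):
    # Imported lazily so the sketch can be loaded without wandb installed.
    import wandb

    # "bpr-grid-search" is a placeholder project name (assumption).
    return wandb.init(project="bpr-grid-search", name=name, config=params)


def run_grid(space, train_fn, start_run=_wandb_start_run):
    """Train once per grid point, logging each point as its own run.

    `train_fn(params)` is a hypothetical callable that yields a dict of
    metrics after every epoch. Returns the list of run names.
    """
    names = []
    for params in grid_points(space):
        name = "-".join(f"{k}={v}" for k, v in params.items())
        run = start_run(name, params)
        try:
            for epoch, metrics in enumerate(train_fn(params)):
                run.log({"epoch": epoch, **metrics})
        finally:
            # Closing the run lets the next grid point open a fresh one.
            run.finish()
        names.append(name)
    return names
```

For example, `run_grid({"embedding_size": [32, 64], "learning_rate": [0.01, 0.001]}, train_fn)` would create four runs, one per combination, each with its own per-epoch curves.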