datarootsio / tutorial-hyperparameter-optimization

Tutorial for Rootlabs@Lunch: Practical Hyperparameter Optimisation
MIT License

Notes on cleaning notebook for open sourcing #1

Open devdnhee opened 2 years ago

devdnhee commented 2 years ago

The code is in general very readable; just from validating it I could understand everything. I've got a few recommendations:

  1. You could share your Colab notebook publicly and link it in the README. That way people can open it directly and make their own local copies, instead of downloading the notebook and uploading it to their own Colab workspace.
  2. Keep the cell outputs visible, so readers can follow the code and the concepts directly in the notebook. Since the per-trial logs might be a bit too much, could you convert the study to a dataframe and show the first 10 trials after the run?
  3. Everything is covered in more detail on YouTube and in the notebook, but it would still be nice to have a short overview / summary in the README.
  4. Provide a link to the Optuna documentation in the README and the notebook.
  5. I'd move the `N_TRIALS` assignment into the same cell as the general overview.
  6. There is some commented-out code here and there (e.g. cell 7, line 20); either explain why we would want to uncomment it, or remove it.
  7. `train_evaluate` and the objective functions are mostly duplicates of each other. As I understand the code, `train_evaluate` is identical everywhere, so keep only one copy; the only difference between the objective functions is the check for whether pruning is enabled. Can't you generalize that into one function instead of copy-pasting?
  8. Since your objective function is the MSE, I'd rename `objective` to reflect that.
  9. Give your studies more explicit names, e.g. `study3` -> `study_cmaes`.
  10. Maybe add some markdown commentary to your visualizations.
YannouRavoet commented 2 years ago

The code is indeed very readable. My issues are largely the same as in Dorian's review:

  1. Having a direct link to a Colab would indeed be nice. I started cloning the repo before noticing it needs to be run from Google Colab to access the dataset.
  2. It would indeed be clearer that it is really easy to swap optimizers if `def train_evaluate(params)` and `def objective(trial)` were only defined once.
  3. In code block 15 (where you output the RMSE scores for each optimization), it could be interesting to also compare the parameter values that were found (they differ quite a bit per optimization).
  4. A short explanation for each of the graphs would be nice indeed, but if you take a minute, the meaning of each graph is really clear.
  5. I would rename the Further Topics section to Visualization and drop the subtitle.
  6. I like the `objective` name, as it matches the code examples on the Optuna website.