cosmicBboy / ml-research

Research projects in Machine Learning
MIT License

[metalearn] neurips bbo challenge idea dump #26

Open cosmicBboy opened 4 years ago

cosmicBboy commented 4 years ago

Noting these down for the NeurIPS BBO (black-box optimization) challenge.

cosmicBboy commented 4 years ago

idea 9: use Random Network Distillation, applied to the value of the next time step
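A rough sketch of the RND mechanics, for reference. The note does not say how the bonus would be tied to the next-step value, so this only shows the core piece: a fixed random target network, a predictor trained to imitate it, and the prediction error used as a novelty signal. The linear predictor, the dimensions, and the example states are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, FEATURE_DIM = 4, 8

# Fixed, randomly initialised target network (never trained).
W_target = rng.normal(size=(STATE_DIM, FEATURE_DIM))
# Predictor weights, trained to imitate the target on visited states.
W_pred = np.zeros((STATE_DIM, FEATURE_DIM))

def target(s):
    return np.tanh(s @ W_target)

def predictor(s):
    return s @ W_pred

def novelty_bonus(s):
    """Mean squared prediction error: small for states the predictor has seen."""
    return float(np.mean((predictor(s) - target(s)) ** 2))

def train_step(s, lr=0.05):
    """One gradient step on 0.5 * ||predictor(s) - target(s)||^2."""
    global W_pred
    err = predictor(s) - target(s)
    W_pred -= lr * np.outer(s, err)

# Hypothetical states: one visited repeatedly, one never seen.
visited = np.array([1.0, -0.5, 0.3, 0.8])
novel = np.array([-1.0, 2.0, 0.5, -0.7])

bonus_before = novelty_bonus(visited)
for _ in range(200):
    train_step(visited)
bonus_after = novelty_bonus(visited)
bonus_novel = novelty_bonus(novel)
```

After training, the bonus for the visited state shrinks toward zero while the unseen state keeps a larger error, which is the signal that would be added to the reward or value target.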

cosmicBboy commented 4 years ago

idea 10: try a Q actor-critic method instead of the advantage function
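A small sketch of what changes between the two. In Q actor-critic the policy gradient term is weighted by Q(s, a); in the advantage version it is weighted by A(s, a) = Q(s, a) - V(s). For a softmax policy the two give the same gradient in expectation, and the baseline only affects variance. The logits and Q-values below are made-up numbers, not taken from the project.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def expected_policy_gradient(logits, weights):
    """Exact E_a[ grad_logits log pi(a) * weights[a] ] for a softmax policy."""
    pi = softmax(logits)
    grad = np.zeros_like(logits)
    for a, p in enumerate(pi):
        grad_log_pi = -pi.copy()
        grad_log_pi[a] += 1.0  # d log pi(a) / d logits = onehot(a) - pi
        grad += p * grad_log_pi * weights[a]
    return grad

logits = np.array([0.2, -0.1, 0.5])    # hypothetical policy logits
q_values = np.array([1.0, 3.0, 2.0])   # hypothetical critic estimates Q(s, a)
v = softmax(logits) @ q_values         # V(s) = E_a[Q(s, a)]

grad_q = expected_policy_gradient(logits, q_values)        # Q actor-critic
grad_adv = expected_policy_gradient(logits, q_values - v)  # advantage version
```

Since the expected gradients match, the practical difference would show up in the variance of sampled updates and in what the critic has to learn (Q directly rather than V).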

cosmicBboy commented 4 years ago

idea 11: use a simpler policy architecture, with a multivariate normal that produces all hyperparameters jointly instead of sequentially with an RNN
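A minimal sketch of the joint-sampling idea, assuming a continuous search space. The bounds, the fixed mean and Cholesky factor, and the sigmoid squashing are all illustrative assumptions; in practice the mean and covariance would be outputs of a learned policy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical search space: (low, high) bounds per hyperparameter.
bounds = np.array([[1e-4, 1e-1],   # learning rate
                   [1.0, 10.0],    # max depth
                   [0.0, 1.0]])    # dropout
n_params = len(bounds)

# Policy parameters: mean vector and Cholesky factor of the covariance.
mean = np.zeros(n_params)
chol = np.eye(n_params) * 0.5
chol[1, 0] = 0.2  # off-diagonal term lets hyperparameters co-vary

def sample_action():
    """Draw one joint sample z ~ N(mean, chol @ chol.T) and its log-density."""
    eps = rng.standard_normal(n_params)
    z = mean + chol @ eps
    log_det_half = np.sum(np.log(np.diag(chol)))  # equals 0.5 * log|cov|
    log_prob = -0.5 * eps @ eps - log_det_half - 0.5 * n_params * np.log(2 * np.pi)
    return z, log_prob

def to_hyperparameters(z):
    """Squash the unbounded sample into each hyperparameter's range."""
    unit = 1.0 / (1.0 + np.exp(-z))
    return bounds[:, 0] + unit * (bounds[:, 1] - bounds[:, 0])

z, log_prob = sample_action()
hyperparameters = to_hyperparameters(z)
```

Note that `log_prob` is the density of the unsquashed sample `z`, which is what a score-function gradient on the Gaussian parameters would use. One draw yields every hyperparameter at once, so no recurrent state is needed.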

cosmicBboy commented 4 years ago

idea 12: use model-based RL to estimate the reward function (the function approximator can even be a Gaussian process!)
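A minimal sketch of the Gaussian process variant, written in plain NumPy. It fits a zero-mean GP with an RBF kernel to past (hyperparameter, reward) pairs and scores candidate settings by predicted reward. The kernel, length scale, noise level, and the toy data are assumptions for illustration only.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between rows of a and rows of b."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and variance of a zero-mean, unit-variance GP at x_query."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    var = 1.0 - np.sum(k_qt * np.linalg.solve(k_tt, k_qt.T).T, axis=1)
    return mean, var

# Hypothetical history: hyperparameter settings (scaled to [0, 1]) and rewards.
x_seen = np.array([[0.1], [0.4], [0.9]])
r_seen = np.array([0.2, 0.8, 0.3])

# Score a grid of candidate settings with the learned reward model.
x_candidates = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
reward_mean, reward_var = gp_posterior(x_seen, r_seen, x_candidates)
best_candidate = x_candidates[np.argmax(reward_mean)]
```

The predicted variance is what makes the GP attractive here: it could drive exploration (for example through an upper-confidence-bound score) instead of relying on the mean alone.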