andyljones / boardlaw

Scaling scaling laws with board games.
https://andyljones.com/boardlaw
MIT License

Policy/value split #8

Closed · andyljones closed this 3 years ago

andyljones commented 3 years ago

There's a lot of analysis that'd be easier if I could toy with the value and policy networks independently. And the PPG paper shows that splitting them can give a serious performance bump too - you just need to retain some way for the policy net to piggyback on the value net's features.
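Roughly the kind of split I have in mind, as a minimal PyTorch sketch (the module and layer names here are made up for illustration, not boardlaw's actual code):

```python
import torch
from torch import nn

class SplitNet(nn.Module):
    """Sketch of a PPG-style split: separate policy and value trunks, plus an
    auxiliary value head on the policy trunk so the policy features can still
    benefit from value-learning signal."""

    def __init__(self, obs_dim, n_actions, width=256):
        super().__init__()
        # Independent trunks, so the two nets can be analysed and trained separately
        self.policy_trunk = nn.Sequential(nn.Linear(obs_dim, width), nn.ReLU())
        self.value_trunk = nn.Sequential(nn.Linear(obs_dim, width), nn.ReLU())

        self.policy_head = nn.Linear(width, n_actions)  # policy logits
        self.value_head = nn.Linear(width, 1)           # 'real' value estimate
        self.aux_value_head = nn.Linear(width, 1)       # aux value head on the policy trunk

    def forward(self, obs):
        p = self.policy_trunk(obs)
        v = self.value_trunk(obs)
        return self.policy_head(p), self.value_head(v), self.aux_value_head(p)
```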

PPG can't be adapted directly though because we don't have fresh logits in the learner, and frankly I don't like all the hyperparams PPG adds either. So I'll need to do some exploration to figure out what works and is simple enough for my tastes.

To rephrase: is the important part of PPG the multitask learning? If it is, can I sub their KL-based distillation out for something else? If not, what is the important part of PPG?
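For reference, PPG's aux objective and the kind of substitution I'm wondering about, as a sketch (none of these names exist in boardlaw; `mcts_probs` is an assumed stand-in for stored search posteriors):

```python
import torch.nn.functional as F

def ppg_aux_loss(new_logits, old_logits, aux_value, value_target, beta_clone=1.0):
    # Fit the auxiliary value head on the policy trunk...
    value_loss = F.mse_loss(aux_value, value_target)
    # ...while a KL(old || new) term keeps the policy close to its pre-aux-phase self.
    # This is the bit that needs fresh logits from the behaviour policy.
    kl = F.kl_div(F.log_softmax(new_logits, -1), F.log_softmax(old_logits, -1),
                  log_target=True, reduction='batchmean')
    return value_loss + beta_clone * kl

def mcts_distill_loss(new_logits, mcts_probs):
    # One possible substitute when fresh logits aren't available: cross-entropy
    # against the stored MCTS visit distribution instead of a KL to old logits.
    return -(mcts_probs * F.log_softmax(new_logits, -1)).sum(-1).mean()
```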

andyljones commented 3 years ago

Did some experimentation with this and couldn't get anything promising at all. Came away thinking that AZ is actually pretty close to PPG already: you can think of PPG's policy phase as 'learning a better policy' and the aux phase as 'training against that better policy', which is exactly what the search and the network update already do in AZ.
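Spelling out the analogy, the standard AZ update already looks like PPG's two phases rolled into one step (a sketch with made-up names, not the actual learner code):

```python
import torch.nn.functional as F

def az_update_loss(logits, value, mcts_probs, outcome):
    # The search is the 'policy phase': mcts_probs is the improved policy.
    # This update is the 'aux phase': train the net against that better policy,
    # and against the game outcome for the value head.
    policy_loss = -(mcts_probs * F.log_softmax(logits, -1)).sum(-1).mean()
    value_loss = F.mse_loss(value, outcome)
    return policy_loss + value_loss
```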

So policy/aux phases aside, there's still the question of whether the optimal sample staleness and sample repetition differ between the policy and value nets. Got some ongoing experiments around that, but a proper exploration'll have to wait for the experiment runner.
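Concretely, the knobs those experiments are poking at look something like this (purely illustrative defaults; none of these names exist in the codebase):

```python
from dataclasses import dataclass

@dataclass
class HeadSamplingConfig:
    # How far back into the self-play buffer each head is allowed to sample
    policy_window: int = 8    # the policy head might only want fresh self-play
    value_window: int = 64    # the value head might tolerate much staler samples
    # How many times each sample gets reused before being dropped
    policy_repeats: int = 1
    value_repeats: int = 4
```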