LeelaChessZero / lc0

The rewritten engine, originally for tensorflow. Now all other backends have been ported here.
GNU General Public License v3.0

Add policy mix option to value-only mode. #2004

Open Tilps opened 3 months ago

Tilps commented 3 months ago

Bit of a hack. If we decide this is useful, I should also add it to value_tournament and clean it up so that it does literally nothing when the mix is 0.

Running a tune with a T82 network. It looks like the optimum is a very low policy temperature and a moderate policy mix. The tune probably doesn't have the right range for PST, but the current minimum (after 6000 games) is at 0.325 (policy mix) and 0.26 (policy temperature). The Elo gain vs. pure value mode is estimated at ~40. As the policy temperature goes down, so does the optimal policy mix: at policy temperature 1 the optimal policy mix seemed closer to 0.5 (which was also probably >30 Elo).
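For readers who haven't looked at the diff, here is a minimal sketch of how a policy mix could enter value-only move selection. It is an illustration, not the code in this PR: the function names, the additive form `value + mix * policy`, and the `p ** (1 / T)` temperature step are all assumptions. The one property taken from the description above is that a mix of 0 falls back to pure value mode.

```python
def apply_temperature(policy, temperature):
    """Sharpen (T < 1) or flatten (T > 1) a policy distribution.

    Assumed form: raise each prior to 1/T and renormalize.
    Requires temperature > 0.
    """
    powered = [p ** (1.0 / temperature) for p in policy]
    total = sum(powered)
    return [p / total for p in powered]


def pick_move(values, policy, policy_mix, policy_temperature):
    """Return the index of the move with the highest mixed score.

    values: hypothetical per-move value estimates for the side to move.
    policy: hypothetical per-move policy priors (sums to 1).
    The additive mix is an assumption made for this sketch.
    """
    tempered = apply_temperature(policy, policy_temperature)
    scores = [v + policy_mix * p for v, p in zip(values, tempered)]
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up numbers: move 1 has a slightly better value, move 0 a much higher prior.
values = [0.10, 0.12, -0.30]
policy = [0.70, 0.20, 0.10]

pure_value_choice = pick_move(values, policy, policy_mix=0.0, policy_temperature=1.0)
mixed_choice = pick_move(values, policy, policy_mix=0.5, policy_temperature=1.0)
print(pure_value_choice, mixed_choice)
```

With these made-up inputs, pure value mode picks the move with the marginally better value, while a mix of 0.5 lets the strong policy prior win out. A low temperature concentrates the prior on the top policy move, so the mix acts more like a bonus for that single move.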

Tilps commented 3 months ago

New optimum at 9000 games: {'PolicyMix': 0.14689769678592882, 'PolicyTemperature': 0.2}

Tilps commented 3 months ago

15000 games: {'PolicyMix': 0.4831467071587293, 'PolicyTemperature': 0.2} (Elo estimate is at 49)

Seems like I'll need to do another run, expanding the policy temperature range even further...

Tilps commented 3 months ago

18000 games: {'PolicyMix': 0.5291014358428309, 'PolicyTemperature': 0.2} (Elo estimate 47)

Going to do the restart with double the number of rounds overnight.

Tilps commented 3 months ago

Restarting with a wider policy temperature range (and removing the clearly bad negative policy mix values), the tune oddly decided to go find a minimum at a completely different policy temperature: {'PolicyMix': 0.4585496867191581, 'PolicyTemperature': 1.222335248497085} (Elo estimate of 45 after 36000 tuning games).

So it seems the theory that policy mix reduces as temperature reduces is invalid... Policy temperature just seems to have a relatively small effect, maybe...

(There was an early minimum at temperature 0.05 with a PolicyMix of 0.35, but it never stabilized there, and it seems rare after the first 18000 tuning games.)
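As a rough sanity check on that conclusion, the optima reported in this thread can be lined up side by side. The snippet below uses only the numbers quoted in the comments above, rounded to three decimals; the variable and function names are made up for illustration. Note that the 36000-game point comes from the restarted run, so this is a loose comparison rather than a proper analysis.

```python
# (tuning games, PolicyMix, PolicyTemperature) as reported in the comments,
# rounded to three decimals.
reported = [
    (6000, 0.325, 0.26),
    (9000, 0.147, 0.2),
    (15000, 0.483, 0.2),
    (18000, 0.529, 0.2),
    (36000, 0.459, 1.222),  # restarted run with the wider temperature range
]


def mix_spread(rows, temperature):
    """Range of reported PolicyMix optima at one fixed temperature."""
    mixes = [mix for _, mix, temp in rows if temp == temperature]
    return max(mixes) - min(mixes)


# How much the optimal mix moved while the temperature stayed at 0.2.
spread_at_02 = mix_spread(reported, 0.2)

# How much it differs between the last T=0.2 optimum and the restarted run.
gap_across_runs = abs(reported[3][1] - reported[4][1])

print(f"spread at T=0.2: {spread_at_02:.3f}")
print(f"gap between T=0.2 and T=1.222 optima: {gap_across_runs:.3f}")
```

The optimal mix moved by about 0.38 while the temperature sat at 0.2, but differs by only about 0.07 between the final T=0.2 optimum and the optimum at T≈1.22. That is consistent with temperature having a relatively small effect, though with this few points it is only suggestive.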