Different marking system?

ashinpan commented 6 years ago

A Reddit user "Friday9i" said as follows:

"Chess is a nullish game: ELO rating is plateauing even if the level is still increasing. Ex: if a new stronger version v2 beats the current AC (AlphaChess) with 90 wins, 10 losses and 900 draws, it'll be only 28 ELO above AC! Another version v3 beating AC 100 wins, 0 losses and 900 draws would be even much stronger, but it'll get only 35 ELO points above AC, and 7 Elo points better than v2. Draws are completely flattening Elo curves when players are good enough to make essentially drawn games, and it doesn't reflect the difference in strength of these players anymore."

I think he has a point. However, given that white has the first-mover advantage in chess, how about we give 0.4 and 0.6 points respectively to white and black in a drawn game? Such an arrangement would certainly help to differentiate the strengths of different networks.
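For reference, here is a minimal sketch of how the 28 and 35 figures in the quote can be reproduced. It assumes the usual logistic Elo model with each draw scored as 0.5; the function name `elo_diff` is only illustrative and does not come from any rating tool.

```python
import math

def elo_diff(wins, losses, draws):
    """Elo difference implied by a match result, with each draw scored as 0.5."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # fraction of available points scored
    # Invert the logistic expected-score curve: score = 1 / (1 + 10 ** (-d / 400))
    return 400 * math.log10(score / (1 - score))

v2 = elo_diff(90, 10, 900)   # v2 vs AC in the quote
v3 = elo_diff(100, 0, 900)   # v3 vs AC in the quote
print(round(v2), round(v3), round(v3 - v2))  # prints: 28 35 7
```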

evalon32 commented 6 years ago

Can you elaborate on how this would affect Elo calculations? What would you get instead of 28 and 35 in the above example?

ashinpan commented 6 years ago

@evalon32 In this method, we must classify the drawn games of a given network into (1) draws as black, and (2) draws as white. Then, we give it 1 point for a win, 0 points for a loss, 0.6 for a draw as black, and 0.4 for a draw as white. Its opponent is evaluated in the same way. Then the Elo difference is calculated from the two point totals.

Now in the example above, suppose a new network plays 1000 match games, and performs as follows:

90 wins (all as white), 10 losses (all as black), 410 draws (as white), 490 draws (as black)

Then, the network has won (90 + 410x0.4 + 490x0.6 =) 548 points. On the other hand, its opponent has gained (10 + 410x0.6 + 490x0.4=) 452 points.

Then, the Elo difference would be 33. In contrast, in the present system, the Elo difference is 28.
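The arithmetic above can be checked with a short sketch. It assumes the same logistic Elo conversion as usual, applied to the point totals; the helper names `weighted_points` and `elo_from_points` are hypothetical, and the 0.4/0.6 weights are just the values proposed in this thread.

```python
import math

def weighted_points(wins, draws_white, draws_black, w_draw=0.4, b_draw=0.6):
    """Points under the proposed marking: 1 per win, 0.4 / 0.6 per draw as white / black."""
    return wins + w_draw * draws_white + b_draw * draws_black

def elo_from_points(points, games):
    """Convert a point total over `games` games into an Elo difference (logistic model)."""
    score = points / games
    return 400 * math.log10(score / (1 - score))

games = 1000
# New network: 90 wins, 410 draws as white, 490 draws as black.
new_net = weighted_points(90, draws_white=410, draws_black=490)
# Opponent: 10 wins; its draws have the opposite colors.
opponent = weighted_points(10, draws_white=490, draws_black=410)

print(round(new_net), round(opponent))          # prints: 548 452
print(round(elo_from_points(new_net, games)))   # prints: 33
```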

Ipmanchess commented 6 years ago

If you have a new tool that calculates Elo differently, then I would certainly want to test it on my databases! For as long as I have been testing engines, I have used ELO & Ordo to make my rating lists, so I would see the differences directly.

Akababa commented 6 years ago

Shouldn't it make no difference (in the long run) if we play the same number of games as black and white?

ashinpan commented 6 years ago

@evalon32 You do not need a new tool for this method. You do not account for draws separately but compare the overall points gained by each player in a given set of games. This is the same method used in Go, which has no drawn games.

The only thing I am not sure about is whether it is fair enough to award 0.4 for a draw as white and 0.6 for a draw as black.

ashinpan commented 6 years ago

@Akababa "Shouldn't it make no difference (in the long run) if we play the same number of games as black and white?" Even if you play the same number of games as black and white, you will win some games and lose some. So the numbers of drawn games can differ between white and black. The stronger party should be able to make more draws with black.
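To make this concrete, here is a small sketch showing that with 0.4/0.6 draw marking the total differs from the standard 0.5 marking only through the split of draws between colors. The shift works out to 0.1 × (draws as black − draws as white). The function names are hypothetical, and the records reuse the 90-win example from earlier in the thread with different draw splits.

```python
def weighted_points(wins, draws_white, draws_black, w_draw=0.4, b_draw=0.6):
    """Points with draws marked 0.4 as white and 0.6 as black."""
    return wins + w_draw * draws_white + b_draw * draws_black

def standard_points(wins, draws_white, draws_black):
    """Points with every draw marked 0.5, regardless of color."""
    return wins + 0.5 * (draws_white + draws_black)

def draw_split_shift(wins, draws_white, draws_black):
    """How many points the 0.4/0.6 marking adds relative to the standard marking."""
    return (weighted_points(wins, draws_white, draws_black)
            - standard_points(wins, draws_white, draws_black))

# Same 90 wins and 900 draws; only the color split of the draws changes.
for dw, db in [(450, 450), (410, 490), (490, 410)]:
    print(dw, db, round(draw_split_shift(90, dw, db), 6))
```

With an even split the two markings agree; the network only gains points if it draws more often as black than as white.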

Ipmanchess commented 6 years ago

That is how I test: each engine plays against every other engine in my lists, with colors reversed. Combined with a balanced opening book, you can't get more equal than that, which makes the results fairer and more accurate.

Ipmanchess commented 6 years ago

Many people have talked about calculating Elo differently, but no one has made a tool for it. Why? When you have a good idea, bring it out and let testers try it. Then you will get confirmation, and we will have something new to put on our websites.

ashinpan commented 6 years ago

@Ipmanchess My suggestion is not a different way to calculate Elo. Rather, it is a different marking system for chess game results. Its purpose is to judge and compare different networks, not to score real-life games.

evalon32 commented 6 years ago

If draws are the problem, you could simply discount their weight, rather than try and find the correct "komi" value for chess. For example, count only half of the draws (Elo difference +51) or don't count them at all (Elo difference +382).
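A quick sketch of the discounting idea, using the 90 wins / 10 losses / 900 draws example from above and assuming the usual logistic Elo conversion. The `draw_weight` parameter is a hypothetical knob for the fraction of draws that are counted, not an option of any existing rating tool.

```python
import math

def elo_diff_discounted(wins, losses, draws, draw_weight=1.0):
    """Elo difference when only a fraction `draw_weight` of the draws is counted."""
    counted_draws = draws * draw_weight
    games = wins + losses + counted_draws
    score = (wins + 0.5 * counted_draws) / games
    # Undefined if the score reaches exactly 0 or 1 (e.g. no losses and no counted draws).
    return 400 * math.log10(score / (1 - score))

for weight in (1.0, 0.5, 0.0):
    print(weight, round(elo_diff_discounted(90, 10, 900, weight)))
# prints:
# 1.0 28
# 0.5 51
# 0.0 382
```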

Krgp commented 6 years ago

Just for academic purposes: if we differentiate draws with white and black, we have to differentiate wins and losses as well. So 0.9 pt for a win with white and 1.1 pt for a win with black ... is that all right?

Coming to Leela, your idea (indirectly) introduces a concept from human knowledge --> White is better than Black. Doesn't 'Zero' mean Lczero has to draw its own conclusions? And is White really better than Black for sure? If so, Leela will have to find that out on its own ...

jjoshua2 commented 6 years ago

I think @Krgp is right. If Leela stalls, we could try ideas like this to keep going, but it's nice to see what she thinks, with zero human input, about who should win or draw first.

Ishinoshita commented 6 years ago

You could just as well redefine the rules of chess to remove draws completely, counting a draw as a failure for white, who moved first, and thus as a win for black. But this wouldn't solve your problem. When colors alternate, the average difference in win/loss between strong players will remain very low, yielding a small difference in Elo. Chess is drawish by nature. There is no easy way to introduce fair compensation for the side moving second, unlike in Go, where an adequate 'komi' can be set very precisely to keep the average win/loss ratio very close to 50% for both colors (~51-52% vs. 48-49%), meaning each game is an 'open' game.

evalon32 commented 6 years ago

I think drawishness and first-move advantage are really different issues that shouldn't be conflated. Consider checkers, for example: it has draws as well (and in fact has been proven to be drawn with perfect play), but no first-move advantage (or disadvantage).

Ishinoshita commented 6 years ago

@evalon32 I agree with the distinction you are pointing out. Thank you for correcting me. But I disagree with your example, since a game where either black or white can secure a draw offers no advantage to either side. The board value is zero from the start under perfect play, so there is de facto no side-to-move effect ;-)

jkiliani commented 6 years ago

@Ishinoshita Has anyone started a Leela Zero implementation for Shogi yet? There is probably not much interest outside of Japan, but it's a very interesting game with a really low draw rate. Also, it appears that top-level Shogi programs lag considerably behind top-level chess engines, so a Shogi implementation of Leela should be a big success in Japan at least...

Ishinoshita commented 6 years ago

@jkiliani No, I have not heard of any serious attempt so far. The attempt linked below is still in its infancy, as far as I can tell from the repo. https://github.com/tkm2261/shogi-alpha-zero

glinscott / leela-chess

Different marking system? #414