ChrisWhittington opened this issue:

Hi Connor,

I'm running your code under VS2022, trying to repeat the results for Fire (Fire 8 NNUE). Fire functions, but it always appears to return a score of 0, which is of course not very helpful.

Maybe it's the Fire download I have? Can you point me at the one you used?

Thanks, Chris
I believe you're probably missing the network weights file? As far as I am aware, Fire looks for weights in the directory you're running it from. For the example provided, I placed binaries for Stockfish 12, Berserk, and Fire, as well as the "raptor.bin" network, in the project root directory.
Regards, Connor
Thanks. I used the nn.bin downloaded with Fire_8 (just saw your edit: is raptor.bin another Fire nn?) and made sure the files were in the working dir. I performed the test with a bunch of engines (testing at d=1) and got some very strange results: some engine pairs were correlating at 0.1 or so, others at 0.9, which didn't make much sense to me. After all, they are all playing chess, and chess evals ought to show fairly strong correlation just by being chess evals.

I think the problem is that if, in some positions, one engine finds a mate and the other doesn't (differences in QSearch), the eval delta will be on the order of MATE_SCORE. So I hacked in a "fix" for that, and all engine pairs then come in with corr > 0.8 or so. In other words, the test is very sensitive to the FEN selection and then to QSearch(), and my plan to use it at d=1 to test NN similarity didn't work.
I'll try to see whether a better "fix" for this code (than the hack I just did):
```python
a_values = a.scores - a.scores.mean()
b_values = b.scores - b.scores.mean()
return np.dot(a_values, b_values) / np.sqrt((a_values * a_values).sum() * (b_values * b_values).sum())
```
namely, culling score pairs where abs(a.scores) > 3000 or abs(b.scores) > 3000, works out better. 3000 is an arbitrary guess. It might also be a good idea to remove any score pairs where the bm (best move) is a capture.
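For concreteness, here is a minimal sketch of that culling step applied ahead of the same correlation. It assumes the two engines' scores arrive as parallel numpy arrays; the `culled_corr` name, signature, and cutoff default are illustrative, not taken from the repo:

```python
import numpy as np

def culled_corr(a_scores: np.ndarray, b_scores: np.ndarray, cutoff: int = 3000) -> float:
    # Drop any pair where either engine reports a near-mate score, so a
    # single MATE_SCORE-sized delta can't dominate the correlation.
    mask = (np.abs(a_scores) <= cutoff) & (np.abs(b_scores) <= cutoff)
    a_values = a_scores[mask] - a_scores[mask].mean()
    b_values = b_scores[mask] - b_scores[mask].mean()
    return np.dot(a_values, b_values) / np.sqrt(
        (a_values * a_values).sum() * (b_values * b_values).sum()
    )
```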
The reason for my interest, btw, is that I'm working on an entirely different SIM algorithm and wanted to compare/verify it against your code's output.
Ran again with score pairs culled as above (dropping a pair when either abs(score) > 3000) and got these results at d=1, with 10000 FENs balanced by material phase.
```
corr(Fire8.NN, Fire8.NN) = 1.0000000000000002
corr(Fire8.NN, Berserk9) = 0.808711610850477
corr(Fire8.NN, Berser85) = 0.901183805804116
corr(Fire8.NN, Rubi-2.2) = 0.9293653951711445
corr(Fire8.NN, Seer-2.5) = 0.9564870294486083
corr(Fire8.NN, SFish-12) = 0.9173037766199017
corr(Fire8.NN, SFish-13) = 0.9211463236796953
corr(Fire8.NN, SFish-14) = 0.9050040156619338
corr(Fire8.NN, SFish-15) = 0.9377008345457257
corr(Fire8.NN, Koiv-8.1) = 0.899045168525631
corr(Fire8.NN, Koiv-7.0) = 0.9180309154772178
```
All other pairs are pretty similar.
After additionally deleting any score pairs where either bm is a capture (a sketch of this capture filter follows the result tables below):
```
corr(Fire8.NN, Fire8.NN) = 0.9999999999999996
corr(Fire8.NN, Berserk9) = 0.8945074985967109
corr(Fire8.NN, Berser85) = 0.9001774072523666
corr(Fire8.NN, Rubi-2.2) = 0.9289549883634292
corr(Fire8.NN, Seer-2.5) = 0.9584843817872238
corr(Fire8.NN, SFish-12) = 0.9170682680370982
corr(Fire8.NN, SFish-13) = 0.9240614716200442
corr(Fire8.NN, SFish-14) = 0.9205985318981863
corr(Fire8.NN, SFish-15) = 0.9390564017522552
corr(Fire8.NN, Koiv-8.1) = 0.8995933811301037
corr(Fire8.NN, Koiv-7.0) = 0.9207163469139146
```
Unadjusted results, for comparison:
```
corr(Fire8.NN, Fire8.NN) = 0.9999999999999993
corr(Fire8.NN, Berserk9) = 0.41527982066804964
corr(Fire8.NN, Berser85) = 0.5079797358310518
corr(Fire8.NN, Rubi-2.2) = 0.7366456978599996
corr(Fire8.NN, Seer-2.5) = 0.8224410577988219
corr(Fire8.NN, SFish-12) = 0.64454728021664
corr(Fire8.NN, SFish-13) = 0.6383152162266372
corr(Fire8.NN, SFish-14) = 0.582363423007721
corr(Fire8.NN, SFish-15) = 0.7488536077625506
corr(Fire8.NN, Koiv-8.1) = 0.3050050433692385
corr(Fire8.NN, Koiv-7.0) = 0.36445857212538224
```
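As referenced above, here is a minimal sketch of the capture test using python-chess. It assumes each score pair carries its FEN and the engine's best move in UCI form; `is_capture_bm` is a hypothetical helper name, not part of the repo:

```python
import chess

def is_capture_bm(fen: str, bm_uci: str) -> bool:
    # True if the recorded best move captures a piece (including en passant),
    # flagging the score pair for removal from the correlation.
    board = chess.Board(fen)
    return board.is_capture(chess.Move.from_uci(bm_uci))
```

A pair would then be dropped whenever the test returns True for either engine's bm.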
Interesting. Thanks for the investigation. For my initial testing, I used an EPD containing only "quiet" positions with evals within a specified range (this excluded mate positions and mostly ensured static evals were obtained). Perhaps I should add something about this to the readme. The choice of position types definitely seems to be important.
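By way of illustration, a minimal sketch of that kind of EPD pre-filter. The `ce` (centipawn evaluation) opcode, the eval cutoff, and the "not in check" proxy for quietness are assumptions here, since the exact criteria aren't specified:

```python
import chess

def filter_quiet_epd(in_path: str, out_path: str, max_abs_ce: int = 1000) -> None:
    # Keep only positions whose recorded centipawn eval ("ce" opcode) falls
    # inside the range and where the side to move is not in check: a rough
    # proxy for "quiet" that also excludes mate-ish positions.
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue
            board, ops = chess.Board.from_epd(line)
            ce = ops.get("ce")
            if ce is None or abs(int(ce)) > max_abs_ce:
                continue
            if board.is_check():
                continue
            dst.write(line + "\n")
```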