Equim-chan / Mortal

🚀🀄️ A fast and strong AI for riichi mahjong, powered by Rust and deep reinforcement learning.
https://mortal.ekyu.moe
GNU Affero General Public License v3.0

Possible bug in rank calculation? #16

Closed hyskylord closed 2 years ago

hyskylord commented 2 years ago

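I set challenger and champion to the same model and ran one_vs_three. In principle the challenger should then finish in each rank the same number of times, but in practice the counts differ. Here is an excerpt of the output: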

# 19
2022-08-17 10:41:31,151     INFO one_vs_three.rs:141  seed: [19500, 20000) w/ 10671399319166252728, start 500 groups, 2000 hanchans
o steps: 1182 (3.581 step/s)
[00:05:30] [#########################################################################################################################################################################] 2000/2000 100%
2022-08-17 10:47:01,217     INFO one_vs_three.rs:208  dumping game logs
challenger rankings: [499 502 499 500] (2.5, 0.0pt)
--------------------------------------------------
# 20
2022-08-17 10:47:35,629     INFO one_vs_three.rs:141  seed: [20000, 20500) w/ 10671399319166252728, start 500 groups, 2000 hanchans
o steps: 1342 (3.815 step/s)
[00:05:51] [#########################################################################################################################################################################] 2000/2000 100%
2022-08-17 10:53:27,451     INFO one_vs_three.rs:208  dumping game logs
challenger rankings: [500 500 497 503] (2.5015, -0.2025pt)
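
For reference, the two figures in parentheses after each rankings array can be recomputed from the counts. The sketch below assumes placement points of 90, 45, 0 and -135 for 1st to 4th; the thread does not state these values, but they reproduce both "pt" figures in the log above. The names `PTS` and `summarize` are made up for this illustration.

```python
# Assumed placement points for 1st..4th place. Not stated in the thread;
# chosen because they reproduce the "pt" values printed in the log.
PTS = (90, 45, 0, -135)


def summarize(rankings, pts=PTS):
    """Return (average rank, average pt) from counts of 1st..4th places."""
    games = sum(rankings)
    avg_rank = sum(rank * n for rank, n in enumerate(rankings, start=1)) / games
    avg_pt = sum(p * n for p, n in zip(pts, rankings)) / games
    return avg_rank, avg_pt


print(summarize([499, 502, 499, 500]))  # batch # 19 -> (2.5, 0.0)
print(summarize([500, 500, 497, 503]))  # batch # 20 -> (2.5015, -0.2025)
```

A perfectly uniform result of [500 500 500 500] would give exactly (2.5, 0.0). Batch # 19 also gives (2.5, 0.0) even though its counts are not uniform, so the summary figures alone cannot confirm uniformity.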
Equim-chan commented 2 years ago

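I noticed this before too, but never looked into it closely. You could find which group has non-uniform rankings, then diff the logs to see whether any actions differ. My guess is that it is related to batch size: different batch sizes may give slightly different results.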
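
A minimal sketch of the suggested diff, assuming each game log is a text file with one event per line. The function name `first_divergence` and the sample lines are hypothetical and do not reflect the real log format.

```python
from itertools import zip_longest


def first_divergence(lines_a, lines_b):
    """Return (index, line_a, line_b) at the first differing line, else None.

    If one log is shorter, the missing side is reported as None.
    """
    for i, (a, b) in enumerate(zip_longest(lines_a, lines_b)):
        if a != b:
            return i, a, b
    return None


# Hypothetical log contents, for illustration only.
log_a = ["draw 5p", "discard 1m", "draw 3s", "discard 9p"]
log_b = ["draw 5p", "discard 1m", "draw 3s", "discard 2z"]

print(first_divergence(log_a, log_b))  # (3, 'discard 9p', 'discard 2z')
```

Only the first mismatch matters here: once one action differs, everything after it in the game is expected to differ as well.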

hyskylord commented 2 years ago

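10038_1957153204106680927_a.txt 10038_1957153204106680927_b.txt

That is indeed the case: the Q values computed with batch size 1 and batch size 3 differ slightly, so it is not a problem with the rank calculation.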
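
The thread does not say why the Q values depend on batch size. One common cause, assumed here rather than confirmed, is that floating-point addition is not associative, so a different summation order can give a slightly different result. The sketch below uses exaggerated magnitudes to make the effect visible and shows how it can change which action has the highest Q value; all names and numbers are illustrative.

```python
def accumulate(terms):
    """Add terms left to right in ordinary double-precision arithmetic."""
    total = 0.0
    for t in terms:
        total += t
    return total


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


# The same three terms, summed in two different orders.
q_order_a = accumulate([1e16, 1.0, -1e16])  # 1.0 is absorbed by 1e16 -> 0.0
q_order_b = accumulate([1e16, -1e16, 1.0])  # large terms cancel first -> 1.0

rival_q = 0.5  # hypothetical Q value of a competing action

print(q_order_a, q_order_b)            # 0.0 1.0
print(argmax([q_order_a, rival_q]))    # 1: the rival action is chosen
print(argmax([q_order_b, rival_q]))    # 0: the first action is chosen
```

In a real network the discrepancies would be far smaller, but they only need to exceed the gap between two nearly tied actions to change a decision.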