mvuthegoat closed 1 year ago
I reviewed the paper and the code. The paper refers to "α = 3/2, β = 0, and γ = 2" as the optimal parameters. In their formula, with β = 0,
beta_coef equals 1 / (1 + 1), which is always 1/2. So I replaced beta_coef with a constant 1/2 in my implementation.
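For reference, here is a minimal sketch of how the three discount coefficients could be computed, assuming the DCFR weights t^α / (t^α + 1) for positive regrets, t^β / (t^β + 1) for negative regrets, and (t / (t + 1))^γ for the average strategy. The function names `dcfr_coefficients` and `discount_regret` are hypothetical and are not taken from this repository.

```python
def dcfr_coefficients(t, alpha=1.5, beta=0.0, gamma=2.0):
    """Return (positive_coef, negative_coef, strategy_coef) for iteration t >= 1.

    Hypothetical helper: the defaults are the parameters quoted from the paper.
    """
    t_alpha = t ** alpha
    t_beta = t ** beta
    positive_coef = t_alpha / (t_alpha + 1)   # discount for positive regrets
    negative_coef = t_beta / (t_beta + 1)     # discount for negative regrets
    strategy_coef = (t / (t + 1)) ** gamma    # discount for average-strategy contributions
    return positive_coef, negative_coef, strategy_coef


def discount_regret(regret, t, alpha=1.5, beta=0.0):
    """Scale an accumulated regret by the coefficient that matches its sign."""
    positive_coef, negative_coef, _ = dcfr_coefficients(t, alpha, beta)
    return regret * (positive_coef if regret > 0 else negative_coef)


if __name__ == "__main__":
    # With beta = 0, t ** beta is 1 for every t, so the negative coefficient stays at 1/2.
    for t in (1, 10, 1000):
        print(t, dcfr_coefficients(t))
```

With β = 0 the negative-regret coefficient does not depend on t, which is why hard-coding 1/2 gives the same result. The sketch keeps the general form so that other values of β can still be tried.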
I ran your code and compared it with the baseline, and its performance is worse than the baseline. I will delay the merge until the performance problem is solved.
PS: always create pull requests against the gui branch, where most changes happen.
Oh, I see you use beta as 1/2 directly, which is indeed 1 / (1 + 1). You can keep the code as it is, though :)
Should we change beta to beta_coef here? I read the paper on Discounted CFR (https://cdn.aaai.org/ojs/4007/4007-13-7066-1-10-20190704.pdf?_gl=1*le8l1f*_ga*Mzc2NzU4NDM0LjE2ODE3NjcxMDM.*_ga_CKNBPFEYPG*MTY4MTc2NzEwMi4xLjAuMTY4MTc2NzExMS4wLjAuMA..), and it says to multiply negative regrets by the beta coefficient rather than by beta itself, as the code currently does. I might be missing something, though.