EricSteinberger / Deep-CFR

Scalable Implementation of Deep CFR and Single Deep CFR
MIT License

Why mean over all actions sampled in multi outcome sampling #7

Open annw0922 opened 4 years ago

annw0922 commented 4 years ago

https://github.com/EricSteinberger/Deep-CFR/blob/master/DeepCFR/workers/la/sampling_algorithms/MultiOutcomeSampler.py

Since `aprx_imm_reg` here is computed for every action and pushed to the buffer without being summed up, I have no idea why `aprx_imm_reg *= legal_action_mask / n_actions_to_smpl`.

I think it is because I could not understand the formula here (ṽ(I) = p(a) · |A(I)|), and I failed to find the corresponding part in your paper:

> Last state values are the average, not the sum of all samples of that state since we add ṽ(I) = p(a) · |A(I)|. Since we sample multiple actions on each traverser node, we have to average over their returns like: ṽ(I) = Σ_{a=0}^{N} ṽ(I|a) · p(a) · |A(I)| / N.
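To make the quoted comment concrete, here is a minimal sketch of the estimator it describes. This is not code from the repo; `sampled_state_value`, the uniform sampling, and all variable names are my own assumptions made for illustration:

```python
import numpy as np

def sampled_state_value(action_values, policy, n_actions_to_smpl, rng):
    """Monte-Carlo estimate of v(I) from a uniformly sampled subset of actions.

    Each sampled action a contributes v(I|a) * p(a) * |A(I)|, which is an
    unbiased single-sample estimate of v(I) = sum_a v(I|a) * p(a) when actions
    are drawn uniformly. Averaging over the N samples (dividing by
    n_actions_to_smpl) keeps the expectation at v(I); summing instead would
    scale the estimate up by N.
    """
    n_actions = len(action_values)
    sampled = rng.choice(n_actions, size=n_actions_to_smpl, replace=False)
    est = 0.0
    for a in sampled:
        est += action_values[a] * policy[a] * n_actions  # v(I|a) * p(a) * |A(I)|
    return est / n_actions_to_smpl  # average, not sum
```

With all actions sampled (N = |A(I)|) this recovers exactly Σ_a ṽ(I|a) · p(a); with fewer samples it is an unbiased but noisier estimate of the same quantity.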

Is there any reference for it?

Thanks a lot!

EricSteinberger commented 4 years ago

Hi! This is to make sure that the estimate is not scaled up just because you sample more actions. The regrets get more accurate the more actions you sample, but the expectation of the value should stay the same and not grow linearly. Does this make sense? It's not in the paper, you are right. Thank you for checking before opening the issue, appreciated! This is an implementation detail, and the paper itself doesn't use MOS sampling; it uses external sampling, where this division doesn't really matter.
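A hypothetical toy demonstration of that point (the `estimate` function and the numbers are my own, not from the repo): summing the per-sample value estimates scales the state value linearly with the number of sampled actions, while dividing by the sample count, mirroring the division by `n_actions_to_smpl`, keeps the expectation fixed and only reduces variance.

```python
import numpy as np

def estimate(action_values, policy, n_smpl, rng, average):
    """Value estimate from n_smpl uniformly sampled actions (with replacement).

    Each sample contributes v(I|a) * p(a) * |A(I)|. With average=True we
    divide by n_smpl; with average=False we just sum the samples.
    """
    n = len(action_values)
    idx = rng.integers(n, size=n_smpl)
    total = sum(action_values[a] * policy[a] * n for a in idx)
    return total / n_smpl if average else total

vals = np.array([0.0, 1.0, 2.0])
pol = np.array([0.5, 0.3, 0.2])
exact = float(vals @ pol)  # true v(I) = 0.7

rng = np.random.default_rng(1)
# Averaging: expectation stays near 0.7 regardless of how many actions
# we sample; extra samples only shrink the variance.
avg4 = np.mean([estimate(vals, pol, 4, rng, average=True) for _ in range(50000)])
# Summing instead: the estimate grows linearly with the sample count (~4 * 0.7).
sum4 = np.mean([estimate(vals, pol, 4, rng, average=False) for _ in range(50000)])
```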