How to set constant_zx? #4

Unified-Language-Model-Alignment / src

Apache License 2.0

Hi, sorry for the late response. In Appendix A of our paper, we follow [1] to show that Z(x) ≈ 1, so the log Z(x) term can be treated as 0 in practice. To test the effect of different values of log Z(x), the --ulma_constant_zx config option can be used. We are also considering adding an ablation study on how Z(x) affects the method's performance, and we hope to update the arXiv version with the results in the coming few weeks. Hope that helps!

[1] Banghua Zhu, Hiteshi Sharma, Felipe Vieira Frujeri, Shi Dong, Chenguang Zhu, Michael I Jordan, and Jiantao Jiao. Fine-tuning language models with advantage-induced policy alignment.
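
To make the role of log Z(x) in the reply above concrete, here is a minimal sketch of how a constant log Z(x) offset could enter a pointwise, sigmoid-based alignment loss. It assumes the DPO-style implicit reward r(x, y) = β log(π(y|x) / π_ref(y|x)) + β log Z(x). The function name, the argument names, and the exact form of the loss are illustrative assumptions, not the repository's implementation. The reply also does not say whether --ulma_constant_zx sets Z(x) or log Z(x), so the hypothetical parameter log_zx below is only a stand-in.

```python
import math


def softplus(t):
    # Numerically stable log(1 + exp(t)).
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def pointwise_loss(logp_policy, logp_ref, label, beta=0.1, log_zx=0.0):
    """Hypothetical pointwise loss with a constant log Z(x) offset.

    logp_policy, logp_ref: log-probabilities of the response under the
    policy and the reference model (assumed inputs).
    label: 1 for a desirable response, 0 for an undesirable one.
    log_zx: assumed constant standing in for log Z(x); 0.0 matches Z(x) = 1.
    """
    # Implicit reward under the assumed DPO-style parameterization.
    logit = beta * (logp_policy - logp_ref) + beta * log_zx
    # -log(sigmoid(logit)) for label 1, -log(1 - sigmoid(logit)) for label 0.
    return softplus(-logit) if label == 1 else softplus(logit)


if __name__ == "__main__":
    # Sweep a few constants to see how the offset shifts each loss.
    for log_zx in (-1.0, 0.0, 1.0):
        pos = pointwise_loss(-5.0, -5.0, label=1, log_zx=log_zx)
        neg = pointwise_loss(-5.0, -5.0, label=0, log_zx=log_zx)
        print(f"log_zx={log_zx:+.1f}  positive={pos:.4f}  negative={neg:.4f}")
```

With log_zx = 0.0 the offset vanishes, which corresponds to the Z(x) ≈ 1 approximation. In this sketch a nonzero constant shifts the logit by the same amount for every sample, lowering the loss for one label and raising it for the other.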
