allenai / RL4LMs

A modular RL library to fine-tune language models to human preferences
https://rl4lms.apps.allenai.org/
Apache License 2.0
2.13k stars, 191 forks

Understanding the mask model used by _get_action_masks in LogitsProcessor #31

Closed. xesdiny closed this issue 1 year ago

xesdiny commented 1 year ago

In this line of code, I saw that the initialization of MaskLogitsProcessorCasualLM uses deepcopy(self._policy_model).eval() as the mask model. During generation, GenerationMixinWithRawScores.sampler pre-processes the distribution and calls the custom LogitsProcessor as a hook. I compared the two outputs and confirmed that next_token_logits_raw from policy_model is indeed different from next_token_logits from mask_model within the same generate pipeline. What is the purpose of doing this? I would really like to understand.
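To make the setup concrete, here is a toy sketch in plain Python of why the two sets of logits can differ. It is not the RL4LMs code. It assumes that the mask model is a frozen snapshot of the policy taken at init, and that its logits are used only to choose a top-k set of allowed tokens, which is then applied to the policy's raw logits. That second point is an assumption about the design, not something confirmed here. ToyModel, top_k_mask, and apply_mask are hypothetical names used only for illustration.

```python
import copy
import math


class ToyModel:
    """Hypothetical stand-in for a language model: holds one logit per token."""

    def __init__(self, weights):
        self.weights = list(weights)

    def next_token_logits(self):
        # Return a copy so callers cannot mutate the model's state.
        return list(self.weights)


def top_k_mask(logits, k):
    """Return a boolean mask that is True for the k highest-scoring tokens."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    keep = set(ranked[:k])
    return [i in keep for i in range(len(logits))]


def apply_mask(raw_logits, mask):
    """Set disallowed tokens to -inf so they can never be sampled."""
    return [x if allowed else -math.inf for x, allowed in zip(raw_logits, mask)]


policy_model = ToyModel([0.5, 2.0, -1.0, 1.5])
mask_model = copy.deepcopy(policy_model)  # frozen snapshot taken at init

# Simulate training updates: the policy drifts, the snapshot does not.
policy_model.weights = [3.0, 0.2, 2.5, 1.0]

raw = policy_model.next_token_logits()                  # policy's raw scores
mask = top_k_mask(mask_model.next_token_logits(), k=2)  # allowed tokens (assumed top-k)
processed = apply_mask(raw, mask)                       # what sampling would see

print("raw:      ", raw)
print("mask:     ", mask)
print("processed:", processed)
```

Under these assumptions, the raw logits reflect the current policy, while the processed logits are restricted to the tokens the frozen snapshot ranks highly, so the two naturally diverge once the policy has been updated.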