allenai / RL4LMs

A modular RL library to fine-tune language models to human preferences
https://rl4lms.apps.allenai.org/
Apache License 2.0

Is the construction of _value_model necessary? #27

Closed · xesdiny closed this issue 1 year ago

xesdiny commented 1 year ago

Why do you need to define _value_model in the policy? I think you could reuse _ref_model plus _value_head to compute the value; that would cut at least 1/3 of the parameters and backward-gradient overhead in GPU memory.

    def _build_model_heads(self,
                           model_name: str):
        # policy network: a full pretrained LM
        self._policy_model = AutoModelForCausalLM.from_pretrained(
            model_name)
        self._policy_model.__class__ = override_generation_routines(
            type(self._policy_model))

        # value network: a second full copy of the same pretrained LM
        self._value_model = AutoModelForCausalLM.from_pretrained(
            model_name)
        # frozen reference model used for the KL penalty
        self._ref_model = deepcopy(self._policy_model).eval()

        # scalar value head on top of the value model's hidden states
        self._value_head = nn.Linear(
            self._value_model.config.hidden_size, 1, bias=False)

        # apply model parallel
        ...
        self._value_head = self._value_head.to(self.device)
rajcscw commented 1 year ago

Hey, we went with separate policy and value networks for no particular reason. Of course, there could be memory optimizations with a shared policy and value network. Feel free to adapt the policy implementation for your use case. Also, keep in mind that self._ref_model is kept frozen, so attaching _value_head to it does not make sense.

xesdiny commented 1 year ago

"Of course, there could be memory optimizations with shared policy and value networks." Yeah,I just need to connect a value_ head (MLP) to the policy model instead of the ref model.