Closed xesdiny closed 1 year ago
Hey, we went ahead with separate policy and value networks for no particular reason. Of course, there could be memory optimizations with shared policy and value networks. Feel free to adapt the policy implementation for your use case. Also, be reminded that self._ref_model
is kept constant, so attaching _value_head to it does not make sense.
"Of course, there could be memory optimizations with shared policy and value networks." Yeah, I just need to connect a value_head (MLP) to the policy model instead of the ref model.
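For reference, a shared-backbone setup along those lines could look like the minimal sketch below. This is not the repo's actual implementation; the class and attribute names (`PolicyWithValueHead`, `backbone`, `lm_head`, `value_head`) are hypothetical, and a tiny embedding stands in for the real policy transformer. The point is just that one forward pass through the shared trunk produces both policy logits and per-token values, so the value head adds only a small MLP's worth of parameters:

```python
import torch
import torch.nn as nn

class PolicyWithValueHead(nn.Module):
    """Hypothetical sketch: one shared backbone feeding both an LM head
    (policy logits) and a small MLP value head (per-token scalar values)."""

    def __init__(self, hidden_dim=16, vocab_size=32):
        super().__init__()
        # Stand-in backbone; in practice this would be the policy transformer.
        self.backbone = nn.Embedding(vocab_size, hidden_dim)
        self.lm_head = nn.Linear(hidden_dim, vocab_size)
        # Value head: a small MLP producing one scalar per token position.
        self.value_head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, tokens):
        h = self.backbone(tokens)                # [batch, seq, hidden]
        logits = self.lm_head(h)                 # policy logits
        values = self.value_head(h).squeeze(-1)  # per-token value estimates
        return logits, values

model = PolicyWithValueHead()
tokens = torch.randint(0, 32, (2, 5))
logits, values = model(tokens)
print(logits.shape, values.shape)  # torch.Size([2, 5, 32]) torch.Size([2, 5])
```

Note this only works on a trainable model: gradients from the value loss must flow into the shared trunk (or at least into the head), which is why hanging the head off the frozen reference model doesn't fit.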
Why do you need to define _value_model in the policy? I think you could use _ref_model plus a _value_head to get the value; that would cut at least 1/3 of the parameter and backward-gradient overhead in GPU memory.