Oneflow-Inc / oneflow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
http://www.oneflow.org
Apache License 2.0

[Feature Request]: Oneflow.distributions #10122

Open kxzxvbk opened 1 year ago

kxzxvbk commented 1 year ago

Background and motivation

Hi, thanks for your work. When I'm trying to migrate my PyTorch code to OneFlow, I find that there are only a few APIs in oneflow.distributions, so this part is very hard for me to deal with. Could you please add more features to this part, or give me some advice on how to migrate it? Thanks for your attention :)

API Proposal

For example, distributions in the PyTorch style, such as Normal:

class torch.distributions.normal.Normal(loc, scale, validate_args=None)

API Usage

I think this API will be very useful in the area of Reinforcement Learning.
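
To illustrate, here is a minimal sketch of the kind of usage I have in mind, written against the existing torch.distributions API (which is what I hope oneflow.distributions can mirror); the shapes are just illustrative:

import torch
from torch.distributions import Independent, Normal

# Actor head outputs mean and std for a 2-D continuous action (illustrative shapes).
mu = torch.zeros(8, 2)
sigma = torch.ones(8, 2)

# Diagonal Gaussian policy over the action dimensions.
dist = Independent(Normal(mu, sigma), 1)
action = dist.rsample()            # reparameterized sample, keeps gradients
log_prob = dist.log_prob(action)   # shape (8,), used in policy-gradient losses
entropy = dist.entropy()           # shape (8,), used as an exploration bonus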

Alternatives

No response

Risks

No response

jackalcooper commented 1 year ago

Sure, we will look into it. It would also help if you could post a more complete example so that we can introduce it as a regression test.

kxzxvbk commented 1 year ago

Thanks for your attention. I hope this code can help with testing :)

from typing import Dict, Optional

import torch
import torch.nn as nn
from torch.distributions import Independent, Normal

# Network building blocks from DI-engine (see ding.torch_utils.network).
from ding.torch_utils.network import MLP, NoiseLinearLayer, fc_block, noise_block


class StochasticDuelingHead(nn.Module):
    """
        Overview:
            The ``Stochastic Dueling Network`` proposed in paper ACER (arxiv 1611.01224). \
            Dueling network architecture in continuous action space. \
            Input is a (:obj:`torch.Tensor`) of shape ``(B, N)`` and returns a (:obj:`Dict`) containing \
            outputs ``q_value`` and ``v_value``.
        Interfaces:
            ``__init__``, ``forward``.
    """

    def __init__(
        self,
        hidden_size: int,
        action_shape: int,
        layer_num: int = 1,
        a_layer_num: Optional[int] = None,
        v_layer_num: Optional[int] = None,
        activation: Optional[nn.Module] = nn.ReLU(),
        norm_type: Optional[str] = None,
        noise: Optional[bool] = False,
        last_tanh: Optional[bool] = True,
    ) -> None:
        """
        Overview:
            Init the ``StochasticDuelingHead`` layers according to the provided arguments.
        Arguments:
            - hidden_size (:obj:`int`): The ``hidden_size`` of the MLP connected to ``StochasticDuelingHead``.
            - action_shape (:obj:`int`): The number of continuous action shape, usually integer value.
            - layer_num (:obj:`int`): The number of default layers used in the network to compute action and value \
                output.
            - a_layer_num (:obj:`int`): The number of layers used in the network to compute action output. Default is \
                ``layer_num``.
            - v_layer_num (:obj:`int`): The number of layers used in the network to compute value output. Default is \
                ``layer_num``.
            - activation (:obj:`nn.Module`): The type of activation function to use in the MLP. \
                Default ``nn.ReLU()``.
            - norm_type (:obj:`str`): The type of normalization to use. See ``ding.torch_utils.network.fc_block`` \
                for more details. Default ``None``.
            - noise (:obj:`bool`): Whether use ``NoiseLinearLayer`` as ``layer_fn`` in Q networks' MLP. \
                Default ``False``.
            - last_tanh (:obj:`bool`): If ``True``, apply ``tanh`` to the sampled actions. Default ``True``.
        """
        super(StochasticDuelingHead, self).__init__()
        if a_layer_num is None:
            a_layer_num = layer_num
        if v_layer_num is None:
            v_layer_num = layer_num
        layer = NoiseLinearLayer if noise else nn.Linear
        block = noise_block if noise else fc_block
        self.A = nn.Sequential(
            MLP(
                hidden_size + action_shape,
                hidden_size,
                hidden_size,
                a_layer_num,
                layer_fn=layer,
                activation=activation,
                norm_type=norm_type
            ), block(hidden_size, 1)
        )
        self.V = nn.Sequential(
            MLP(
                hidden_size,
                hidden_size,
                hidden_size,
                v_layer_num,
                layer_fn=layer,
                activation=activation,
                norm_type=norm_type
            ), block(hidden_size, 1)
        )
        if last_tanh:
            self.tanh = nn.Tanh()
        else:
            self.tanh = None

    def forward(
            self,
            s: torch.Tensor,
            a: torch.Tensor,
            mu: torch.Tensor,
            sigma: torch.Tensor,
            sample_size: int = 10,
    ) -> Dict[str, torch.Tensor]:
        """
        Overview:
            Use encoded embedding tensor to run MLP with ``StochasticDuelingHead`` and return the prediction dictionary.
        Arguments:
            - s (:obj:`torch.Tensor`): Tensor containing input embedding.
            - a (:obj:`torch.Tensor`): The original continuous behaviour action.
            - mu (:obj:`torch.Tensor`): The ``mu`` gaussian reparameterization output of actor head at current \
                timestep.
            - sigma (:obj:`torch.Tensor`): The ``sigma`` gaussian reparameterization output of actor head at \
                current timestep.
            - sample_size (:obj:`int`): The number of samples for continuous action when computing the Q value.
        Returns:
            - outputs (:obj:`Dict`): Dict containing keywords \
                ``q_value`` (:obj:`torch.Tensor`) and ``v_value`` (:obj:`torch.Tensor`).
        Shapes:
            - s: :math:`(B, N)`, where ``B = batch_size`` and ``N = hidden_size``.
            - a: :math:`(B, A)`, where ``A = action_size``.
            - mu: :math:`(B, A)`.
            - sigma: :math:`(B, A)`.
            - q_value: :math:`(B, 1)`.
            - v_value: :math:`(B, 1)`.
        """

        batch_size = s.shape[0]  # batch_size or batch_size * T
        hidden_size = s.shape[1]
        action_size = a.shape[1]
        state_cat_action = torch.cat((s, a), dim=1)  # size (B, hidden_size + action_size)
        a_value = self.A(state_cat_action)  # size (B, 1)
        v_value = self.V(s)  # size (B, 1)
        # size (B, sample_size, hidden_size)
        expand_s = (torch.unsqueeze(s, 1)).expand((batch_size, sample_size, hidden_size))

        # in case for gradient back propagation
        dist = Independent(Normal(mu, sigma), 1)
        action_sample = dist.rsample(sample_shape=(sample_size, ))
        if self.tanh:
            action_sample = self.tanh(action_sample)
        # (sample_size, B, action_size)->(B, sample_size, action_size)
        action_sample = action_sample.permute(1, 0, 2)

        # size (B, sample_size, action_size + hidden_size)
        state_cat_action_sample = torch.cat((expand_s, action_sample), dim=-1)
        a_val_sample = self.A(state_cat_action_sample)  # size (B, sample_size, 1)
        q_value = v_value + a_value - a_val_sample.mean(dim=1)  # size (B, 1)

        return {'q_value': q_value, 'v_value': v_value}
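
For a quick smoke test, something like the following could be used (a minimal sketch; the constructor arguments and tensor shapes are just illustrative, and it assumes the DI-engine building blocks imported above are available):

# Illustrative shapes: batch of 4, hidden_size 64, 2-dimensional continuous action.
head = StochasticDuelingHead(hidden_size=64, action_shape=2)
s = torch.randn(4, 64)
a = torch.randn(4, 2)
mu = torch.randn(4, 2)
sigma = torch.rand(4, 2) + 0.1  # keep the Gaussian scale strictly positive
outputs = head(s, a, mu, sigma, sample_size=10)
assert outputs['q_value'].shape == (4, 1)
assert outputs['v_value'].shape == (4, 1)
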
kxzxvbk commented 1 year ago

Any new progress on this issue?