facebookresearch / Pearl

A Production-ready Reinforcement Learning AI Agent Library brought by the Applied Reinforcement Learning team at Meta.

What is the use of the attribute 'state' in the RecEnv from the single item recommender tutorial? #97

Closed · davera-017 closed this issue 3 months ago

davera-017 commented 3 months ago

The environment class in this tutorial has an attribute called state that tracks the user history but is apparently never used. Here is its initialization code:

class RecEnv(Environment):
    def __init__(
        self, actions: List[torch.Tensor], model: nn.Module, history_length: int
    ) -> None:
        ...
        # Buffer of the last `history_length` action embeddings (100-dim each), initialized to zeros.
        self.state: torch.Tensor = torch.zeros((self.history_length, 100)).to(device)
        ...

I was wondering if this is somehow used inside other classes (e.g. some History Summarization Module) or if it is not used at all.

rodrigodesalvobraz commented 3 months ago

Hi Daniel, why do you say it is never used? When I search for self.state, I see several uses (screenshot of search results omitted). The state is initialized with zeros and is used to form state_batch, which in turn determines the reward in

reward = self.model(state_batch, action_batch) * 3

It is also gradually updated at each step with the most recent action. After history_length steps, the zeros will have been overwritten and self.state will be a record of the last history_length actions taken. This is the information self.model uses to compute the reward. I hope that helps. Rodrigo
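
To make the mechanism concrete, below is a minimal sketch of what a step method along these lines could look like. This is not the tutorial's actual code: the action_index argument, the tensor shapes, and the unsqueeze calls are assumptions made here for illustration. It only shows the two points from the answer above, namely self.state feeding the reward model and the sliding-window update that gradually overwrites the initial zeros.

import torch

# Hypothetical sketch of a RecEnv.step method; shapes and argument names are assumptions.
def step(self, action_index: int) -> torch.Tensor:
    # Look up the chosen item's embedding (assumed to be a 100-dim tensor).
    action = self.actions[action_index]

    # Add a batch dimension of 1 before calling the reward model.
    state_batch = self.state.unsqueeze(0)    # shape (1, history_length, 100)
    action_batch = action.unsqueeze(0)       # shape (1, 100)

    # Reward comes from the learned user model, as in the line quoted above.
    reward = self.model(state_batch, action_batch) * 3

    # Slide the history window: drop the oldest action, append the newest one.
    # After history_length steps the initial zeros are fully overwritten.
    self.state = torch.cat([self.state[1:], action.unsqueeze(0)], dim=0)

    return reward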