Closed BoccheseGiacomo closed 7 months ago
It is possible to directly provide a network instance to be used by policy learners implementing Temporal Difference (TD) learning methods:

- All direct and indirect subclasses of `DeepTDLearning` (`DeepQLearning`, `DeepSARSA`, `DoubleDQN`, among others), since `DeepTDLearning` accepts an initialization parameter `network_instance`, which must be an instance of `QValueNetwork`. While `network_instance` does not explicitly appear in the documentation of the `__init__` methods of the subclasses, the parameter can be passed to them as a keyword argument (`kwargs`) and will be used by their superclass. Here's an example of `network_instance` being used for `DeepQLearning`.
- All direct and indirect subclasses of `QuantileRegressionDeepTDLearning` (currently, this is only `QuantileRegressionDeepQLearning`). This class also accepts a `network_instance` parameter, but this time it must be an instance of `QuantileQValueNetwork`. There is currently no example of this in the code base because the test for `QuantileRegressionDeepQLearning` uses the default network type, but its use is analogous to the `DeepQLearning` example linked above.
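As a rough sketch of the pattern described above (the class and layer sizes here are illustrative; in Pearl the custom network would need to subclass `QValueNetwork` rather than plain `nn.Module` to be accepted as `network_instance`):

```python
import torch
import torch.nn as nn

# Illustrative custom Q-value network: maps a (state, action) pair to a
# scalar Q-value. In Pearl it would have to subclass QValueNetwork.
class MyQNetwork(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # one Q-value per (state, action) pair
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.layers(torch.cat([state, action], dim=-1))

network = MyQNetwork(state_dim=4, action_dim=2)
q_values = network(torch.zeros(8, 4), torch.zeros(8, 2))
print(q_values.shape)  # torch.Size([8, 1])

# The instance is then handed to the policy learner, e.g.
#   policy_learner = DeepQLearning(..., network_instance=network)
```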
Moreover, while it is not possible to directly provide network instances to the subclasses of `ActorCriticBase` (such as `DeepDeterministicPolicyGradient`, `ImplicitQLearning`, `ProximalPolicyOptimization`, `REINFORCE`, `SoftActorCritic`, and `ContinuousSoftActorCritic`), it is possible to use custom user network classes. These policy learners accept `actor_network_type` and `critic_network_type` parameters, which allow the user to specify the classes to be used for those networks. So the user can define their own custom network classes and have the policy learner use them. They must, however, be subclasses of the `ActorNetwork` and `QValueNetwork` classes, respectively.
I hope this helps; please let us know if something is unclear.
@rodrigodesalvobraz
Thank you, it's clear.
Another small question: can I add custom properties to my agent? For example, if I want my agent to have some properties that belong to the agent rather than to the environment, such as "money". Imagine I'm running a simulation of a market economy environment, and I want each agent to possess some float attributes like money, health, hunger, etc.
Can I do this with Pearl? In Gym I would simply insert these properties in my agent class.
Adding a property to an object is always possible in Python in general, regardless of the libraries one might be using. If you have an object instance `agent`, writing `agent.money = 10` will work, and the attribute will be accessible from then on.
However, I suspect this might not be what you would really want. If you want the RL algorithm to take these properties into account when making decisions for the agent (for example, the agent will be more likely to spend money if it has a lot of it, but will try to save when it does not), then what you really want is to include that information in the observations being received by the agent (if you happen to be using an environment, then the environment is a good candidate to keep this information and include it in the observations it provides). This is because RL algorithms will make decisions based on the information contained in observations (states).
It might be a bit odd to think that an agent's property is being included in the observations coming from the outside world since we tend to think of it as an internal property, but one way to think about it is that the agent is observing things, including how much money is in its pocket!
Because Pearl currently represents observations as tensors, you might want to use a representation such that the agent's properties of interest are concatenated with the rest of the observation into a single tensor.
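Concretely, assuming the external observation is already a tensor, that concatenation could look like this (the property values and ordering here are illustrative):

```python
import torch

# External observation coming from the environment (illustrative values).
external_obs = torch.tensor([0.5, -1.2, 3.0])

# The agent's internal properties, packed into a tensor in a fixed order.
money, health, hunger = 10.0, 0.8, 0.1
internal = torch.tensor([money, health, hunger])

# Single observation tensor that the policy learner actually sees.
observation = torch.cat([external_obs, internal])
print(observation.shape)  # torch.Size([6])
```

Keeping the ordering of the internal properties fixed matters: the network learns which input positions correspond to which quantities.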
To keep the property up-to-date (for example, to decrease the agent's money once a decision is made to spend some), the code receiving the action from the agent (in learning situations this will typically be the `step` method of the environment) must modify the property's value according to the action taken.
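As a minimal sketch of that bookkeeping inside an environment's `step` method (the environment, action encoding, and reward logic here are made up for illustration):

```python
class MarketEnvironment:
    """Toy environment that tracks each agent's money and updates it in step()."""

    def __init__(self, prices):
        self.prices = prices   # price of each purchasable item
        self.money = {}        # agent_id -> current money

    def add_agent(self, agent_id, money):
        self.money[agent_id] = money

    def step(self, agent_id, action):
        # action: index of the item to buy, or None to do nothing.
        if action is not None and self.money[agent_id] >= self.prices[action]:
            self.money[agent_id] -= self.prices[action]
        # The updated property is included in the next observation.
        observation = [self.money[agent_id]]
        reward = 0.0  # reward logic omitted
        return observation, reward

env = MarketEnvironment(prices=[3.0, 5.0])
env.add_agent("a1", money=10.0)
obs, _ = env.step("a1", action=0)
print(obs)  # [7.0]
```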
I hope this helps.
@rodrigodesalvobraz
Thank you for the super clear explanation. I will do it through the environment in that case. Yes, I know how observations work in RL, but I thought that each agent needs to concatenate its own internal properties to the external observation, and these properties are different for every agent.
So I need to make a list/dictionary that saves the internal properties for each agent.
Thanks.
The solution you mention (keeping the properties inside the agent instance and concatenating them to the external observation) might actually work, too.
It seems to be a little less conventional. Usually in an RL problem one talks about a single type of observation, but here you have two types: the external one, and the internal one obtained by concatenating the agent's properties to the first. Software libraries often make assumptions based on conventions, so going a less conventional route may turn out to be tricky, but as far as I can tell right now your approach could work too.
Thank you again.
I used to use PyTorch + Gym for RL. Is there a way to make a custom neural net in PyTorch and insert it into the Pearl agent without using the pre-made settings for neural nets?