didclab / RL-Optimizer

The RL optimization work by Jamil, Elvis, and Jacob in DIDCLAB

Runner-Pool #11

Open elrodrigues opened 10 months ago

elrodrigues commented 10 months ago

Build a Trainer/Runner pool (where each runner would have one environment and one job associated with it) for parallel training.
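A minimal sketch of that idea, assuming a gym-style environment and a job object that supplies the policy (Runner, run_episode, and job.policy are illustrative names, not this repo's API):

from concurrent.futures import ThreadPoolExecutor

class Runner:
    def __init__(self, env, job):
        self.env = env  # exactly one environment per runner
        self.job = job  # exactly one job per runner

    def run_episode(self):
        # Roll out one episode with the job's current policy.
        obs = self.env.reset()
        done, total_reward = False, 0.0
        while not done:
            # gym-style step: obs, reward, done, info
            obs, reward, done, _ = self.env.step(self.job.policy(obs))
            total_reward += reward
        return total_reward

def train_pool(runners):
    # One worker per runner, so all env/job pairs run in parallel.
    with ThreadPoolExecutor(max_workers=len(runners)) as pool:
        return list(pool.map(Runner.run_episode, runners))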

elrodrigues commented 10 months ago

The objective of this issue has changed because of the design of the middleware. 'Pool' is probably the wrong word to use here, since I'm really talking about lazily spinning up Trainers/Runners when a new job is added and strapping a manager onto these runners, but it's the closest word I have in my vocabulary to describe this.

This is a certified schizo moment.
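To make the lazy spin-up concrete, here is a small sketch under the assumption that the manager keys trainers by job id and creates them on demand (Manager, Trainer, and submit_job are hypothetical names, not the middleware's actual API):

class Trainer:
    def __init__(self, job):
        self.job = job
        self.model = None  # set later by the manager

class Manager:
    def __init__(self):
        self.trainers = {}  # job_id -> Trainer, grows lazily

    def submit_job(self, job_id, job):
        # Spin up a trainer only the first time a job id is seen.
        if job_id not in self.trainers:
            self.trainers[job_id] = Trainer(job)
        return self.trainers[job_id]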

elrodrigues commented 10 months ago

The runners/trainers will have a 'hook' to sync their models to the manager's master model every couple of episodes. The manager will also periodically 'down'-sync its master model to its trainers.

I haven't yet decided the taus for the up-sync/hook and down-sync. These will be set in the config.
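As a sketch of how those taus might land in the config, assuming one tau and one interval per sync direction (every key and value below is a placeholder, not a decided setting, and manager.up_sync/down_sync are hypothetical helpers):

sync_config = {
    "up_sync_tau": 0.01,     # hook: trainer model -> manager's master model
    "down_sync_tau": 0.005,  # manager's master model -> trainer models
    "up_sync_every": 5,      # episodes between up-syncs
    "down_sync_every": 20,   # episodes between down-syncs
}

def maybe_sync(episode, trainer, manager, cfg):
    # Fire each sync direction on its own episode schedule.
    if episode % cfg["up_sync_every"] == 0:
        manager.up_sync(trainer, cfg["up_sync_tau"])
    if episode % cfg["down_sync_every"] == 0:
        manager.down_sync(trainer, cfg["down_sync_tau"])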

elrodrigues commented 10 months ago

This has changed a little now. The trainer's 'hook' is no longer an up-sync but instead a down-sync. The up-sync is handled internally by the trainer after its master model is set by the manager.

I imagine trainers implementing their own form of soft_update_agent(local, target). For BDQTrainer, for example, this function would contain something along the lines of:

BDQAgent.soft_update(self.pre_net, self.pre_target, self.tau)      # shared preprocessing network
BDQAgent.soft_update(self.state_net, self.state_target, self.tau)  # state-value stream
for i in range(self.num_actions):
    # Per-branch advantage streams; (local, target) order matches the calls above.
    BDQAgent.soft_update(self.adv_nets[i], self.adv_targets[i], self.tau)
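soft_update itself isn't shown in this thread; the usual Polyak-averaging form, written here as a standalone PyTorch-style sketch with the same (local, target, tau) ordering as the calls above:

import torch

def soft_update(local_net, target_net, tau):
    # Polyak averaging: target <- tau * local + (1 - tau) * target
    with torch.no_grad():
        for local_p, target_p in zip(local_net.parameters(),
                                     target_net.parameters()):
            target_p.data.copy_(tau * local_p.data + (1.0 - tau) * target_p.data)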