Improbable-AI / pql

Parallel Q-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation
MIT License

When do critic and actor updates take place? #1

Open kbkartik opened 8 months ago

kbkartik commented 8 months ago

Hi,

I came across your paper and had a few questions. My goal is to use your results and analysis to train discrete SAC on parallel MiniGrid environments.

In train_pql.py, you have variables like critic_unit_time, critic_update_times, sim_unit_time, and counter[0]['critic']. How do these variables relate to the beta_a_v and beta_p_v ratios?

Suppose you have 128 envs, a replay buffer of size 1e6, beta_a_v = 8, and beta_p_v = 2. Do you perform beta_a_v critic updates and beta_p_v policy updates every iteration step (i.e., in one iteration step, all 128 envs are executed)?

Thanks, kb

supersglzc commented 8 months ago

Hi,

In the config file pql/cfg/algo/pql_algo.yaml, there is an option called critic_sample_ratio. Setting critic_sample_ratio = 8 corresponds to beta_a_v = 1:8, which means that, within a unit of time, for every environment step we update the critic 8 times.
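
As a quick arithmetic check (the numbers below are illustrative, not taken from the repo), the ratio simply fixes how many critic updates should accompany each environment step:

```python
# Illustrative only: critic_sample_ratio = 8 means 8 critic updates
# per environment step, so over 1,000 env steps the critic should have
# been updated roughly 8,000 times.
critic_sample_ratio = 8
env_steps = 1_000
critic_updates = critic_sample_ratio * env_steps
print(critic_updates)  # 8000
```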

To achieve the above ratio, we want (wall-clock time of each critic update) : (wall-clock time of each data-collection step) = 1:8. The wall-clock time of each critic update is critic_unit_time = (time interval) / (number of critic updates within that interval), where the time interval is computed as time.time() - counter[0]['time'] and the number of critic updates within the interval is computed as critic_update_times - counter[0]['critic']. The wall-clock time of each data-collection step and of each policy update is computed analogously.
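
Here is a minimal sketch of that bookkeeping, assuming the counter layout and helper below (only counter[0]['time'] and counter[0]['critic'] appear in train_pql.py as cited above; the 'sim' key, sim_update_times, and the helper function are illustrative, not the actual PQL code):

```python
import time

# Running counts and the snapshot taken at the start of the current interval.
counter = [{'time': time.time(), 'critic': 0, 'sim': 0}]
critic_update_times = 0   # total critic updates so far
sim_update_times = 0      # total environment (data-collection) steps so far
critic_sample_ratio = 8   # beta_a_v = 1:8

def unit_times():
    """Average wall-clock time per critic update and per env step
    since the counter snapshot was last taken."""
    interval = time.time() - counter[0]['time']
    critic_unit_time = interval / max(critic_update_times - counter[0]['critic'], 1)
    sim_unit_time = interval / max(sim_update_times - counter[0]['sim'], 1)
    return critic_unit_time, sim_unit_time

# The target relation is critic_unit_time : sim_unit_time = 1 : critic_sample_ratio,
# i.e. critic_unit_time * critic_sample_ratio ~= sim_unit_time; the learner can
# compare the two measured unit times and throttle whichever process is running
# ahead of this target.
print(unit_times())
```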

Yes, in every iteration step (i.e., every environment step), all 128 envs are executed.
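
For completeness, a toy sketch (not PQL code; the shapes and dynamics are made up) of what one vectorized step over 128 envs yields, namely 128 transitions added to the replay buffer per iteration:

```python
import numpy as np

num_envs = 128
obs_dim = 4

obs_batch = np.zeros((num_envs, obs_dim))        # current observations for all envs
actions = np.random.randn(num_envs)              # one action per env
# A vectorized simulator returns all next states and rewards in one call:
next_obs_batch = obs_batch + 0.01 * actions[:, None]
rewards = -np.abs(actions)

print(next_obs_batch.shape)  # (128, 4): every env advanced in this iteration
```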