Armandpl / furuta

Building and Training a Rotary Inverted Pendulum robot

try fine-tuning a policy trained in sim #61

Closed Armandpl closed 7 months ago

Armandpl commented 7 months ago

Basically we run into the same issue as when training from scratch. During warmup, gSDE outputs values that are too small; then, once training starts, the policy output plus the gSDE noise is always too large and we hit the limits of the system.

Armandpl commented 7 months ago

One thing we could do is manually fill the replay buffer by running the trained policy for a number of steps, run a number of updates on that data, and then start normal training with a low entropy coefficient?
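
A rough sketch of that idea, in plain Python so it does not depend on any particular RL library. Everything here is an assumption made for illustration: `ToyEnv` is a hypothetical stand-in for the real robot, `pretrained_act` stands for the sim-trained policy run without exploration noise, and `update` stands for one gradient step of whatever off-policy algorithm is used. None of these names come from the repo.

```python
import random
from collections import deque


class ToyEnv:
    """Hypothetical 1-D stand-in for the robot: the state drifts with the action."""

    def __init__(self, limit=1.0, max_steps=50):
        self.limit = limit
        self.max_steps = max_steps
        self.x = 0.0
        self.t = 0

    def reset(self):
        self.x = 0.0
        self.t = 0
        return self.x

    def step(self, action):
        self.x += 0.1 * action
        self.t += 1
        # Episode ends when the system limit is hit or on timeout.
        done = abs(self.x) > self.limit or self.t >= self.max_steps
        reward = -abs(self.x)
        return self.x, reward, done


def prefill_then_update(env, pretrained_act, update, prefill_steps=1000,
                        warm_updates=500, batch_size=64, buffer_size=10_000):
    """Fill a replay buffer with the pretrained policy, then update on it."""
    buffer = deque(maxlen=buffer_size)

    # Phase 1: collect transitions with the sim-trained policy only,
    # so no exploration noise can push the system into its limits.
    obs = env.reset()
    for _ in range(prefill_steps):
        action = pretrained_act(obs)
        next_obs, reward, done = env.step(action)
        buffer.append((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs

    # Phase 2: run updates on the pre-filled buffer without collecting new data.
    for _ in range(warm_updates):
        batch = random.sample(list(buffer), min(batch_size, len(buffer)))
        update(batch)

    # Phase 3 (not shown): resume normal training, reusing this buffer,
    # with a low entropy coefficient.
    return buffer
```

For example, `prefill_then_update(ToyEnv(), lambda obs: -obs, my_update)` would collect 1000 transitions and call the hypothetical `my_update` 500 times. If the agent is a Stable-Baselines3 SAC model, phase 3 would presumably mean passing a small fixed `ent_coef` instead of `"auto"`, but how to inject the pre-filled buffer there is left open.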