Armandpl / furuta

Building and Training a Rotary Inverted Pendulum robot

sanity check by training with sac or tqc #64

Open Armandpl opened 6 months ago

Armandpl commented 6 months ago

I want to train the robot in very few steps and very quickly in terms of wall time but I haven't completed a training run on the robot yet. I should do that first to sanity check, make sure there is nothing wrong with the robot, the laptop/robot coms or the env code.

Repro the training run from last time:

Armandpl commented 6 months ago

The robot arm broke so I can't secure it to the motor shaft anymore. The motor also starts making weird noises when the action oscillates too much; it sounds like the gears are slipping. I disassembled the gearbox and it seems ok. Maybe there is a bit of play and the vibrations are making a gear pop out of place??

Armandpl commented 6 months ago

TQC action in sbx: [screenshot 2024-01-25 at 17:59] SAC action in sb3: [screenshot 2024-01-25 at 18:06] I feel like gSDE isn't working with TQC? Check in the code, maybe open an issue in the repo to ask the question?

Armandpl commented 6 months ago

Try sac w/ gSDE in sbx?

Armandpl commented 6 months ago

Ok so I trained for 1000 steps using TQC in sbx with gSDE on/off, episodic training or not, and looked at the action over 100 steps.

- `gsde=True, train_freq=(1, "episode")`: tqc_episodic_gsde_sbx
- `gsde=True, train_freq=1`: tqc_train_freq_1_gsde_sbx
- `gsde=True, train_freq=100`: tqc_train_freq_100_gsde_sbx
- `gsde=False, train_freq=(1, "episode")`: tqc_episodic_no_gsde_sbx

Episodic it is then? Or maybe once the policy converges the action gets less noisy and it's fine?
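Instead of eyeballing the plots, one way to put a number on "how noisy is the action" is a mean absolute step-to-step change. Pure-Python sketch; the two traces below are synthetic stand-ins for the logged 100-step action traces, not real data.

```python
def action_roughness(actions):
    """Mean absolute step-to-step change; higher = more oscillation."""
    deltas = [abs(b - a) for a, b in zip(actions, actions[1:])]
    return sum(deltas) / len(deltas)

# Synthetic traces standing in for logged actions over 100 steps.
smooth = [0.5] * 50 + [-0.5] * 50  # a single switch mid-episode
chatter = [0.5, -0.5] * 50         # flips sign every step

assert action_roughness(smooth) < action_roughness(chatter)
```

Logging this scalar per run would make the gSDE on/off and train_freq comparisons a one-line diff instead of a screenshot comparison.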

Armandpl commented 6 months ago

Ok so now we could bench TQC against SAC to gauge which one we should use on the real robot. But if we go this route we should probably also tune the hyper-parameters. Then again, is the tuning going to transfer, since it's still unclear how far/close the sim is to the actual robot? Still worth trying I guess.

Maybe just go straight to hp tuning for TQC since 'we know' it is better?
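A bare-bones random-search skeleton for the tuning step (pure-Python sketch). The search space and the `evaluate` stub are hypothetical placeholders: in practice `evaluate` would train TQC in sim with the sampled params and return mean eval return.

```python
import random

# Hypothetical search space; a real one would also cover TQC-specific knobs.
SPACE = {
    "learning_rate": (1e-4, 1e-3),
    "gamma": (0.95, 0.999),
    "tau": (0.005, 0.02),
}

def sample(rng):
    """Draw one hyper-parameter set uniformly from the space."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in SPACE.items()}

def evaluate(params):
    # Stand-in objective: replace with a train-then-eval run of TQC in sim.
    return -abs(params["learning_rate"] - 3e-4)

rng = random.Random(0)
trials = [(evaluate(p), p) for p in (sample(rng) for _ in range(20))]
best_score, best_params = max(trials, key=lambda t: t[0])
```

Something like Optuna would do the same job with pruning, but a dumb random search is enough to check whether tuning moves the needle at all before worrying about sim-to-real transfer.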

Armandpl commented 6 months ago

In simulation, I can get the agent to converge in ~40k timesteps. 40k timesteps at 50 Hz is ~13 min in real life. But when training on the real robot it takes hours. It is slow in part because waiting for the pendulum to reset is slow. Maybe we shouldn't reset the robot and instead let it learn in one long run??
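Quick sanity arithmetic for the wall-clock estimate (no assumptions beyond the 50 Hz control rate mentioned above):

```python
def steps_to_minutes(steps, control_hz):
    """Real-time duration of a training run at a fixed control rate."""
    return steps / control_hz / 60

# 40k steps at 50 Hz -> 800 s -> about 13.3 minutes (ignoring reset time).
print(steps_to_minutes(40_000, 50))
```

The gap between ~13 min of pure stepping and hours on the robot is all overhead, mostly resets, which is what motivates the reset-free idea.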

Armandpl commented 6 months ago

Need to update the way we reset the episode. I set up a PID but it is badly tuned and I think it may have damaged the motor.
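A textbook positional PID for the reset move might look like this. Generic sketch, not the repo's controller: the gains, limits, and sign conventions are placeholders. Clamping both the output and the integral term should help avoid the kind of aggressive commands that can stress the motor.

```python
class PID:
    """Simple PID with output clamping and integral anti-windup."""

    def __init__(self, kp, ki, kd, out_limit=1.0, i_limit=0.5):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_limit, self.i_limit = out_limit, i_limit
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        self.integral += error * dt
        # Anti-windup: keep the integral contribution bounded.
        self.integral = max(-self.i_limit, min(self.i_limit, self.integral))
        d = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * d
        # Clamp the command so a badly tuned controller can't slam the motor.
        return max(-self.out_limit, min(self.out_limit, u))
```

At a 50 Hz loop the reset would be something like `u = pid.update(target_angle - angle, 1 / 50)` until the error stays small for a few steps.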