v2.0

Armandpl commented 1 year ago

This is a tracking issue for the second iteration of this project.

CAD/Mechanical Assembly:

[x] #29
[x] #22
[x] #30
[x] #31
[ ] finish designing the part to mount the slip ring to the boom arm and add it to the repo

Software:

[x] #25
[x] #24
[x] use hydra to configure pendulum
[x] add tests? robot self test like 3D printer?
[x] separate package for gym wrapper and actual pendulum?
[x] clean up wandb
- [x] ~~delete all models except for the working one (v204)~~ started by deleting replay buffers first
- [x] log robot config to training runs
- [ ] ~~delete failed and killed runs?~~
[ ] #23
[x] better env reset. previously we used hardcoded commands to reset the robot to its starting position. using a PID should be easier, cleaner, and should transfer more easily should we have different robot hardware configurations. this should be part of the gym wrapper; the robot class/API should only be about the robot and should be usable outside of the gym context. just wait for the pendulum to be below an angle threshold for a number of steps, then reset both encoders. this way we don't have to move the motor back. drawback is the pendulum isn't facing the same way every episode, but this time we're not making a video so that's alright
[ ] control frequency wrapper: catch up delays
[x] update pre-commit setup
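
A minimal sketch of the "wait until the pendulum is at rest, then reset both encoders" idea from the env reset item above. The class name, the function name, the threshold and the step count are all hypothetical and not taken from the repo; the angle is assumed to be in radians with 0 meaning hanging straight down.

```python
import math


class RestDetector:
    """Reports when the pendulum has stayed near its hanging position long enough.

    All names and default values here are hypothetical, not taken from the repo.
    """

    def __init__(self, angle_threshold=0.05, required_steps=50):
        self.angle_threshold = angle_threshold  # rad, assumed tolerance around "down"
        self.required_steps = required_steps    # consecutive steps needed
        self._count = 0

    def update(self, pendulum_angle):
        """Feed one angle reading (rad, 0 = hanging down); return True once at rest."""
        # Wrap to [-pi, pi) so that full turns still count as "down".
        wrapped = (pendulum_angle + math.pi) % (2 * math.pi) - math.pi
        if abs(wrapped) < self.angle_threshold:
            self._count += 1
        else:
            self._count = 0  # any excursion restarts the wait
        return self._count >= self.required_steps


def wait_until_rest(read_angle, detector, max_steps=10_000):
    """Poll read_angle() until the detector fires.

    Returns the number of steps waited, or None if max_steps was reached.
    """
    for step in range(1, max_steps + 1):
        if detector.update(read_angle()):
            return step
    return None
```

In the gym wrapper's reset, `read_angle` would be whatever call returns the pendulum encoder angle, and both encoders would be zeroed once `wait_until_rest` returns a step count.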

Electronics:

add a current sensor
use more precise encoders for the motor + pendulum and evaluate their impact
use a motor that has helical gear vs. spur gear. see if it reduces play and allows for better/smoother control

RL:

[ ] faster training. previous iteration took 4-5h to train. can we go faster? can we train in under 10 min? can we do it without giving more info/constraints?
[x] upgrade gym to make upgrade to gymnasium easier
[ ] offline RL?
- datasets could be from an energy based swing up + pid?
- use MCAP files? or replay buffers from SAC? MCAP feels like the better choice here as the replay buffer contains the transforms applied by the wrappers such as the HistoryWrapper
[ ] measure how much of the time is spent running the policy and how much of the time is spent doing matrix multiplication while the pendulum is idle. is it possible to parallelize? outsource matrix multiplication to a cloud machine?
[ ] ~~try using torque as the output of the policy (torque control on arduino??)~~
[ ] try training with/without current sensor data in the observation
[ ] remove the two different control frequencies and use a SkipFrame wrapper instead.
[ ] #26
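
A rough sketch of what the SkipFrame wrapper item above could look like. It assumes a Gymnasium-style `step()` that returns `(obs, reward, terminated, truncated, info)`; in the project it would probably subclass `gymnasium.Wrapper`, but it is duck-typed here so it runs without dependencies. `CountingEnv` is a made-up toy env, only there to exercise the wrapper.

```python
class SkipFrame:
    """Repeat each action for `skip` inner steps and sum the rewards.

    Assumes a Gymnasium-style env whose step() returns
    (obs, reward, terminated, truncated, info).
    """

    def __init__(self, env, skip=4):
        if skip < 1:
            raise ValueError("skip must be >= 1")
        self.env = env
        self.skip = skip

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.skip):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward
            if terminated or truncated:
                break  # do not step past the end of an episode
        return obs, total_reward, terminated, truncated, info


class CountingEnv:
    """Toy env (hypothetical): obs is the step count, reward is 1 per step."""

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        return self.t, 1.0, False, self.t >= self.horizon, {}
```

With this, the inner env keeps running at the fast control rate while the policy only picks a new action every `skip` steps, so a single control frequency plus a skip factor replaces the two separate frequencies.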

Control Theory:

Documentation:

Validate reproducibility:

Resources:

Offline RL:

RL:

newnew:

Armandpl commented 9 months ago

Goals as of 7 Jan:

Pierre:

Armand:

broadly make training faster. 5h is too slow given deepmind trains robot dogs in 4 min
faster = wall time
parallelize simulation
- easy first pass: use vec env, parallelize on cpu with the current sim
- try sim2real, maybe fine tune
try other algos
- now that we can use off-board compute maybe we can use on-policy algos, e.g. PPO
try and remove as much code as possible
- try and remove the velocity filter by using PPO-LSTM and having the agent figure out the filter. see PPO vs RecurrentPPO (aka PPO LSTM) on environments with masked velocity (SB3 Contrib)
depending on sim2real transfer, make sim more accurate
- simulate sensor noise and resolution
- simulate motor
- system id from logs?
Depending on sim speed on cpu + sim2real success
- sim in isaac gym
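
For the "simulate sensor noise and resolution" item above, a minimal sketch of how an encoder reading could be modelled in the sim. The function name, `counts_per_rev` and `noise_std` are placeholders, not the real hardware specs. Noise is added before quantization here, which is only one possible modelling choice.

```python
import math
import random


def simulate_encoder(true_angle, counts_per_rev=2048, noise_std=0.0, rng=random):
    """Return the angle (rad) an incremental encoder would report.

    counts_per_rev and noise_std are placeholder values, not real hardware specs.
    """
    # Optional Gaussian noise on the true angle.
    noisy = true_angle + rng.gauss(0.0, noise_std) if noise_std > 0 else true_angle
    step = 2 * math.pi / counts_per_rev  # smallest angle the encoder can resolve
    return round(noisy / step) * step    # snap to the nearest encoder count
```

Passing a seeded `random.Random` as `rng` keeps sim runs reproducible, and sweeping `counts_per_rev` in sim is a cheap way to estimate what more precise encoders would buy before changing the hardware.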

Armandpl commented 9 months ago

New list of todos:

Armandpl / furuta