This is a tracking issue for the second iteration of this project.
CAD/Mechanical Assembly:
[x] #29
[x] #22
[x] #30
[x] #31
[ ] finish designing the part to mount the slip ring to the boom arm and add it to the repo
Software:
[x] #25
[x] #24
[x] use hydra to configure pendulum
[x] add tests? robot self test like 3D printer?
[x] separate package for gym wrapper and actual pendulum?
[x] clean up wandb
[x] delete all models except for the working one (v204); started by deleting replay buffers first
[x] log robot config to training runs
[ ] delete failed and killed runs?
[ ] #23
[x] better env reset. previously we used hardcoded commands to reset the robot to its starting position. using a pid should be easier and cleaner, and should transfer more easily if we have different robot hardware configurations. this should be part of the gym wrapper. the robot class/API should only be about the robot and should be usable outside of the gym context.
just wait for the pendulum to stay below an angle threshold for a number of steps, then reset both encoders. this way we don't have to move the motor back. the drawback is that the pendulum isn't facing the same way every episode, but this time we're not making a video, so that's alright.
[ ] control frequency wrapper: catch up delays
[x] update pre-commit setup
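
For the env reset item above, the "wait until the pendulum stays below an angle threshold for a number of steps" check could look roughly like the sketch below. This is not the code in the repo: `SettleDetector`, `steps_until_settled`, the default threshold and the step count are all hypothetical, and it assumes angles are in radians with 0 meaning the pendulum hangs straight down.

```python
import math
from typing import Iterable, Optional


class SettleDetector:
    """Reports when the pendulum has stayed near rest for enough consecutive steps."""

    def __init__(self, angle_threshold: float = 0.05, required_steps: int = 50) -> None:
        # Both defaults are placeholders, not values taken from the project.
        self.angle_threshold = angle_threshold
        self.required_steps = required_steps
        self._count = 0

    def update(self, angle: float) -> bool:
        """Feed one angle reading; returns True once the pendulum counts as settled."""
        # Wrap to [-pi, pi] so full turns of the pendulum do not hide the rest position.
        wrapped = math.remainder(angle, 2 * math.pi)
        if abs(wrapped) < self.angle_threshold:
            self._count += 1
        else:
            self._count = 0  # any excursion restarts the count
        return self._count >= self.required_steps


def steps_until_settled(angles: Iterable[float], detector: SettleDetector) -> Optional[int]:
    """Return the 1-based step at which the detector fires, or None if it never does."""
    for step, angle in enumerate(angles, start=1):
        if detector.update(angle):
            return step
    return None
```

In the gym wrapper's `reset`, the encoders would be zeroed once `update` returns True. How that is done depends on the robot API, so it is left out here.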
Electronics:
add a current sensor
use more precise encoders for the motor + pendulum and evaluate their impact
use a motor with helical gears instead of spur gears. see if it reduces play and allows for better/smoother control
RL:
[ ] faster training. previous iteration took 4-5h to train. can we go faster? can we train in under 10 min? can we do it without giving more info/constraints?
[x] upgrade gym to make upgrade to gymnasium easier
[ ] offline RL?
datasets could be from an energy based swing up + pid?
use MCAP files? or replay buffers from SAC? MCAP feels like the better choice here, as the replay buffer stores observations after the transforms applied by wrappers such as the HistoryWrapper
[ ] measure how much of the time is spent running the policy and how much is spent doing matrix multiplication while the pendulum is idle. is it possible to parallelize? outsource matrix multiplication to a cloud machine?
[ ] try using torque as the output of the policy (torque control on arduino??)
[ ] try training with/without current sensor data in the observation
[ ] remove the two different control frequencies and use a SkipFrame wrapper instead.
[ ] #26
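
For the SkipFrame item above, a minimal sketch of what such a wrapper might do: repeat the same action for a fixed number of env steps and sum the rewards. It is duck-typed rather than built on `gym.Wrapper`, and it assumes the 5-tuple `step` return (`obs, reward, terminated, truncated, info`) used by recent gym versions and gymnasium. `CountingEnv` is a toy stand-in for illustration only, not the pendulum env.

```python
class SkipFrame:
    """Repeat each action `skip` times and return the summed reward."""

    def __init__(self, env, skip: int = 4) -> None:
        if skip < 1:
            raise ValueError("skip must be >= 1")
        self.env = env
        self.skip = skip

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.skip):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward
            if terminated or truncated:
                break  # do not step past the end of an episode
        return obs, total_reward, terminated, truncated, info


class CountingEnv:
    """Toy env: observation is the step count, reward is 1 per step."""

    def __init__(self, horizon: int = 10) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        truncated = self.t >= self.horizon
        return self.t, 1.0, False, truncated, {}
```

With this, the policy would run at the low rate while the wrapped env keeps stepping at the high rate, which is one way to replace the two separate control frequencies.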
Control Theory:
pid? mpc? system id?
Documentation:
[ ] an assembly video tutorial would be nice
[ ] docs should show the angle origins and directions
[ ] changelog?
[ ] atomic/concise wandb reports for RL experiments would be nice
Validate reproducibility:
Resources: Offline RL:
RL:
newnew: