clvrai / skill-chaining

Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization (CoRL 2021)
https://clvrai.com/skill-chaining

Assembly succeeds but no success.pkl file generated #3

Open feup-jmc opened 2 years ago

feup-jmc commented 2 years ago

Since #1 (which I closed with the overall impression that I should have researched the matter more extensively myself), I've managed to obtain successful states for assembly of the first table leg (using GAIL). My theory as to why this didn't work before is that the training program does not tolerate being interrupted between warmup and reaching a success state, even though it has checkpoints to resume from. I've also noticed that, for some reason, after a few successful episodes GAIL seems to 'forget' and cannot succeed again (see the figure below, in which the large reward spikes line up with the successful episodes).

[Figure: rewards_sawyer_gail_1leg — episode rewards for 1-leg GAIL training; the large spikes correspond to successful episodes]

But onto the main focus of this issue: even after obtaining successful episodes, in which the checkpoint video shows successful assembly of the table leg, no success_(...).pkl file is generated to use as a starting point for the next leg. This is strange, because if https://github.com/youngwoon/robot-learning/blob/11bc2ac1b89a0f2e772bd092a87ec2415a785617/robot_learning/trainer.py#L370-L372 classifies the video as belonging to a successful episode, the same should hold for https://github.com/youngwoon/robot-learning/blob/11bc2ac1b89a0f2e772bd092a87ec2415a785617/robot_learning/trainer.py#L195-L197

Since this codebase is quite dense, I can't quite work out the difference/relation between the values stored in info and the keys reported by info.keys(), and hence how the file is supposed to get generated.
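For context, this is my condensed reading of the check in the second link. The key name and the pickle output come from the linked lines; the function, its arguments, and the path format here are my own reconstruction, not the actual implementation:

```python
import os
import pickle

def maybe_save_success_states(info, step, log_dir):
    """Hypothetical condensation of trainer.py L195-L197 (my reconstruction)."""
    # The file is written only when the aggregated evaluation info actually
    # carries the key; if no evaluation rollout recorded a success state,
    # the key is simply absent and no success_*.pkl appears, regardless of
    # what the checkpoint videos show.
    if "episode_success_state" in info.keys():
        path = os.path.join(log_dir, "success_{:011d}.pkl".format(step))
        with open(path, "wb") as f:
            pickle.dump(info["episode_success_state"], f)
```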

feup-jmc commented 2 years ago

I feel somewhat stupid, but I figured it out myself: the validation step is necessary for the success files to be generated.
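In case it helps anyone else: my understanding is that success states are only collected during the evaluation rollouts, so the file can only appear once an evaluation pass actually runs. A minimal sketch of that reading (all names here are placeholders, not the real trainer code):

```python
def train(trainer, max_steps, evaluate_interval):
    """Sketch of the training loop as I understand it; names are placeholders."""
    for step in range(max_steps):
        trainer.train_step()
        # Success states are gathered only inside the evaluation rollouts,
        # so a run interrupted before this branch fires never produces a
        # success_*.pkl, however well training itself is going.
        if step % evaluate_interval == 0:
            info = trainer.evaluate()
            trainer.save_success_states(info, step)
```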

feup-jmc commented 2 years ago

Hello, and once again thanks for the great work put into this repo. Reopening the issue with some new insights:

I have found that my early victory in success-file generation was due to using a low number of evaluations. I previously used 10-50, which worked for the 1st and 2nd legs but had its fair share of issues: some weird behaviour, the best episodes not being validated as successes, and low-quality start states for the later legs.

However, it turns out that using --num_eval 200 as indicated in section 3 (presumably to match the number of demos) brings the problem back: with the exception of the 1st leg, the increased scrutiny means checkpoints do not get verified as successes, despite video evidence showing otherwise. What is the cause of this? The rewards for the 2nd leg's successes are double those of the 1st leg's, so validating the checkpoints properly should not be an issue.


feup-jmc commented 2 years ago

Will close #6 and #7 for now as they are almost certainly due to this specific issue, with terminal/initial state quality being decisive.

feup-jmc commented 2 years ago

After digging a bit deeper, the situation is even more baffling:

The function sequence is run.py -> trainer.py -> evaluate(params) -> _evaluate(params), with the latter running the episode num_eval times and aggregating the information in info; evaluate should then write out the success file if "episode_success_state" is in info.keys(). Therefore, the expectation would be that a higher num_eval means a greater likelihood of a successful episode, and hence of a success.pkl file being generated.
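To make that expectation concrete, here is roughly how I picture the chain. This is a sketch only: apart from evaluate/_evaluate/num_eval and the key name, everything (including using a plain dict in place of Info) is mine:

```python
def _evaluate(run_episode, num_eval):
    """Sketch: run the policy num_eval times and aggregate the results."""
    info = {}
    for _ in range(num_eval):
        rollout, ep_info = run_episode()
        if ep_info.get("episode_success", False):
            # One successful rollout is enough to add the key...
            info.setdefault("episode_success_state", []).append(rollout[-1])
    return info

def evaluate(run_episode, num_eval, save_success_states):
    info = _evaluate(run_episode, num_eval)
    # ...so a higher num_eval should make this check MORE likely to pass,
    # not less.
    if "episode_success_state" in info.keys():
        save_success_states(info)
```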

Is there an issue with the Info class from infodict.py? I would think not, as it has worked fine the other times. What is going wrong here? I really can't tell, and I would appreciate any insight you could provide.
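For reference, this is the mental model I'm working with for Info (a toy stand-in, not the real infodict.py) to make explicit what I assume the values in info versus info.keys() mean:

```python
from collections import defaultdict

class Info:
    """Toy aggregator standing in for robot_learning's Info class."""

    def __init__(self):
        self._values = defaultdict(list)

    def add(self, other):
        # Accumulates per-episode values under each key, so keys() only
        # reports a key once at least one episode has contributed to it.
        for k, v in other.items():
            self._values[k].append(v)

    def keys(self):
        return self._values.keys()

    def __getitem__(self, key):
        return self._values[key]
```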