clvrai / skill-chaining

Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization (CoRL 2021)
https://clvrai.com/skill-chaining

Difficulty in training increasing number of subtasks #6

Closed feup-jmc closed 2 years ago

feup-jmc commented 2 years ago

Good morning,

Following the steps in sections 2 and 3 of the README, I have observed that training the 1st leg is generally quite successful and uneventful, but training each additional table leg (using the success states of the previous subtask as init states for the current one) becomes increasingly difficult. I have ruled out the demos as a cause of the problem.
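For reference, this is the chained setup as I understand it. It is only a minimal sketch: `train_subtask` and `collect_success_states` are placeholders for the actual training/evaluation entry points, not the repo's real API.

```python
# Minimal sketch of the chained training setup as I understand it.
# train_subtask and collect_success_states are placeholders, not the
# actual skill-chaining API.

def train_subtask(leg, init_states):
    """Placeholder: train the policy for one leg, resetting episodes
    from init_states (or the default reset when init_states is None)."""
    raise NotImplementedError

def collect_success_states(policy, leg):
    """Placeholder: roll out the trained policy and keep the terminal
    states of successful episodes."""
    raise NotImplementedError

def train_chain(num_legs):
    init_states = None  # the 1st leg starts from the default reset
    for leg in range(num_legs):
        policy = train_subtask(leg, init_states)
        # Success states of leg i seed the initial-state distribution
        # of leg i+1, so any under-counting of successes here shrinks
        # the pool of init states available to the next subtask.
        init_states = collect_success_states(policy, leg)
```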

Whereas the 1st leg finished in 1 run (each run ~50M steps) and the 2nd in 2 runs, the 3rd has failed even after 5 attempts. Moreover, in the case of the 2nd leg, despite many videos showing an abundance of successes, the evaluator (section 3) reports only a few successes, and these correspond to different steps from those indicated in the video titles. This issue picks up on some points of #3, but the main idea is that I suspect the evaluator is at fault somehow, with the negative effects cascading down and becoming more noticeable as the number of subtasks increases.
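To narrow down whether the evaluator is under-counting, I have been comparing its reported success count against the success-tagged rollout videos on disk. A minimal sketch of that check follows; the `eval/episodes.json` log, the `success` key, and the video naming pattern are all assumptions on my part, not the repo's actual output format.

```python
# Hypothetical sanity check: compare the number of successes the
# evaluator reports against the success-tagged rollout videos on disk.
# The log path, the "success" key, and the filename pattern are
# assumptions for illustration; adjust to the actual output format.
import glob
import json

def count_reported_successes(eval_log="eval/episodes.json"):
    """Count episodes the evaluator itself flagged as successful."""
    with open(eval_log) as f:
        episodes = json.load(f)
    return sum(1 for ep in episodes if ep.get("success"))

def count_video_successes(video_dir="eval/videos"):
    """Count rollout videos whose filename marks a success."""
    return len(glob.glob(f"{video_dir}/*success*.mp4"))

if __name__ == "__main__":
    reported = count_reported_successes()
    on_disk = count_video_successes()
    print(f"evaluator reports {reported} successes, "
          f"videos suggest {on_disk}")
    # A large mismatch would support the suspicion that the evaluator
    # under-counts successes or mislabels the step at which they occur.
```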

Is it natural for this behaviour to emerge during training, or is there something wrong on my side? If the latter, what could it be?