Skylark0924 / Rofunc

🤖 The Full Process Python Package for Robot Learning from Demonstration and Robot Manipulation
https://rofunc.readthedocs.io
GNU General Public License v3.0
485 stars 50 forks

Problem in generation of new trajectories using learned models (TPGMMRPCtrl) #97

Closed AdrianPrados closed 7 months ago

AdrianPrados commented 1 year ago

I am making a comparison between learning with TPGMMBi and TPGMMRPCtrl, as shown in Figure 3 of the BiRP paper. I am using the data provided in the Rofunc example (example_tpgmmbi_RPCtrl.py), but I am having problems generating new trajectories from a model that has already been learned.

The learning process works correctly in both cases, but when I call Representation.generate() with new task-parameter data, TPGMMRPCtrl fails to produce a correct trajectory. First of all, I made a modification because, as the code originally stood, the output always reproduced the demonstrations. This happened because, inside _bi_reproduce(), lqr.x0_l, lqr.x0_r and lqr.x0_c were always filled with self.repr_l.demos_xdx[show_demo_idx][0], self.repr_r.demos_xdx[show_demo_idx][0] and self.repr_c.demos_xdx[show_demo_idx][0] respectively, so only the demo data was used as the initial state (just as happens when using Repr.reproduce()). To solve this, when calling reproduce I used the data stored in the task parameters instead, as shown in the picture.
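The change described above can be sketched as follows. The structures here (demos_xdx, task_params, shapes) are hypothetical stand-ins for the Rofunc internals, not the actual implementation:

```python
import numpy as np

show_demo_idx = 0

# demos_xdx[i][0]: first state of demonstration i (toy position + velocity).
demos_xdx = [np.array([[0.0, 0.0, 0.0, 0.0]])]

# task_params["frame_origins"][i][0]: start pose supplied at generation time.
task_params = {"frame_origins": [[np.array([1.0, 2.0, 0.0, 0.0])]]}

# Original behaviour: lqr.x0_* always came from the demo data, so generate()
# reproduced the demonstrations regardless of the new task parameters.
x0_from_demo = demos_xdx[show_demo_idx][0]

# Modified behaviour: take the LQR initial state from the task parameters,
# so a new start pose passed to Repr.generate() is actually used.
x0_from_task = task_params["frame_origins"][show_demo_idx][0]

print(x0_from_demo.tolist())  # → [0.0, 0.0, 0.0, 0.0]
print(x0_from_task.tolist())  # → [1.0, 2.0, 0.0, 0.0]
```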

[Screenshot A: modified call to reproduce using the task-parameter values]

The value of lqr.x0_c has also been modified to use self.repr_c.task_params["frame_origins"][show_demo_idx] (important: here I don't need the trailing [0] because of how the value is generated, but the structure is the same as for the other task values). This value is produced by the code in the following image (lines 491-493). By default, self.repr_c.demos_xdx[show_demo_idx][0] was used, which generated incorrect values because it encodes the relationship present in the demo data, and that relationship changes when we want to use other parameters. The code in the image instead computes the relationship between r and l from the task-parameter data passed to Repr.generate().

[Screenshot B: code generating lqr.x0_c from the task parameters (lines 491-493)]

With these modifications, the new trajectories generated by the learned models now take the initial states (stored in the x0 values) into account, but the end points of the tasks are still ignored. My question is whether I am applying Rofunc incorrectly or whether this is an internal bug that has to be solved. I attach visual results as well as the trajectory values, where it can be seen that the initial point is taken into account but the final point is not.

Numerical data for the left trajectory: the result shows that the initial point is correctly taken into account but the final point is not:

[Screenshot C: numerical comparison for the left trajectory]
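The symptom can also be checked numerically: a generated trajectory should both start at the new initial pose and end at the new end pose. A minimal sketch with toy numbers (not the actual Rofunc data):

```python
import numpy as np

# Toy generated trajectory illustrating the symptom: the start matches the
# new initial pose, but the end never reaches the new end pose.
traj = np.array([[1.0, 2.0], [1.5, 2.4], [2.1, 2.9]])
start_target = np.array([1.0, 2.0])  # new initial pose (respected)
end_target = np.array([4.0, 4.0])    # new end pose (ignored by generate())

start_ok = bool(np.allclose(traj[0], start_target, atol=1e-2))
end_ok = bool(np.allclose(traj[-1], end_target, atol=1e-2))
print(start_ok, end_ok)  # → True False
```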

Data generated with the initial demos: this is produced by Repr.reproduce() and is correct.

[Screenshot D: data generated with the initial demos]

Data generated with new task parameters: this is produced by Repr.generate() and is incorrect.

[Screenshot E: data generated with new task parameters]

Finally, the part of the code where I call Repr.generate():

[Screenshot G: code calling Repr.generate()]

Thank you very much for your help and sorry for the very long issue.

AdrianPrados commented 1 year ago

I forgot to mention that the get_rel_generation function is based on the one used to generate the value of c from the demo data, modified to work with the task parameters. I attach an image of this function for clarity.
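For reference, a minimal sketch of what such a function might look like, assuming the relative (c) initial state is simply the element-wise difference between the right and left task-parameter origins; the actual computation in Rofunc may differ, and the names here are hypothetical:

```python
import numpy as np

def get_rel_generation(task_params_l, task_params_r, idx):
    """Hypothetical sketch: derive the relative (c) initial state from the
    left/right task parameters instead of from the demo data. Assumes the
    relative state is r - l, which may differ from Rofunc's actual code."""
    x0_l = task_params_l["frame_origins"][idx][0]
    x0_r = task_params_r["frame_origins"][idx][0]
    return x0_r - x0_l

# Toy task parameters for the left and right arms.
tp_l = {"frame_origins": [[np.array([1.0, 1.0])]]}
tp_r = {"frame_origins": [[np.array([4.0, 3.0])]]}
rel = get_rel_generation(tp_l, tp_r, 0)
print(rel.tolist())  # → [3.0, 2.0]
```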

[Screenshot F: the get_rel_generation function]

Skylark0924 commented 1 year ago

I will check this. Btw, have you tried the latest update I mentioned in https://github.com/Skylark0924/Rofunc/issues/93 ?

AdrianPrados commented 1 year ago

Thank you very much. Yes, I tried your latest update in #93; everything in the simulation and RL works fine now.

Skylark0924 commented 8 months ago

Hi @AdrianPrados, very sorry for the super late response. I have been working on the RofuncRL subpackage these past months and had no time to pay attention to the TP-GMM part. Recently, a project in our lab started using this technique, which made me notice this bug. It was caused by a very stupid typo in the basic class TPGMM; you can find the details in this commit: https://github.com/Skylark0924/Rofunc/commit/6b16e7e8a99bbb3224ca85a94c5b03076990621b .

I also added the generation part in the example. Here are the results:

[Image: demonstrations]

[Image: reproduction]

[Image: generation in a new situation]

[Image: generation with end_pose [4, 4]]

[Image: generation with end_pose [5, 4]]