Genesis-Embodied-AI / RoboGen

A generative and self-guided robotic agent that endlessly proposes and masters new skills.
Apache License 2.0
552 stars · 50 forks

Using RoboGen as a Benchmark for Evaluating Embodied-AI Models #6

Closed Kira0096 closed 7 months ago

Kira0096 commented 10 months ago

I am very interested in your project and am considering using RoboGen as a benchmark for evaluating Embodied-AI models. However, I think I would need a couple of additional features to achieve this goal.

Specifically, the following two features may be needed:

New task definition interface: the ability to input a task description and generate the corresponding scene, objects, and success conditions. This would let me flexibly define and test a wide variety of tasks.

Generic robot model inference interface: the ability to input a task description together with the historical trajectory and get back robot control signals. This would let me test and compare the performance of different robot models.

yufeiwang63 commented 10 months ago

Thanks for your interest in using RoboGen to build a benchmark!

For the new task definition interface: I have added a file that does roughly what you need. Given a text description and a core articulated object, RoboGen will automatically build the scene and the training supervision for that description. I think this can serve as a basis for what you are looking for. See the updated readme here for how to use it.
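
For concreteness, here is a minimal sketch of how a benchmark could wrap this description-to-task generation behind a uniform interface. All names below (`TaskSpec`, `build_task`, the example asset ID) are hypothetical illustrations, not part of RoboGen; the actual generation should go through the script described in the readme.

```python
# Hypothetical sketch of a benchmark-side task definition interface.
# None of these names come from RoboGen itself; the generation step is done by
# the script described in the readme, which this wrapper would call once the
# exact entry point is confirmed.
from dataclasses import dataclass


@dataclass
class TaskSpec:
    """A single benchmark task, defined purely by text plus a seed object."""
    description: str          # natural-language task description
    articulated_object: str   # core articulated object, e.g. an asset ID


def build_task(spec: TaskSpec) -> dict:
    """Placeholder for invoking RoboGen's description-to-task generation.

    In practice this would call the script from the readme, which builds the
    scene, relevant objects, and training supervision (success condition /
    reward) from the text description.
    """
    # TODO: replace with the actual RoboGen call once the entry point is verified.
    return {
        "description": spec.description,
        "object": spec.articulated_object,
        "scene": None,        # generated scene config would go here
        "supervision": None,  # generated reward / success condition would go here
    }


if __name__ == "__main__":
    spec = TaskSpec(
        description="Open the microwave door",
        articulated_object="7310",  # illustrative asset ID only
    )
    print(build_task(spec))
```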

For the generic robot model inference interface: at this point I think this feature is beyond the current version and scope of RoboGen, and I am not sure I will have time to implement it in the near future. Based on your description, it sounds like a language-conditioned sequence-model policy, such as a language-conditioned Decision Transformer or PerAct (Perceiver-Actor). This would be a standalone implementation/project on top of RoboGen. It might be helpful to look into those implementations and see whether they could be easily integrated with RoboGen.
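
To make the idea concrete, here is a minimal sketch of what such a policy interface could look like: the policy consumes the task description plus the observation/action history and returns the next control signal. Nothing here is part of RoboGen; `LanguageConditionedPolicy` and `RandomPolicy` are hypothetical names, and a Decision Transformer or PerAct-style model would sit behind this interface.

```python
# Hypothetical interface for a "generic robot model inference" feature:
# a language-conditioned sequence policy mapping (task description, history)
# to the next low-level action. Not part of RoboGen.
from typing import Protocol, Sequence

import numpy as np


class LanguageConditionedPolicy(Protocol):
    def act(
        self,
        task_description: str,               # natural-language goal
        observations: Sequence[np.ndarray],  # observation history o_0 .. o_t
        actions: Sequence[np.ndarray],       # action history a_0 .. a_{t-1}
    ) -> np.ndarray:                          # next action a_t (robot control signal)
        ...


class RandomPolicy:
    """Trivial stand-in that satisfies the interface, useful for wiring/testing."""

    def __init__(self, action_dim: int = 7):
        self.action_dim = action_dim

    def act(self, task_description, observations, actions) -> np.ndarray:
        # A real sequence model would condition on the description and history.
        return np.random.uniform(-1.0, 1.0, size=self.action_dim)


if __name__ == "__main__":
    policy: LanguageConditionedPolicy = RandomPolicy()
    obs_history = [np.zeros(10)]
    act_history = []
    print(policy.act("Open the microwave door", obs_history, act_history))
```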

If you have time to implement it, I would be happy to review a PR and merge it into RoboGen.

Let me know if this helps!

Kira0096 commented 10 months ago

Thank you for your prompt response! I plan to try the first solution, as I believe it will meet my needs.

As for the second feature, could you provide some guidelines to get started? For instance, pointers to the relevant files or line numbers that would help me understand how to deploy pre-trained models in the environment?