- Yes, the current reward function uses ground-truth states from simulation. If you estimate the state from observations instead, the reward function will inherit the estimation noise and become less accurate. Therefore, you should try to minimize estimation error if you go this route.
- I am not sure what you mean by "same blocks can be replaced". Can you give an example?
- In order for 2 parts to be connected, they must meet a distance and angular accuracy constraint (which you can tune). So even though we just count the number of connected parts here, the connected parts have already satisfied the distance / angular accuracy constraints.
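For intuition, here is a minimal sketch of what such a check amounts to. The function name, arguments, and default tolerances below are illustrative only and are not the environment's actual code:

```python
import numpy as np

def is_connectable(pos_a, up_a, pos_b, up_b, dist_tol=0.02, angle_tol_deg=15.0):
    """Illustrative connection test: two connector sites can be welded only if
    they are close enough and their axes are aligned within an angular tolerance.
    pos_* are 3D positions; up_* are unit axis vectors of the connector sites."""
    # Distance constraint: connector origins must be within dist_tol (meters).
    close = np.linalg.norm(np.asarray(pos_a) - np.asarray(pos_b)) <= dist_tol
    # Angular constraint: the two site axes must point in (nearly) the same
    # direction, within angle_tol_deg degrees.
    cos_angle = np.clip(np.dot(up_a, up_b), -1.0, 1.0)
    aligned = np.degrees(np.arccos(cos_angle)) <= angle_tol_deg
    return close and aligned
```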
Thanks for your reply.
"same blocks can be replaced": e.g., a table (with 4 sites (s-A, s-B, s-C, s-D)) has 4 legs of the same shape (l-A, l-B, l-C, l-D), then these legs should be interchangeable, i.e., {[l-A, s-A], [l-B, s-B]} should be equal to {[l-A, s-B], [l-B, s-A]}. But the 'recipe' and 'site_recipe' are fixed in recipe files. It seems that the interchangeability of objects of the same shape is not considered in the code. Will calculating the reward cause an error?
I'm sorry, my previous description may not have been clear. I mean that whether a piece of furniture is successfully assembled is determined by checking (a) self._subtask_step == self._success_num_conn (furniture_sawyer_dense.py) or (b) self._num_connected == self._success_num_conn (furniture.py).
In (a), the agent must connect the pairs predefined in the recipe, which means {[l-A, s-A], [l-B, s-B]} is counted as correct but {[l-A, s-B], [l-B, s-A]} is counted as wrong, even though the two assemblies may actually be equivalent.
In (b), even if the agent connects all the parts, the final assembled result can still be wrong. For example, when assembling a Lego toy, if you attach the arms to the head and the head to the legs, all the parts are connected together even though the result is wrong.
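To make the concern concrete, here is a toy sketch. The part/site names follow my table example above; the dictionaries are made up for illustration and are not from the repository:

```python
# Fixed recipe, as in the recipe files: each leg is bound to one specific site.
recipe = {"l-A": "s-A", "l-B": "s-B", "l-C": "s-C", "l-D": "s-D"}

# Suppose the agent attaches two identical legs to swapped (equivalent) sites.
connections = {("l-A", "s-B"), ("l-B", "s-A")}

# (a) Recipe-style check: only connections matching the fixed pairs count.
recipe_matches = sum(1 for leg, site in connections if recipe.get(leg) == site)
print(recipe_matches)    # 0 -> no progress, although the assembly is physically fine

# (b) Count-style check: any successful weld counts, regardless of which site.
print(len(connections))  # 2 -> two parts connected, even if the pairing is "wrong"
```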
Best regards.
But in reality, there is a many-to-many mapping between parts: any of the legs can attach to any of the holes, and all of these are functionally correct. To implement this, the labeling scheme in the XML and the Python code would have to be updated, but we did not do this to keep things simple.
Thanks for your kind reply.
I'm sorry if the question below sounds a bit blunt, but what confuses me is: (1) the dense reward needs recipes (simplified ground truth), and different models need to be trained for different furniture; (2) however, the FurnitureSawyerGenEnv (furniture_sawyer_gen.py) can already assemble a lot of furniture using recipes, with better generalization and better performance. Since this problem can be solved with a hand-designed program, why should we use DL-based methods (IL or RL) to solve it again?
try_connect() in furniture.py checks the connectors of two parts. I want to know whether some furniture pieces have more than one possible final assembled shape. For example, suppose a chair has a backrest and four legs, and the seat has 2 connection points on its upper side (for the backrest) and 4 connection points on its lower side (for the legs). A wrong final shape could have two legs connected to the upper side and the backrest plus the other two legs connected to the lower side. I would like to know whether this situation is possible, in order to judge whether the current method of determining assembly success is reasonable. Since the environment contains more than 60 furniture models, it is very difficult to check them one by one. Does every piece of furniture have only one final shape after assembling all its parts?
Thanks for any reply.
Best regards.
No problem, these are good questions.
Even though a hand-designed optimal controller may exist for the task, there are still practical and scientific benefits to investigating learning-based controllers.
First, the optimal controller, even if it exists, may have several practical constraints. It may require a lot of effort to acquire (e.g. coming up with equations and deriving the optimal control), may be brittle (if the furniture changes, the optimal control needs to be rederived), and may be computationally expensive (a sampling-based motion planner may need a lot of samples to work).
In contrast, learning-based controllers offer some advantages. The effort of acquiring a good controller is shifted from human engineering to automated search (although I concur that RL / IL still requires human engineering, e.g. the reward function). Learning-based methods can generalize and adapt to distribution shift. Finally, learning-based methods can be computationally efficient since all the computation is distilled into a forward pass of a neural network. Note that there is also a rich line of work on integrating hand-designed controllers with RL / IL, so it's not like you have to choose between the two.
Finally, let's look at RL tasks in general. Many of the MuJoCo tasks can be solved with random search or are trivially solved by optimal control (e.g. cartpole). So why do we still benchmark RL on them? The value is in understanding RL in a scientific manner: if we know the properties of a task and how to solve it, we can better diagnose and improve RL when it fails on these toy tasks. With furniture assembly, we are benchmarking RL on long-horizon tasks.
I think there is some confusion. For the furniture pieces with dense reward, we assign 1:1 mappings, e.g. table leg 1 should connect to slot 1, so there is only 1 final shape.
But if you use the sparse reward env, multiple final shapes are allowed for any furniture (e.g. table leg 1 -> slot 1 and table leg 2 -> slot 1 are both allowed). We handle this logic through the XML labeling scheme; please refer to the paper for details.
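For intuition, here is a rough sketch of the difference. This is illustrative only; the actual environment encodes part compatibility in the MuJoCo XML site labels described in the paper, not in Python dictionaries:

```python
# Dense-reward furniture: a fixed 1:1 mapping, so there is exactly one final shape.
dense_recipe = {"leg1": "slot1", "leg2": "slot2", "leg3": "slot3", "leg4": "slot4"}

def dense_ok(leg, slot):
    # Only the pairing written in the recipe counts as correct.
    return dense_recipe.get(leg) == slot

# Sparse-reward furniture: parts are grouped by type, and any leg may attach to
# any slot of the matching type, so several final shapes are acceptable.
legs = {"leg1", "leg2", "leg3", "leg4"}
slots = {"slot1", "slot2", "slot3", "slot4"}

def sparse_ok(leg, slot):
    # Any leg-to-slot pairing is functionally correct.
    return leg in legs and slot in slots

print(dense_ok("leg2", "slot1"))   # False under the 1:1 dense recipe
print(sparse_ok("leg2", "slot1"))  # True under the many-to-many sparse scheme
```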
Thank you very much.
Best regards.
Hi, thanks for sharing this work. I have 3 questions:
1. The dense reward function needs recipes (which seem to be a simplified ground truth) to calculate the reward. Does this mean that, compared with other ways of computing reward (from images, point clouds, poses, etc.), it has more limitations? It seems that only furniture with a recipe can use the dense reward function?
2. The recipe fixes how each part is assembled and does not take into account that identical parts are interchangeable. Will this cause an error when calculating the reward?
3. Success seems to be judged by counting the number of connected parts. How do you make sure the connected parts are actually assembled accurately (in terms of position and orientation)?
Thanks for any reply.
Best regards.