SshiJwu closed this issue 7 months ago
Hi! Yes, most of these tasks did not originally support all visual modalities for learning.
Therefore, we extended these environments to support all of these modalities, including depth, images, and point clouds. In our paper, all observations used for policy learning are visual observations.
Thanks for your reply.
I would like to ask: besides performing FPS (farthest point sampling) and cropping on the point clouds, do we also need to normalize them? And how much impact does the camera's distance from the scene have on the algorithm?
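For readers following this thread, the preprocessing steps named in the question above (cropping, FPS, and an optional normalization) can be sketched roughly as below. This is only a minimal NumPy sketch and not the repository's actual code: the function names, the axis-aligned crop box, and the unit-sphere normalization are all assumptions chosen for illustration, and whether normalization helps at all is exactly the open question here.

```python
import numpy as np


def crop_point_cloud(points, lower, upper):
    """Keep points whose xyz lies inside the axis-aligned box [lower, upper].

    `points` is an (N, C) array whose first three columns are xyz.
    The box bounds are hypothetical and depend on the scene.
    """
    xyz = points[:, :3]
    mask = np.all((xyz >= lower) & (xyz <= upper), axis=1)
    return points[mask]


def farthest_point_sampling(points, num_samples, start_index=0):
    """Greedy farthest point sampling over the xyz columns."""
    xyz = points[:, :3]
    n = xyz.shape[0]
    if n == 0:
        return points
    num_samples = min(num_samples, n)
    selected = np.zeros(num_samples, dtype=np.int64)
    selected[0] = start_index
    # Squared distance from every point to its nearest selected point so far.
    min_dist = np.full(n, np.inf)
    for i in range(1, num_samples):
        diff = xyz - xyz[selected[i - 1]]
        dist = np.einsum("ij,ij->i", diff, diff)
        min_dist = np.minimum(min_dist, dist)
        # The next sample is the point farthest from the selected set.
        selected[i] = int(np.argmax(min_dist))
    return points[selected]


def normalize_to_unit_sphere(points):
    """Center xyz at the centroid and scale so the farthest point has norm 1.

    This is one common convention, assumed here for illustration only.
    """
    out = points.astype(np.float64, copy=True)
    xyz = out[:, :3]  # view into `out`, so in-place ops modify `out`
    xyz -= xyz.mean(axis=0)
    scale = np.linalg.norm(xyz, axis=1).max()
    if scale > 0:
        xyz /= scale
    return out
```

A typical call order would be crop, then FPS, then (optionally) normalize, for example `normalize_to_unit_sphere(farthest_point_sampling(crop_point_cloud(cloud, lo, hi), 512))`, where `lo`, `hi`, and `512` are placeholder values.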
Hello! Thanks for your great work! I'd like to ask about some details of the experiments. You trained diffusion policy on both Bi-DexHands and DexArt. As far as I can tell, these tasks don't seem to provide image information. Does your diffusion policy use a state-based network, and what observations does it take as input?