DepthModality clips depth after getting the real depth distances

ARISE-Initiative / robomimic

robomimic: A Modular Framework for Robot Learning from Demonstration

MIT License


DepthModality clips depth after getting the real depth distances #150

Closed jypark0 closed 5 months ago

jypark0 commented 5 months ago

Hi, I ran into an issue where the depth maps I retrieve have clipped values. After some digging, it seems that the process_frame function for DepthModality clips depth map values to the range [0, 1]. https://github.com/ARISE-Initiative/robomimic/blob/5dee58f9cc1235010d0877142b54d0e82dd23986/robomimic/utils/obs_utils.py#L921

However, in the env_robosuite wrapper, the depth is processed after getting the real depth values. https://github.com/ARISE-Initiative/robomimic/blob/5dee58f9cc1235010d0877142b54d0e82dd23986/robomimic/envs/env_robosuite.py#L223-L225

I can think of two possible fixes:

1. Changing the process_frame arguments so that it does not scale and clip, by setting scale=None.
2. Changing the order of operations in env_robosuite.py so that obs is processed first and the real depth map is computed afterwards.

Thanks!
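For readers following along, the clipping described above can be reproduced with a minimal NumPy sketch. It does not call robomimic or robosuite. The names process_depth_frame and to_real_depth are hypothetical stand-ins, and the near/far planes (0.01 and 10.0) and the near/far conversion formula are assumptions chosen for illustration only.

```python
import numpy as np

def process_depth_frame(frame, scale=1.0):
    """Hypothetical stand-in for a depth process_frame step.

    Divides by `scale` and clips to [0, 1]; with scale=None it returns
    the frame unchanged (the first proposed fix).
    """
    frame = np.asarray(frame, dtype=np.float64)
    if scale is None:
        return frame
    return np.clip(frame / scale, 0.0, 1.0)

def to_real_depth(normalized, near=0.01, far=10.0):
    """Map a normalized depth buffer in [0, 1] to distances in [near, far].

    Assumed conversion; near and far are made-up values for this sketch.
    """
    normalized = np.asarray(normalized, dtype=np.float64)
    return near / (1.0 - normalized * (1.0 - near / far))

normalized = np.array([0.0, 0.99, 0.999, 1.0])
real = to_real_depth(normalized)

# Reported behaviour: real distances are clipped, so anything beyond 1.0 is lost.
clipped = process_depth_frame(real)

# Fix 1: disable scaling and clipping.
fix_1 = process_depth_frame(real, scale=None)

# Fix 2: process the normalized buffer first, then convert to real distances.
fix_2 = to_real_depth(process_depth_frame(normalized))
```

In this sketch both fixes recover the same real distances, while the original order saturates every value farther than 1.0.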

amandlek commented 5 months ago

I don't believe this should be an issue if things have been set up correctly. This function here, which runs when retrieving environment metadata from the dataset, should tell the obs processor not to clip depth values. Specifically, here.

Can you check whether those functions are running in your setup?

jypark0 commented 5 months ago

Sorry for the late reply. I was able to debug the code, and it does seem that the function EnvUtils.set_env_specific_obs_processing is being called for the cameras contained in the dataset metadata. I wanted to extract obs from cameras that were not specified in the dataset (such as frontview, birdview, etc.), and so the process_frame for those depth modalities wasn't being set. Does the script dataset_states_to_obs.py set the depth process_frame correctly for all cameras?
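The failure mode described here can be illustrated with a small, purely hypothetical sketch: a per-key table of depth processors in which only registered cameras get the non-clipping processor. This is not robomimic's actual mechanism. The table, the function names, and the key names agentview_depth and frontview_depth are all assumptions made for illustration.

```python
import numpy as np

def clipping_processor(frame):
    # Assumed default: clip depth values to [0, 1].
    return np.clip(np.asarray(frame, dtype=np.float64), 0.0, 1.0)

def passthrough_processor(frame):
    # Assumed environment-specific override: keep real distances as-is.
    return np.asarray(frame, dtype=np.float64)

# Hypothetical registry mapping observation keys to depth processors.
DEPTH_PROCESSORS = {}

def register_depth_cameras(camera_names):
    """Register the non-clipping processor for each listed camera."""
    for name in camera_names:
        DEPTH_PROCESSORS[f"{name}_depth"] = passthrough_processor

def process_depth_obs(key, frame):
    """Use the registered processor, falling back to the clipping default."""
    return DEPTH_PROCESSORS.get(key, clipping_processor)(frame)

# Only the camera listed in the (hypothetical) dataset metadata is registered.
register_depth_cameras(["agentview"])

real_depth = np.array([0.5, 2.0, 7.5])
registered = process_depth_obs("agentview_depth", real_depth)
unregistered = process_depth_obs("frontview_depth", real_depth)
```

Under these assumptions the registered camera keeps its real distances, while the unregistered one silently falls back to clipping, which matches the symptom reported for cameras outside the dataset metadata.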

amandlek commented 5 months ago

Registration of observation keys should happen here, when constructing the environment. The specific depth modalities would be registered here. That should ensure that the same process_frame function is used for all of those cameras. Is that not what's happening on your end?

jypark0 commented 5 months ago

OK, I was able to fix the issue. I was using the function create_env_from_metadata(), and I think there was a bug in my code. After switching to create_env_for_data_processing(), the problem went away. Thanks for pointing out the other functions! I'll close this issue.