MuJoco integration - Githubissues

Hi,

I'm using your FOTS version integrated with MuJoCo and want to utilize it for Leap hand manipulation. However, I have encountered the following issues:

Initial Depth Difference: Graph 1 is the optical image generated by the initial depth you provided in ./utils/utils_data/ini_depth_extent.npy, and Graph 2 is the image in the case where no collision occurs between the digit sensor and the object in your example code. According to my understanding, they should be the same. Can you explain the reason for this discrepancy

Graph 1: thumb_ini

Graph 2 : thumb_depth_3

Marker Motion Generation: I used the depth data from the case where no collision occurs to generate marker motion when the object contacts the digit sensor. However, there is no marker motion. Can you explain why this happens and provide a method to generate marker motion between the current frame and the last frame (e.g., 0.5s ago) instead of between the current frame and the initial frame?
Leap Hand Environment: I integrated the digit sensor with the Leap hand and built a simulation environment. However, the initial depth in the simulation (not from ini_depth_extent.npy) is unusual. The optical image generated by the digit sensor in thumb is shown in Graph 3:

Graph 3 thumb_depth

The initial depth looks like this:

   array([[-15789.237, -15790.517, -15791.797, ..., -15790.493, -15789.221,
           -15787.947],
          [-15789.903, -15791.179, -15792.459, ..., -15791.154, -15789.883,
           -15788.61 ],
          [-15790.576, -15791.848, -15793.12 , ..., -15791.819, -15790.544,
           -15789.272],
          ...,
          [-15773.748, -15774.644, -15775.538, ..., -15774.417, -15773.526,
           -15772.638],
          [-15772.997, -15773.893, -15774.788, ..., -15773.562, -15772.672,
           -15771.782],
          [-15772.248, -15773.142, -15774.036, ..., -15772.706, -15771.816,
           -15770.927]], dtype=float32)

and here is my scene Screenshot from 2024-06-13 22-41-01

and this is the .xml thumb digit structure:

 <body name="thumb_fingertip" pos="0 0.0466 0.0002" quat="0 0 0 1">
          <inertial pos="0 0 0" quat="0.704307 0.709299 0.006848 -0.0282727" mass="0.049" diaginertia="2.03882e-05 1.98443e-05 4.32049e-06"/>
          <joint name="j15" pos="0 0 0" axis="0 0 -1" limited="true" range="-1.34 1.88"  class="j15_class"/>

          <!--geom pos="0.0625595 0.0784597 0.0489929" type="mesh" contype="0" conaffinity="0" group="1" density="0" rgba="0.603922 0.14902 0.14902 1" mesh="thumb_fingertip"/-->
          <!--geom size="0.0102991 0.016 0.012" pos="-0.000799057 -0.0461 -0.0143" type="box" rgba="0.603922 0.14902 0.14902 1"/-->           
          <geom size="0.01425 0.01955 0.0165" pos="0.00425 -0.01005 -0.0143" type="box" rgba="0.603922 0.14902 0.14902 1"/>

          <!--Digit sensor-->
          <body name="digit_sensor_thumb" pos="0.020799056 -0.0588163 -0.013" euler="-1.5708 -1.5708 0">
            <site name="digit_0:site" pos="0.012 0.0015 0.035" rgba="1 0 0 .1" size="0.02 0.02 0.02"/> <!--0.012 0 0.035-->

            <!--Front and Back-->
            <geom name="digit_thumb:back"  type="mesh" euler="3.1416 0 1.5708" pos="0.028 0.015 0.036" material="black_resin" mesh="digit_back"  mass="0.05" contype="32" conaffinity="32" friction="1 0.05 0.01" solimp="1.1 1.2 0.001 0.5 2" solref="0.02 1"/>

            <!--Glass Cover-->
            <geom name="digit_thumb:glass" type="mesh" material="transparent_glass"  mesh="digit_glass" mass="0.005" contype="0" conaffinity="0"  pos="0.024 -0.0085 0.017" euler="0 0 1.5708"/>

            <!--Elastomer-->
            <geom name="digit_thumb:elastomer" type="mesh" mesh="digit_gel" pos="0.024 -0.0085 0.032" euler="0 0 1.5708" contype="0" conaffinity="0" rgba="0.9 0.95 1 0.0"/>

            <!--Elastomer Cover-->
            <geom name="digit_thumb:elastCover" type="mesh" mesh="digit_curve" pos="0.024 -0.0083 0.0305" euler="0 0 1.5708" contype="0" conaffinity="0" material="silver"
                friction="1 0.05 0.01" solimp="1.1 1.2 0.001 0.5 2" solref="0.02 1"/>

            <!--Gel Camera-->
            <camera name="digit_thumb:camera" mode="fixed" pos="0.0129 0.0014 0.008" euler="0 3.14159 1.5708" fovy="30"/>

            <!-- Friction placeholder -->
            <geom name="digit_thumb:friction" type="box" size="0.013 0.013 0.00001" euler="0 0 0" pos="0.012 0.001 0.0362" contype="32" conaffinity="32" rgba="0 0 1 0"
                    friction="1 0.05 0.01" solimp="1.1 1.2 0.001 0.5 2" solref="0.02 1"/>
            </body>

</body>

I have confirmed that no contact occurs and gravity is canceled. Can you explain why this issue arises?

Thank you for your assistance. Plz Let me know if you need more detials

Well, here are my preliminary answers:

A1: ./utils/utils_data/ini_depth_extent.npy is absolute depth value, and we actually use depth difference. https://github.com/Rancho-zhao/FOTS/blob/4bd82cfc75ad8d600b2008df513b28cf04611610/env_test.py#L38-L39

A2: You can try to change the contact depth threshold. https://github.com/Rancho-zhao/FOTS/blob/4bd82cfc75ad8d600b2008df513b28cf04611610/env_test.py#L40 In addition, you can use a list to save marker position data from the last frame to generate marker motion between the current frame and the last frame.

A3: Just as A1 said, we need depth difference (i.e. the contact depth between object and sensor), so the image rendering effect is theoretically not related to the absolute value of depth, you can check if the problem lies here.

Well, here are my preliminary answers:

A1: ./utils/utils_data/ini_depth_extent.npy is absolute depth value, and we actually use depth difference.

https://github.com/Rancho-zhao/FOTS/blob/4bd82cfc75ad8d600b2008df513b28cf04611610/env_test.py#L38-L39

A2: You can try to change the contact depth threshold.

https://github.com/Rancho-zhao/FOTS/blob/4bd82cfc75ad8d600b2008df513b28cf04611610/env_test.py#L40

In addition, you can use a list to save marker position data from the last frame to generate marker motion between the current frame and the last frame.

A3: Just as A1 said, we need depth difference (i.e. the contact depth between object and sensor), so the image rendering effect is theoretically not related to the absolute value of depth, you can check if the problem lies here.

Hi Rancho, Thank you for your immediate reply ! I Have checked depth difference and fixed marker motion generation !

But still I have some problems here

Initial Depth: I have checked the initial depth as Q1 said, they look like the same, maybe the reason that cause the difference is depth permutation. But my leap hand digit initial depth is absolutely wrong. Here are following three depths data:

Official intial depth:
[[0.96567404 0.96568865 0.9657032  ... 0.96569204 0.9656775  0.965663  ]
[0.9656817  0.96569616 0.9657107  ... 0.96569955 0.965685   0.9656705 ]
[0.96568936 0.96570385 0.9657183  ... 0.9657071  0.9656926  0.9656781 ]
...
[0.96549195 0.96550214 0.9655124  ... 0.96550167 0.96549153 0.9654814 ]
[0.9654834  0.9654936  0.9655038  ... 0.9654919  0.96548176 0.9654716 ]
[0.96547484 0.96548504 0.9654952  ... 0.9654821  0.965472   0.9654619 ]]

Initial depth in fots_test.py:
[[0.965674   0.9656886  0.9657032  ... 0.9656919  0.96567744 0.9656629 ]
[0.9656816  0.9656961  0.9657107  ... 0.9656995  0.965685   0.96567047]
[0.9656893  0.9657038  0.96571827 ... 0.96570706 0.9656925  0.96567804]
...
[0.96549195 0.96550214 0.96551234 ... 0.96550167 0.96549153 0.96548134]
[0.96548337 0.9654936  0.9655038  ... 0.9654919  0.96548176 0.9654716 ]
[0.96547484 0.96548504 0.9654952  ... 0.9654822  0.96547204 0.96546185]]

leap digit hand initial depth:
[[-15789.237 -15790.517 -15791.797 ... -15790.493 -15789.221 -15787.947]
[-15789.903 -15791.179 -15792.459 ... -15791.154 -15789.883 -15788.61 ]
[-15790.576 -15791.848 -15793.12  ... -15791.819 -15790.544 -15789.272]
...
[-15773.748 -15774.644 -15775.538 ... -15774.417 -15773.526 -15772.638]
[-15772.997 -15773.893 -15774.788 ... -15773.562 -15772.672 -15771.782]
[-15772.248 -15773.142 -15774.036 ... -15772.706 -15771.816 -15770.927]]

Here is how I get initial depth in my leap scene after I first create mujoco scene(model and data):

def create_ini_depth(self):
    """
    create the initial depth 
    """

    #mujoco.mj_step(self.robot_model, self.robot_data)
    viewer = RenderContextOffscreen(gel_width, gel_height, self.robot_model, self.robot_data)
    viewer.render(gel_width, gel_height, 1) # Thumb 
    depth = viewer.read_pixels(gel_width, gel_height, depth=True)[1].copy()
    depth = np.fliplr(depth)

    # generate tac img for debug
    simulation = get_simapproach()
    tact_img = simulation.generate(depth)

    self.ini_depth = depth.copy()

Even more strange is that after I perform manipulation, the depth rendered by viewer becomes reasonable. Here are my experiment data:

###### step 0 #######
>>>>>>> gel id: 0 <<<<<<<<
current depth: 
 [[0.8399874  0.8400555  0.84012365 ... 0.8400543  0.83998656 0.83991885]
 [0.8400228  0.8400907  0.8401588  ... 0.84008956 0.8400218  0.8399541 ]
 [0.8400586  0.8401263  0.84019405 ... 0.8401248  0.8400571  0.8399893 ]
 ...
 [0.8391634  0.83921105 0.8392587  ... 0.839199   0.8391517  0.83910435]
 [0.8391235  0.8391711  0.83921874 ... 0.8391535  0.8391062  0.8390589 ]
 [0.83908355 0.8391312  0.8391788  ... 0.83910805 0.8390607  0.83901334]] 
 depth shape:    (320, 240)

initial depth: 
 [[-15789.237 -15790.517 -15791.797 ... -15790.493 -15789.221 -15787.947]
 [-15789.903 -15791.179 -15792.459 ... -15791.154 -15789.883 -15788.61 ]
 [-15790.576 -15791.848 -15793.121 ... -15791.819 -15790.544 -15789.272]
 ...
 [-15773.749 -15774.644 -15775.538 ... -15774.417 -15773.526 -15772.638]
 [-15772.998 -15773.893 -15774.788 ... -15773.562 -15772.672 -15771.783]
 [-15772.248 -15773.143 -15774.037 ... -15772.706 -15771.817 -15770.927]] 
 init depth shape:   (320, 240)

here is how I get current depth

def save_tactile_image_headless(self, gel_id, last_depth, num):
        """
        ___ Description___
        This function does fowllowing things:
        1. save tactile marker flow, which is relative to the last frame
        2. Save tactile marker flow which is relative to the initial frame

       ___ Args____
            gel_id (int): Camera id in mujuco scene. 0 for Thumb, 1 for index finger
            last_depth (numpy array): depth data from last frame
            num (int): current frame number, just for saving img.

        ___Return___
            depth: Current depth
        """

        relative_pos = []
        viewer = RenderContextOffscreen(gel_width, gel_height, self.robot_model, self.robot_data)
        viewer.render(gel_width,gel_height,camera_id=gel_id)
        depth = viewer.read_pixels(gel_width, gel_height, depth=True)[1].copy()
        depth = np.fliplr(depth)

        # render depth to tactile img
        simulation = get_simapproach()
        depth_copy = depth.copy()
        tact_img = simulation.generate(depth_copy)

        # obtain object's relative pose
        digit_xpos = mujoco_utils.get_site_xpos(self.robot_model, self.robot_data, f"digit_{gel_id}:site")
        digit_xpos = digit_xpos.reshape(digit_xpos.shape[0],1)
        digit_xmat = mujoco_utils.get_site_xmat(self.robot_model, self.robot_data, f"digit_{gel_id}:site")
        digit_mat = np.vstack((np.hstack((digit_xmat, digit_xpos)),np.array([0,0,0,1])))

        object_xpos = mujoco_utils.get_site_xpos(self.robot_model, self.robot_data, "object1:site")
        object_xpos = object_xpos.reshape(object_xpos.shape[0],1)
        object_xmat = mujoco_utils.get_site_xmat(self.robot_model, self.robot_data, "object1:site")
        object_mat = np.vstack((np.hstack((object_xmat, object_xpos)),np.array([0,0,0,1])))

        relative_mat = np.dot(np.linalg.inv(digit_mat), object_mat)
        relative_xpos = relative_mat[:3,3]
        relative_xmat = relative_mat[:3,:3]
        relative_xrot = cv2.Rodrigues(relative_xmat)[0]

        threshold = 0
        # For relateive depth diff
        relative_depth_diff = last_depth - depth
        relative_mask = relative_depth_diff > threshold

        if np.max(relative_depth_diff)>threshold:
            relative_pos.append([-relative_xpos[1], relative_xpos[0], -relative_xrot[2,0]])
        else:
            relative_pos = []

        print(f">>>>>>> gel id: {gel_id} <<<<<<<<")
        print("current depth: \n", depth, "\n depth shape:\t", depth.shape)
        print("last depth depth: \n", last_depth, "\n last depth shape:\t", last_depth.shape)
        print(f"relative_pos:\n ", relative_pos, "\n relative_depth_diff:\n", relative_depth_diff)
        print("relative mask:\n", relative_mask)

        # # obtain markers' motion according to depth and object geometry info
        marker_relative = MarkerMotion(frame0_blur=tact_img,depth=relative_depth_diff,mask=relative_mask,traj=relative_pos,lamb=[0.00125,0.00021,0.00038])
        marker_img_relative = marker_relative._marker_motion()

        cv2.imwrite(f"./optical/tact_{gel_id}/{num}.png",tact_img)
        cv2.imwrite(f"./marker/tact_{gel_id}_relative/{num}.png", marker_img_relative)

        # For initial depth diff 
        ini_depth = self.ini_depth.copy()

        depth_diff = ini_depth - depth
        mask = depth_diff > threshold

        if np.max(depth_diff)>threshold:
            relative_pos.append([-relative_xpos[1], relative_xpos[0], -relative_xrot[2,0]])
        else:
            relative_pos = []

        print("current depth: \n", depth, "\n depth shape:\t", depth.shape)
        print("initial depth: \n", ini_depth, "\n init depth shape:\t", last_depth.shape)
        print(f"relative_pos:\n ", relative_pos, "\n init depth_diff:\n", depth_diff)
        print("mask:\n", mask)

        # # obtain markers' motion according to depth and object geometry info
        marker = MarkerMotion(frame0_blur=tact_img,depth=depth_diff,mask=mask,traj=relative_pos,lamb=[0.00125,0.00021,0.00038])
        marker_img = marker._marker_motion()

        #cv2.imwrite(f"./optical/tact_{gel_id}/{num}.png",tact_img)
        cv2.imwrite(f"./marker/tact_{gel_id}/{num}.png", marker_img)

        return depth

I'm wondering if you have any clue on this depth rendering ?

Marker position : Would you please offer me a way to get marker position from digit camera's perspective in form of numpy array ?

here is my scene gif, maybe helpful. If you need more details, please let me know.

scene.zip

A1: The depth rendering seems reasonable while performing manipulation, so does the strange depth value only appear in the initial frame?

A2: I don't understand your question a little bit. The initial and current maker positions are already provided in the following code; you can use them according to your needs: https://github.com/Rancho-zhao/FOTS/blob/4bd82cfc75ad8d600b2008df513b28cf04611610/marker_motion.py#L78-L95

A1: The depth rendering seems reasonable while performing manipulation, so does the strange depth value only appear in the initial frame?

A2: I don't understand your question a little bit. The initial and current maker positions are already provided in the following code; you can use them according to your needs:

https://github.com/Rancho-zhao/FOTS/blob/4bd82cfc75ad8d600b2008df513b28cf04611610/marker_motion.py#L78-L95

Dear Rancho-zhao,

I hope this message finds you well. First and foremost, I would like to express my gratitude for your assistance. I must also apologize for my earlier oversight regarding the acquisition of marker positions. After conducting several days of experiments, I find myself needing to discuss the initial depth issue further with you.

Additionally, I have some inquiries concerning MuJoCo integration and version compatibility that I hope you can clarify.

A1: Initial depth

Regarding A1, I’ve observed that the unusual depth readings occur exclusively upon the initial loading and rendering of the scene. However, the depth data appears accurate during subsequent env.step executions. Here is the method I employ to create ini_depth, along with the process I follow for env.step and retrieving depth data:

def create_ini_depth(self):
    """
    Create the initial depth.
    """
    mujoco.mj_step(self.robot_model, self.robot_data)
    viewer = RenderContextOffscreen(gel_width, gel_height, self.robot_model, self.robot_data)
    viewer.render(gel_width, gel_height, 0)  # Thumb
    depth = viewer.read_pixels(gel_width, gel_height, depth=True)[1].copy()
    depth = np.fliplr(depth)
    self.ini_depth = depth.copy()

def step(self, tactile=False, delay=0, num=0, reset_init=False, last_depth=None):
    """Headless version."""
    # ...
    mujoco.mj_step(self.robot_model, self.robot_data)
    # ...
    relative_pos = []
    viewer = RenderContextOffscreen(gel_width, gel_height, self.robot_model, self.robot_data)
    viewer.render(gel_width, gel_height, camera_id=gel_id)
    depth = viewer.read_pixels(gel_width, gel_height, depth=True)[1].copy()
    depth = np.fliplr(depth)
    # ...

To my understanding, there should be no discrepancy between env.step and create_ini_depth. Although I have managed to resolve this issue by utilizing the correct initial depth generated in my env_step, the peculiar phenomenon still piques my curiosity.

Q2: MuJoCo Integration and Marker Motion

Regarding marker motion, my current understanding is as follows: FOTS does not simulate the elastomer’s deformation. Instead, it models the displacement function of marker motion, which results from shear force, normal force, and twist force. The essential inputs for FOTS to simulate marker motion are the indent depth and the contact geometry between the object and the gel. MuJoCo renders the depth image for the indent depth, and the object pose and gel pose are used to determine the contact geometry, as represented by relative_pos in your code.

Q4: MuJoCo Version

I noticed that the latest MuJoCo version is 3.1.6, yet you have chosen to integrate FOTS with version 2.2.2. Is there a specific reason for this preference, such as incompatibility issues with version 3.1.6, or is it due to other considerations? Thank you for your time and attention to these matters. I look forward to your insights.

Rancho-zhao / FOTS

MuJoco integration #7

A1: Initial depth

Q2: MuJoCo Integration and Marker Motion

Q4: MuJoCo Version