facebookresearch / sapiens

High-resolution models for human tasks.
https://about.meta.com/realitylabs/codecavatars/sapiens/

Evaluation of surface normal in THuman Dataset #57

Closed tijiang13 closed 1 month ago

tijiang13 commented 1 month ago

Hello,

Thanks for the great work! Could you share a bit more detail on how the surface normals are evaluated on THuman 2.0? I read your replies in other issues, and it looks like the cosine similarity between normals in the camera coordinate system is measured.

But how were the cameras sampled? I suspect the distance between the camera and the object plays a large role here (cf. weak-perspective cameras), especially since Sapiens has no prior knowledge of the camera intrinsics during inference.

Best, Tianjian

rawalkhirodkar commented 1 month ago

Hello @tijiang13, please refer to Table 6 of the paper for the normal evaluation metrics. We compute the mean angular error and the percentage of pixels within t degrees.

[screenshot: Table 6 of the paper, surface-normal evaluation metrics]

Cameras were randomly sampled within a fixed distance range, and the same set of rendered images was used to evaluate all methods. A perspective camera was used to generate the images from the THuman2.0 dataset.
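For reference, here is a minimal sketch of how the two metrics in Table 6 (mean angular error and percentage of pixels within t degrees) are typically computed; the function name, thresholds, and NumPy-based formulation are my own assumptions, not code from the Sapiens repository:

```python
import numpy as np

def normal_metrics(pred, gt, thresholds=(11.25, 22.5, 30.0)):
    """Angular error between predicted and GT normals, plus % within t degrees.

    pred, gt: (N, 3) arrays of surface normals in the camera coordinate system.
    """
    # Normalize defensively, then clamp the dot product to a valid cosine range
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=-1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)
    err_deg = np.degrees(np.arccos(cos))
    return {
        "mean_angular_error": float(err_deg.mean()),
        **{f"pct_within_{t}": float((err_deg < t).mean() * 100) for t in thresholds},
    }
```

Angular error and cosine similarity are monotonically related, so evaluating either gives a consistent ranking; the degree-based form is just easier to threshold.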

tijiang13 commented 1 month ago

Thanks, but I am wondering about the exact hyperparameters, e.g. the distributions of the camera-object distance and focal length -- I may also need to evaluate my own method under a similar setting.

Best, Tianjian

tijiang13 commented 1 month ago

Hi @rawalkhirodkar,

Just to add some additional context -- here is my current setting:

from dataclasses import dataclass

@dataclass
class RandomCameraConfig:
    """Configuration for random camera pose generation.

    Notes:
    1. Azimuth and elevation are in degrees; camera_distance is in meters.
    2. The object is centered at the origin.
    3. The camera looks at the origin.
    """

    # image resolution
    height: int = 512
    width: int = 512

    # camera parameters (in degrees)
    azimuth_range: tuple[float, float] = (-180, 180)
    elevation_range: tuple[float, float] = (-10, 10)
    camera_distance_range: tuple[float, float] = (2.0, 3.0)

    # field of view
    fov_range: tuple[float, float] = (30, 60)
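To make the intended convention concrete, here is a small helper that draws one camera pose from these ranges; the function and the spherical parameterization (azimuth around z, elevation from the horizontal plane) are my own sketch of how I plan to use the config, with the defaults mirroring the values above:

```python
import math
import random

def sample_camera_pose(rng=None,
                       azimuth_range=(-180.0, 180.0),
                       elevation_range=(-10.0, 10.0),
                       camera_distance_range=(2.0, 3.0),
                       fov_range=(30.0, 60.0)):
    """Sample a camera position (looking at the origin) and a field of view.

    Returns ((x, y, z), fov_deg), with the position on a sphere of the
    sampled radius: azimuth rotates around z, elevation tilts off the
    horizontal plane.
    """
    rng = rng or random.Random()
    azimuth = math.radians(rng.uniform(*azimuth_range))
    elevation = math.radians(rng.uniform(*elevation_range))
    distance = rng.uniform(*camera_distance_range)
    fov = rng.uniform(*fov_range)
    x = distance * math.cos(elevation) * math.cos(azimuth)
    y = distance * math.cos(elevation) * math.sin(azimuth)
    z = distance * math.sin(elevation)
    return (x, y, z), fov
```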
rawalkhirodkar commented 1 month ago

This is the code we use to place a virtual camera.

import math
import random

import bpy
from mathutils import Euler, Vector

# Set render resolution (1440x1920, 3:4 portrait aspect ratio)
bpy.context.scene.render.resolution_x = 1440
bpy.context.scene.render.resolution_y = 1920

def setup_random_camera(mesh_obj, mesh_dimensions, camera_mode):
    # Create a new camera
    cam_data = bpy.data.cameras.new('Camera')
    cam_ob = bpy.data.objects.new('Camera', cam_data)
    bpy.context.scene.collection.objects.link(cam_ob)
    bpy.context.scene.camera = cam_ob  # Set the created camera to be the active one

    print('-----------mode:{}--------'.format(camera_mode))
    # Calculate different parts of the human body
    body_parts = {
        'full_body': mesh_obj.location + Vector((0.0, 0.0, mesh_dimensions.z * 0.5)),
        'face': mesh_obj.location + Vector((0.0, 0.0, mesh_dimensions.z * 0.85)),
        'upper_half': mesh_obj.location + Vector((0.0, 0.0, mesh_dimensions.z * 0.75)),
    }

    # Determine the target position based on selected mode
    target_position = body_parts[camera_mode]

    # Sample the focal length appropriately
    focal_lengths = {
        'full_body': random.uniform(28, 50),
        'face': random.uniform(85, 135),
        'upper_half': random.uniform(50, 85),
    }
    cam_data.lens = focal_lengths[camera_mode]

    # Define the distance of the camera from the target based on the mode
    distances = {
        'full_body': random.uniform(1.2, 2.0),
        'face': random.uniform(1, 1.2),
        'upper_half': random.uniform(1, 1.6),
    }
    distance = distances[camera_mode] + random.uniform(-0.1, 0.1)  # Add noise to the distance

    angle = random.uniform(0, 2 * math.pi)
    height_angle = random.uniform(-math.pi / 6, math.pi / 6)  # Introduce variability in height angle

    # Calculate the camera location on a concentric circle around the target
    camera_location = target_position + Vector((distance * math.cos(angle) * math.cos(height_angle), 
                                                distance * math.sin(angle) * math.cos(height_angle), 
                                                distance * math.sin(height_angle)))

    # Adjust camera height based on mode
    height_offsets = {
        'full_body': mesh_dimensions.z * 0.1,  
        'face': mesh_dimensions.z * 0.65,
        'upper_half': mesh_dimensions.z * 0.45,
    }
    camera_location.z += height_offsets[camera_mode] + random.uniform(-0.05, 0.05) * mesh_dimensions.z  # Add noise

    # Set camera location
    cam_ob.location = camera_location

    # Point camera to the target position
    direction = target_position - camera_location
    rot_quat = direction.to_track_quat('-Z', 'Y')
    cam_ob.rotation_euler = rot_quat.to_euler()

    # Optional noise on the camera rotation (currently disabled: all angles are zero)
    rotation_noise_angles = (0.0, 0.0, 0.0)
    rotation_noise_euler = Euler(rotation_noise_angles, 'XYZ')  # Build an Euler rotation from the angles

    # Apply the rotation noise to the camera's current rotation
    cam_ob.rotation_euler.rotate(rotation_noise_euler)  # Apply the rotation noise directly, no assignment needed

    return cam_ob.location, cam_ob.rotation_euler

##---------------------------------------------------------------------
def setup_camera(mesh_obj, camera_mode):
    mesh_center, mesh_dimensions = get_mesh_center_and_dimensions(mesh_obj)
    loc, rot = setup_random_camera(mesh_obj, mesh_dimensions, camera_mode)

    return loc, rot
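For anyone who wants to replicate this sampling without Blender, the placement geometry above reduces to a few lines of plain Python. This is my own dependency-free restatement: the per-mode focal and distance ranges are copied from the snippet, while the height-offset and target logic (which need the mesh) are omitted:

```python
import math
import random

# Per-mode ranges copied from the Blender snippet above
FOCAL_RANGES = {'full_body': (28, 50), 'face': (85, 135), 'upper_half': (50, 85)}
DISTANCE_RANGES = {'full_body': (1.2, 2.0), 'face': (1.0, 1.2), 'upper_half': (1.0, 1.6)}

def sample_camera(camera_mode, rng=None):
    """Sample a focal length (mm) and a camera offset around the target.

    Mirrors the distributions in setup_random_camera(): distance with
    +/-0.1 m noise, uniform azimuth, and a height angle in [-30, 30] degrees.
    """
    rng = rng or random.Random()
    focal = rng.uniform(*FOCAL_RANGES[camera_mode])
    distance = rng.uniform(*DISTANCE_RANGES[camera_mode]) + rng.uniform(-0.1, 0.1)
    angle = rng.uniform(0, 2 * math.pi)
    height_angle = rng.uniform(-math.pi / 6, math.pi / 6)
    offset = (distance * math.cos(angle) * math.cos(height_angle),
              distance * math.sin(angle) * math.cos(height_angle),
              distance * math.sin(height_angle))
    return focal, offset
```

Note that the focal lengths are Blender lens values in millimeters (against the default 36 mm sensor width), so the full-body range of 28-50 mm corresponds to a horizontal FoV of roughly 40-65 degrees.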
tijiang13 commented 1 month ago

Thanks a lot! -- Best, Tianjian