NVlabs / nvdiffrec

Official code for the CVPR 2022 (oral) paper "Extracting Triangular 3D Models, Materials, and Lighting From Images".

About making masks for a custom dataset #58

Closed sadexcavator closed 2 years ago

sadexcavator commented 2 years ago

Hi, sorry to ask about this again. I know #3 and some other issues cover this problem, but they only seem to discuss how to obtain poses for the images, and the same goes for https://github.com/bmild/nerf#generating-poses-for-your-own-scenes that you mentioned in another issue. Those methods truly help a lot, thanks to you and everyone who contributed! What confuses me now is how to obtain the masks. I see there are two different types of masks in your examples: in the NeRD dataset there is a 'masks' directory containing mask images corresponding to the ones in 'images', while in the NeRF synthetic dataset the mask is stored in the 4th (alpha) channel of the image itself, as you said.

So my biggest question is how to produce mask images like the first case, or how to produce 32-bit images with an alpha mask. Besides, I want to know whether it is true that separate mask images need to be used with a .npy file, and the alpha-mask type with a .json file. Please correct me if I got something wrong. Thanks for your work and your time.

sadexcavator commented 2 years ago

Hi, now I know how to make masks, but I still have a question: is it true that separate mask images need to be used with a .npy file, and the alpha-mask type with a .json file? Also, what do rotation and camera_angle_x mean in the json file? This was also asked in #57. Thanks!

sadexcavator commented 2 years ago

And one more thing: is it always necessary to use your rescale_images.py to rescale the images to 512x512 if I want to use the LLFF format?

JHnvidia commented 2 years ago

Hi,

That is correct. There are no strict standards for the datasets, but datasets based on the LLFF codebase (the ones with a .npy file) typically use separate mask images. In contrast, NeRF elected to put the mask in the alpha channel.
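If you already have separate mask images (NeRD-style), converting to the NeRF-style 32-bit RGBA layout is just a matter of writing the mask into the alpha channel. A minimal sketch with Pillow/NumPy; the file names are placeholders for illustration, not part of nvdiffrec:

# Merge a separate binary mask into the alpha channel, producing a 32-bit RGBA PNG.
import numpy as np
from PIL import Image

rgb  = np.array(Image.open("images/0000.png").convert("RGB"))   # H x W x 3
mask = np.array(Image.open("masks/0000.png").convert("L"))      # H x W, 0..255

rgba = np.dstack([rgb, mask]).astype(np.uint8)                  # H x W x 4
Image.fromarray(rgba, mode="RGBA").save("images_rgba/0000.png")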

The camera_angle_x parameter is the horizontal camera field of view in radians, see angle_x in the Blender documentation: https://docs.blender.org/api/current/bpy.types.Camera.html. We do not use the rotation parameter, so it can be safely ignored.
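For reference, NeRF-style dataset loaders typically derive the pinhole focal length (in pixels) from camera_angle_x as below. This is a sketch of the common convention, not necessarily nvdiffrec's exact loader code:

import numpy as np

def focal_from_angle_x(camera_angle_x, width):
    # tan(fov_x / 2) = (width / 2) / focal  =>  focal = 0.5 * width / tan(0.5 * fov_x)
    return 0.5 * width / np.tan(0.5 * camera_angle_x)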

The NeRF datasets aren't documented, but their Blender scenes contain the scripts used to generate the images and json files. You can download them from their Google Drive data share; I'm pasting the script from chair.blend below.

# Also produces depth map at the same time.

import argparse, sys, os
import json
import bpy
import mathutils
import numpy as np

DEBUG = False

VIEWS = 200
RESOLUTION = 800
RESULTS_PATH = 'results_test_200'
DEPTH_SCALE = 1.4
COLOR_DEPTH = 8
FORMAT = 'PNG'
UPPER_VIEWS = True
CIRCLE_FIXED_START = (0,0,0)
CIRCLE_FIXED_END = (.7,0,0)
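# Editor's note: CIRCLE_FIXED_START/END are Euler angles in radians; only the X
# component (camera elevation) differs, so the rendered path sweeps between these
# two elevations while the camera orbits the object.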

fp = bpy.path.abspath(f"//{RESULTS_PATH}")

def listify_matrix(matrix):
    matrix_list = []
    for row in matrix:
        matrix_list.append(list(row))
    return matrix_list

if not os.path.exists(fp):
    os.makedirs(fp)

# Data to store in JSON file
out_data = {
    'camera_angle_x': bpy.data.objects['Camera'].data.angle_x,
}

# Render Optimizations
bpy.context.scene.render.use_persistent_data = True

# Set up rendering of depth map.
bpy.context.scene.use_nodes = True
tree = bpy.context.scene.node_tree
links = tree.links

# Add passes for additionally dumping albedo and normals.
bpy.context.scene.view_layers["RenderLayer"].use_pass_normal = True
bpy.context.scene.render.image_settings.file_format = str(FORMAT)
bpy.context.scene.render.image_settings.color_depth = str(COLOR_DEPTH)

if 'Custom Outputs' not in tree.nodes:
    # Create input render layer node.
    render_layers = tree.nodes.new('CompositorNodeRLayers')
    render_layers.label = 'Custom Outputs'
    render_layers.name = 'Custom Outputs'

    depth_file_output = tree.nodes.new(type="CompositorNodeOutputFile")
    depth_file_output.label = 'Depth Output'
    depth_file_output.name = 'Depth Output'
    if FORMAT == 'OPEN_EXR':
      links.new(render_layers.outputs['Depth'], depth_file_output.inputs[0])
    else:
      # Remap as other types can not represent the full range of depth.
      map = tree.nodes.new(type="CompositorNodeMapRange")
      # Size is chosen kind of arbitrarily, try out until you're satisfied with resulting depth map.
      map.inputs['From Min'].default_value = 0
      map.inputs['From Max'].default_value = 8
      map.inputs['To Min'].default_value = 1
      map.inputs['To Max'].default_value = 0
      links.new(render_layers.outputs['Depth'], map.inputs[0])

      links.new(map.outputs[0], depth_file_output.inputs[0])

    normal_file_output = tree.nodes.new(type="CompositorNodeOutputFile")
    normal_file_output.label = 'Normal Output'
    normal_file_output.name = 'Normal Output'
    links.new(render_layers.outputs['Normal'], normal_file_output.inputs[0])

# Background
bpy.context.scene.render.dither_intensity = 0.0
bpy.context.scene.render.film_transparent = True

# Create collection for objects not to render with background

objs = [ob for ob in bpy.context.scene.objects if ob.type == 'EMPTY' and 'Empty' in ob.name]
bpy.ops.object.delete({"selected_objects": objs})

def parent_obj_to_camera(b_camera):
    origin = (0, 0, 0)
    b_empty = bpy.data.objects.new("Empty", None)
    b_empty.location = origin
    b_camera.parent = b_empty  # setup parenting

    scn = bpy.context.scene
    scn.collection.objects.link(b_empty)
    bpy.context.view_layer.objects.active = b_empty
    # scn.objects.active = b_empty
    return b_empty

scene = bpy.context.scene
scene.render.resolution_x = RESOLUTION
scene.render.resolution_y = RESOLUTION
scene.render.resolution_percentage = 100

cam = scene.objects['Camera']
cam.location = (0, 4.0, 0.5)
cam_constraint = cam.constraints.new(type='TRACK_TO')
cam_constraint.track_axis = 'TRACK_NEGATIVE_Z'
cam_constraint.up_axis = 'UP_Y'
b_empty = parent_obj_to_camera(cam)
cam_constraint.target = b_empty

scene.render.image_settings.file_format = 'PNG'  # set output format to .png

from math import radians

stepsize = 360.0 / VIEWS
vertical_diff = CIRCLE_FIXED_END[0] - CIRCLE_FIXED_START[0]
rotation_mode = 'XYZ'

if not DEBUG:
    for output_node in [tree.nodes['Depth Output'], tree.nodes['Normal Output']]:
        output_node.base_path = ''

out_data['frames'] = []

b_empty.rotation_euler[0] = CIRCLE_FIXED_START[0] + vertical_diff
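# Each iteration below advances the azimuth (the empty's Z rotation) by
# 2*stepsize degrees and sets the elevation (X rotation) along a cosine
# between the start and end elevations defined above.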

for i in range(0, VIEWS):
    if DEBUG:
        i = np.random.randint(0,VIEWS)
        b_empty.rotation_euler[0] = CIRCLE_FIXED_START[0] + (np.cos(radians(stepsize*i))+1)/2 * vertical_diff
        b_empty.rotation_euler[2] += radians(2*stepsize*i)

    print("Rotation {}, {}".format((stepsize * i), radians(stepsize * i)))
    scene.render.filepath = fp + '/r_' + str(i)

    tree.nodes['Depth Output'].file_slots[0].path = scene.render.filepath + "_depth_"
    tree.nodes['Normal Output'].file_slots[0].path = scene.render.filepath + "_normal_"

    if DEBUG:
        break
    else:
        bpy.ops.render.render(write_still=True)  # render still

    frame_data = {
        'file_path': scene.render.filepath,
        'rotation': radians(stepsize),
        'transform_matrix': listify_matrix(cam.matrix_world)
    }
    out_data['frames'].append(frame_data)

    b_empty.rotation_euler[0] = CIRCLE_FIXED_START[0] + (np.cos(radians(stepsize*i))+1)/2 * vertical_diff
    b_empty.rotation_euler[2] += radians(2*stepsize)

if not DEBUG:
    with open(fp + '/' + 'transforms.json', 'w') as out_file:
        json.dump(out_data, out_file, indent=4)
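
The script is meant to be run from inside the .blend file, or headless via Blender's command line. Assuming you save it as, say, 360_view.py next to chair.blend (the file name here is just an example), something like this should work:

blender --background chair.blend --python 360_view.py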
sadexcavator commented 2 years ago

thanks a lot!

realNb commented 2 years ago

Hi, now I know how to make masks, but I still have a question: is it true that separate mask images need to be used with a .npy file, and the alpha-mask type with a .json file? Also, what do rotation and camera_angle_x mean in the json file? This was also asked in #57. Thanks!

Hi, how do you generate masks? Do you find the mesh quality sensitive to the mask quality?

sadexcavator commented 2 years ago

I use rembg to do this, and yes, I think the mesh quality is sensitive to the mask quality.
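
For anyone else hitting this, here is a minimal sketch of the rembg route, producing both an RGBA image (alpha mask) and a separate binary mask. The file names and the 127 threshold are arbitrary choices for illustration, and depending on your rembg version the command-line tool (rembg i input.png output.png) is an alternative to the Python API:

# Minimal sketch: cut out the foreground with rembg, then save an RGBA image
# (mask in the alpha channel) plus a separate binary mask image.
import io
import numpy as np
from PIL import Image
from rembg import remove

with open("images/0000.png", "rb") as f:
    rgba_bytes = remove(f.read())             # PNG bytes with an alpha matte

with open("images_rgba/0000.png", "wb") as f:
    f.write(rgba_bytes)                        # NeRF-synthetic style (alpha mask)

alpha = np.array(Image.open(io.BytesIO(rgba_bytes)))[..., 3]
Image.fromarray(((alpha > 127) * 255).astype(np.uint8)).save("masks/0000.png")  # NeRD style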