gafniguy / 4D-Facial-Avatars

Dynamic Neural Radiance Fields for Monocular 4D Facial Avater Reconstruction
681 stars 67 forks source link

Qualitative results different from paper: Blurry shoulders #54

Open cuevhv opened 1 year ago

cuevhv commented 1 year ago

Hi,

Thank you for your work and code. The results on the paper look amazing. I used person_2 from your dataset and the config file attached at the end, and trained the network for 400k following your paper. The face looks good, but the shoulders are way too blurry compared to the image in the paper (Fig 5, second row). Could you let me know if there are different parameters I should use or if there are some parts of the code I should modify? theirs_vs_mine

Thanks for the help!

# Parameters to setup experiment.
experiment:
  # Unique experiment identifier
  id: dave__fixed_bg_512_paper_model
  # Experiment logs will be stored at "logdir"/"id"
  logdir: logs
  # Seed for random number generators (for repeatability).
  randomseed: 42  # Cause, why not?
  # Number of training iterations.
  train_iters: 1000000
  # Number of training iterations after which to validate.
  validate_every: 1000
  # Number of training iterations after which to checkpoint.
  save_every: 5000
  # Number of training iterations after which to print progress.
  print_every: 100
  device: 0

# Dataset parameters.
dataset:
  # Type of dataset (Blender vs LLFF vs DeepVoxels vs something else)
  type: blender
  # Base directory of dataset.
  basedir: /datasets/nerface/person_2
  #basedir: real_data/andrei_1_light
  #basedir: real_data/debug
  # Optionally, provide a path to the pre-cached dataset dir. This
  # overrides the other dataset options.
  #cachedir: cache/flame_sample
  # For the Blender datasets (synthetic), optionally return images
  # at half the original resolution of 800 x 800, to save space.
  half_res: False
  # Stride (include one per "testskip" images in the dataset).
  testskip: 1
  # Do not use NDC (normalized device coordinates). Usually True for
  # synthetic (Blender) datasets.
  no_ndc: True
  # Near clip plane (clip all depth values closer than this threshold).
  near: 0.2
  # Far clip plane (clip all depth values farther than this threshold).
  far: 0.8

# Model parameters.
models:
  # Coarse model.
  coarse:
    # Name of the torch.nn.Module class that implements the model.
    type: ConditionalBlendshapePaperNeRFModel
    # Number of layers in the model.
    num_layers: 4 # ignore this, I hard coded the model
    # Number of hidden units in each layer of the MLP (multi-layer
    # perceptron).
    hidden_size: 256
    # Add a skip connection once in a while. Note: This parameter
    # won't take affect unless num_layers > skip_connect_every.
    skip_connect_every: 3
    # Whether to include the position (xyz) itself in its positional
    # encoding.
    include_input_xyz: True
    # Whether or not to perform log sampling in the positional encoding
    # of the coordinates.
    log_sampling_xyz: True
    # Number of encoding functions to use in the positional encoding
    # of the coordinates.
    num_encoding_fn_xyz: 10
    # Additionally use viewing directions as input.
    use_viewdirs: True
    # Whether to include the direction itself in its positional encoding.
    include_input_dir: False
    # Number of encoding functions to use in the positional encoding
    # of the direction.
    num_encoding_fn_dir: 4
    # Whether or not to perform log sampling in the positional encoding
    # of the direction.
    log_sampling_dir: True
  # Fine model.
  fine:
    # Name of the torch.nn.Module class that implements the model.
    type: ConditionalBlendshapePaperNeRFModel
    # Number of layers in the model.
    num_layers: 4 # ignore this, I hard coded the model
    # Number of hidden units in each layer of the MLP (multi-layer
    # perceptron).
    hidden_size: 256
    # Add a skip connection once in a while. Note: This parameter
    # won't take affect unless num_layers > skip_connect_every.
    skip_connect_every: 3
    # Number of encoding functions to use in the positional encoding
    # of the coordinates.
    num_encoding_fn_xyz: 10
    # Whether to include the position (xyz) itself in its positional
    # encoding.
    include_input_xyz: True
    # Whether or not to perform log sampling in the positional encoding
    # of the coordinates.
    log_sampling_xyz: True
    # Additionally use viewing directions as input.
    use_viewdirs: True
    # Whether to include the direction itself in its positional encoding.
    include_input_dir: False
    # Number of encoding functions to use in the positional encoding of
    # the direction.
    num_encoding_fn_dir: 4
    # Whether or not to perform log sampling in the positional encoding
    # of the direction.
    log_sampling_dir: True

# Optimizer params.
optimizer:
  # Name of the torch.optim class used for optimization.
  type: Adam
  # Learning rate.
  lr: 5.0E-4

# Learning rate schedule.
scheduler:
  # Exponentially decay learning rate (in 1000 steps)
  lr_decay: 250
  # Rate at which to apply this decay.
  lr_decay_factor: 0.1

# NeRF parameters.
nerf:
  # Use viewing directions as input, in addition to the X, Y, Z coordinates.
  use_viewdirs: True
  # Encoding function for position (X, Y, Z).
  encode_position_fn: positional_encoding
  # Encoding function for ray direction (theta, phi).
  encode_direction_fn: positional_encoding
  # Training-specific parameters.
  train:
    # Number of random rays to retain from each image.
    # These sampled rays are used for training, and the others are discarded.
    num_random_rays: 2048  # 32 * 32 * 4 # was 1024
    # Size of each chunk (rays are batched into "chunks" and passed through
    # Size of each chunk (rays are batched into "chunks" and passed through
    # the network)
    chunksize: 2048 #16384  #131072  # 131072  # 1024 * 32
    # Whether or not to perturb the sampled depth values.
    perturb: True
    # Number of depth samples per ray for the coarse network.
    num_coarse: 64
    # Number of depth samples per ray for the fine network.
    num_fine: 64
    # Whether to render models using a white background.
    white_background: False
    # Standard deviation of noise to be added to the radiance field when
    # performing volume rendering.
    radiance_field_noise_std: 0.1
    # Sample linearly in disparity space, as opposed to in depth space.
    lindisp: False
  # Validation-specific parameters.
  validation:
    # Number of random rays to retain from each image.
    # These sampled rays are used for training, and the others are discarded.
    chunksize: 65536 #4096  #131072   # 1024 * 32
    # Whether or not to perturb the sampled depth values.
    perturb: True
    # Number of depth samples per ray for the coarse network.
    num_coarse: 64
    # Number of depth samples per ray for the fine network.
    num_fine: 64
    # Whether to render models using a white background.
    white_background: False
    # Standard deviation of noise to be added to the radiance field when
    # performing volume rendering.
    radiance_field_noise_std: 0.
    # Sample linearly in disparity space, as opposed to in depth space.
    lindisp: False
WindAndWood commented 1 year ago

I don't think it's a problem with the config file. The NeRF model used in this github code is inconsistent with the NeRF mentioned in NeRFace paper. The neural network structure mentioned in NeRFace paper is consistent with the vanilla NeRF(8 fully-connected ReLU layers). But only 6 fully connected layers are used in the code. I don’t know what the purpose of the author. You can refer here for more details. vanilla NeRF.txt NeRFace github NeRF.txt