NVIDIAGameWorks / kaolin-wisp

NVIDIA Kaolin Wisp is a PyTorch library powered by NVIDIA Kaolin Core to work with neural fields (including NeRFs, NGLOD, instant-ngp and VQAD).

How to find the VQAD model size correctly #57

Closed: cv-dote closed this issue 1 year ago

cv-dote commented 2 years ago

Thanks for this great work! I am trying to reproduce the VQAD paper results.
I trained the VQAD model on the RTMV dataset with the default config.
However, the resulting model size is over 20 MB, not as small as reported in the paper.
Could you please show me how to calculate the correct model size?

Training config:

global:
  exp_name: "test-vqad-nerf"

optimizer:
  optimizer_type: "rmsprop"
  lr: 0.001

dataset:
  dataset_type: "multiview"
  num_rays_sampled_per_img: 4096
  multiview_dataset_format: "rtmv"
  mip: 2
  bg_color: "white"

renderer:
  tracer_type: "PackedRFTracer"
  num_steps: 16
  render_batch: 4000
  camera_origin:
    - -3.0
    - 0.65
    - -3.0
  shading_mode: "rb"
  render_res:
    - 1024
    - 1024

trainer:
  trainer_type: "MultiviewTrainer"
  epochs: 50
  batch_size: 1
  model_format: "full"
  valid_every: 50
  save_every: 50
  render_every: 50

grid:
  grid_type: "CodebookOctreeGrid"
  interpolation_type: "linear"
  multiscale_type: "sum"
  feature_dim: 5
  feature_std: 0.01
  base_lod: 5
  num_lods: 4
  codebook_bitwidth: 4

net:
  nef_type: "NeuralRadianceField"
  hidden_dim: 128
  num_layers: 1
  out_dim: 4

embedder:
  embedder_type: "positional"

The script I used to calculate the model size:

import torch

# Load the saved model (trainer.model_format is "full", so this is the whole module).
model = torch.load("path/to/model")

# Sum the byte sizes of all parameters and buffers.
param_size = 0
for param in model.parameters():
    param_size += param.nelement() * param.element_size()

buffer_size = 0
for buffer in model.buffers():
    buffer_size += buffer.nelement() * buffer.element_size()

size_all_mb = (param_size + buffer_size) / 1024**2
print('model size: {:.3f}MB'.format(size_all_mb))
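
In case it helps, a per-parameter breakdown like the one below (assuming the checkpoint loads back as the full module, as above) also shows which tensors dominate the size:

for name, param in model.named_parameters():
    size_mb = param.nelement() * param.element_size() / 1024**2
    print('{}: {} {:.3f}MB'.format(name, tuple(param.shape), size_mb))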

Thanks in advance!

tovacinni commented 1 year ago

Hi, thanks for your interest in our work!

The saved parameter size is not exactly the same as what is reported in the paper, because the numbers in the paper are for the quantized model. That is, during training the feature grid stores confidence vectors which represent which codebook index should be activated; at inference time these can be converted into integer indices through torch.argmax(). We don't currently have code to do that conversion for you automatically, but we could implement something like that.
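
In the meantime, a rough estimate of the quantized size is possible with a sketch like the one below. This is not a wisp utility, just an illustration: it assumes the checkpoint loads back as the full module, and it uses the heuristic that any parameter tensor whose last dimension equals 2**codebook_bitwidth holds the confidence vectors, each of which collapses to a codebook_bitwidth-bit index after torch.argmax(); everything else (codebook features, decoder MLP) is counted at full precision.

import torch

BITWIDTH = 4                   # codebook_bitwidth from your config
CODEBOOK_SIZE = 2 ** BITWIDTH  # length of each confidence vector

model = torch.load("path/to/model")

quantized_bits = 0
for param in model.parameters():
    if param.dim() >= 2 and param.shape[-1] == CODEBOOK_SIZE:
        # Confidence vectors: after argmax each one becomes a BITWIDTH-bit index.
        quantized_bits += (param.numel() // CODEBOOK_SIZE) * BITWIDTH
    else:
        # Codebook features, decoder weights, etc. stay at full precision.
        quantized_bits += param.numel() * param.element_size() * 8

print('estimated quantized size: {:.3f}MB'.format(quantized_bits / 8 / 1024**2))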

cv-dote commented 1 year ago

Thank you so much for that information!
Closing.