ArtiKitten opened this issue 1 year ago
> From what I understood, the batch size is what's important in the reconstruction of this kind of scene.

Maybe that's not exactly the point; the recommendations made are only related to the hashgrid hyperparameters dict_size and dim. Your configuration could be based on the one shown in projects/neuralangelo/configs/tnt.yaml:
```yaml
_parent_: projects/neuralangelo/configs/base.yaml

model:
    object:
        sdf:
            mlp:
                inside_out: False  # True for Meetingroom.
            encoding:
                coarse2fine:
                    init_active_level: 8
    appear_embed:
        enabled: True
        dim: 8

data:
    type: projects.neuralangelo.data
    root: datasets/tanks_and_temples/Barn
    num_images: 410  # The number of training images.
    train:
        image_size: [835,1500]
        batch_size: 1
        subset:
    val:
        image_size: [300,540]
        batch_size: 1
        subset: 1
        max_viz_samples: 16
```
In your case, setting inside_out = True may also be helpful.
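For the Meetingroom scene specifically, the overrides on top of tnt.yaml would look roughly like this. This is just a sketch: the data path and image count below are placeholders for your own preprocessed data, not values taken from the repository.

```yaml
# Rough sketch of Meetingroom-specific overrides on top of tnt.yaml.
# The path and image count are placeholders for your own preprocessed data.
model:
    object:
        sdf:
            mlp:
                inside_out: True  # indoor (inside-out) capture such as Meetingroom
data:
    root: datasets/tanks_and_temples/Meetingroom  # placeholder path
    num_images: 371  # set this to your actual number of training images
```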
Take a look at this document for experimental details: Supplementary
The high batch size is what they mentioned using in the supplementary paper for the project, i.e. 16 for T&T. I trained Meetingroom on a 2x3090 setup over the whole weekend, for a total of 70 h (250,000 iterations), with the following settings:
dict_size=22
dim=8
batch_size=2
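For reference, this is roughly where those values live in the config. It's a sketch only: I'm assuming the hashgrid keys sit under model.object.sdf.encoding.hashgrid as in base.yaml, which isn't shown above.

```yaml
# Sketch of where the settings above map into the config
# (key paths assumed from base.yaml, not copied from my actual file).
model:
    object:
        sdf:
            encoding:
                hashgrid:
                    dict_size: 22  # as listed above
                    dim: 8
data:
    train:
        batch_size: 2
```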
Since I wasn't working during the weekend, I didn't notice that the model had stopped improving at around 100k iterations.
[wandb loss graphs: screenshots at 50k, 70k, 100k, and 250k iterations]
As you can see, there is essentially no difference after 100k.
Actually, I found what you mentioned in the "A. Additional Hyper-parameter" section:
> For the DTU benchmark, we follow prior work [14–16] and use a batch size of 1. For the Tanks and Temples dataset, we use a batch size of 16. We use the marching cubes algorithm [5] to convert predicted SDF to triangular meshes. The marching cubes resolution is set to 512 for the DTU benchmark following prior work [1, 14–16] and 2048 for the Tanks and Temples dataset.
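For what it's worth, that batch size maps onto the training config like this. Again just a sketch; note that the marching cubes resolution is a mesh-extraction parameter and does not appear in this training YAML.

```yaml
# Sketch: the Tanks and Temples batch size from the supplementary,
# expressed as a training-config value (DTU would use 1 instead).
# The marching cubes resolution (512 / 2048) belongs to mesh extraction,
# not to this training YAML.
data:
    train:
        batch_size: 16  # needs a lot of GPU memory, as you observed
```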
There are some differences after 100k iterations, but they are perhaps not very significant.
If I were in your position, I would keep the configuration you've already used and also incorporate the adjustments I mentioned earlier regarding the signed distance function (SDF), i.e. setting inside_out: True as sketched above.
Please keep me updated on your results.
Best regards, Lucas.
Hi, I'm currently working on reconstructing large indoor environments. From what I understood, the batch size is what's important in the reconstruction of this kind of scene.
I'm testing the setup with the Meetingroom scene from Tanks and Temples, downsampled by a factor of 30 (~370 images).
My first try is on a Quadro RTX 8000, so 48 GB of memory, and with the recommended config (dict_size=22, dim=8, batch_size=16) training won't even start. The only way training doesn't fail is by running it with a batch_size of 4, at about 1.25 it/s; reaching 500,000 iterations would quite literally take almost a week of training. It then crashed at iteration 10,000, the first checkpoint.
And for the configuration file,
That makes me wonder: how can we achieve the same results as shown in the paper for large indoor environments? Do you have an example of a config file to achieve this, and the corresponding GPUs?
Maybe I'm missing something and don't understand what I'm doing? Or maybe I just need $150k worth of GPUs?
Thanks for your help!
EDIT: I had the chance to test on 2x3090 GPUs and I still can't train with a batch_size of 16 or even 8.