Which hyper-parameters to use for reproducing the result?

Greetings, and many thanks for the contribution as well as for sharing the implementation!

I'm trying to make sense of which hyper-parameters have been used for the different settings, and hope you may be able to help me out despite it being a while ago that you ran the experiments. I totally understand if you cannot recall all the details. Any pointers would be greatly appreciated.

In the paper, there is a subsection of Section 4 which mentions experimenting with different hyper-parameters:

Learning rates in {1e−2, 1e−3, 1e−4}
Network width for the encoder E and heads in {128, 256, 512}
Number of layers in these networks {2, 3}
Depth threshold h in {1e-2, 1e-3, 1e-4}
Std normalization for the layer output in Eq. (1)

So here comes some questions :grin:

Are the .conf files on the GitHub repo supposed to be the result of this hyper-parameter optimization? (explaining e.g. why Learning_Euc.conf declares learning rate 1e-3, as opposed to 1e-4 in the other .conf files)
Does 5. above perhaps refer to whether or not you normalize the features by dividing with their standard deviation (in addition to subtracting the mean), i.e. uncommenting the line right here? https://github.com/drormoran/Equivariant-SFM/blob/26658a7452e8de1458a8d9d969a334f965060126/code/models/layers.py#L87
In the Architecture subsection, it is stated that 512 features were used during learning, and 256 features during optimization. Was this indeed the case, or did you, as a result of the hyper-parameter search, perhaps use 256 features also in the learning setting (this appears to be the case in the .conf files)?

https://github.com/drormoran/Equivariant-SFM/blob/26658a7452e8de1458a8d9d969a334f965060126/code/confs/Learning_Euc.conf#L22 https://github.com/drormoran/Equivariant-SFM/blob/26658a7452e8de1458a8d9d969a334f965060126/code/confs/Learning_Proj.conf#L19

In addition to using a higher learning rate, Learning_Euc.conf file has a few other hyper-parameters which differ from the other .conf files. Can you recall whether this was an intentional difference? And in that case, whether it was important for achieving the final result?
- dataset.batch_size=4 instead of 1
- dataset.max_sample_size=40 instead of 20 (the paper states 20 as the upper limit)
- train.num_of_epochs=30000 instead of 100000
- train.scheduler_milestone=[20000] instead of [50000, 70000, 90000]
- train.gamma=0.5 instead of 0.1
- train.eval_intervals=250 instead of 5000
- (This interval will actually affect how many candidate models there are to choose from when picking the model which performs best on the validation scenes, for further fine-tuning on the test scenes.)

Best, Lucas

drormoran / Equivariant-SFM

Which hyper-parameters to use for reproducing the result? #7