berndprach / 1LipschitzLayersCompared

MIT License

Fairness of comparison #3

Open araujoalexandre opened 10 months ago

araujoalexandre commented 10 months ago

Hi,

I was wondering if you have experimented with different optimizers. The authors of the LOT layer used SGD with momentum, while CPL and SLL use the Adam optimizer. When I was working on CPL and SLL, we tried SGD but found that the performance was much worse than when training with Adam.

I haven't read your code too carefully, but it seems that you are only using SGD, which might not make for a fair comparison between LOT (and maybe others) and CPL and SLL.

https://github.com/berndprach/1LipschitzLayersCompared/blob/main/train_model.py#L26
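For illustration, something like the following is what I have in mind (just a sketch; the helper name and default hyperparameters are my assumptions, not code from the repository):

```python
import torch


def build_optimizer(model, name="sgd", lr=0.1, weight_decay=0.0):
    # Hypothetical helper: lets each method be trained with the optimizer
    # it was originally tuned for (e.g. SGD+momentum for LOT, Adam for CPL/SLL).
    if name == "sgd":
        return torch.optim.SGD(model.parameters(), lr=lr,
                               momentum=0.9, weight_decay=weight_decay)
    if name == "adam":
        return torch.optim.Adam(model.parameters(), lr=lr,
                                weight_decay=weight_decay)
    raise ValueError(f"Unknown optimizer: {name}")
```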

Thank you.

berndprach commented 10 months ago

Hi, it is true that different methods can perform better with certain optimizers, as well as with certain architectures, loss functions or network depths. For results with all of those choices optimized, we expect the original papers to be good resources.

For this paper, we wanted to compare methods in a general setting that is identical for all of them.

However, we are definitely also interested in the influence of all the choices mentioned above, and we are keeping the option open to explore this in future work.

araujoalexandre commented 10 months ago

I have noticed that you have experimented with different learning rates and weight decay values. Would it be possible to extend the comparison to Adam as well? I think methods with an orthogonality constraint may have more of an advantage under SGD than approaches without one. My intuition is that approaches without an orthogonality constraint would perform better with Adam than with SGD.
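Concretely, the optimizer could simply be added to the existing hyperparameter sweep. A rough sketch (the method names and value ranges below are placeholders I made up, not the settings used in your paper):

```python
from itertools import product

# Hypothetical grid: sweep the optimizer together with learning rate and
# weight decay, so each layer type is evaluated under its best setting.
methods = ["LOT", "CPL", "SLL"]
optimizers = ["sgd", "adam"]
learning_rates = [1e-1, 1e-2, 1e-3]
weight_decays = [0.0, 1e-4]

for method, opt, lr, wd in product(methods, optimizers, learning_rates, weight_decays):
    print(f"train {method} with {opt}, lr={lr}, weight_decay={wd}")
```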

berndprach commented 10 months ago

I think for the next few weeks we are occupied with other projects, but we are definitely considering extending this comparison in the future. Including the influence of the optimizer is certainly one possibility.