IntelLabs / MART

Modular Adversarial Robustness Toolkit

Add callback that freezes specified module #141

Open · dxoigmn opened 1 year ago

dxoigmn commented 1 year ago

What does this PR do?

I think it's worth discussing whether we should merge this PR. Note that AttackInEvalMode will be wrong in newer versions of PL, since Lightning puts the model back into train mode after on_train_start is called: https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#hooks.
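For context, a minimal sketch of one way to work around that hook ordering: re-assert eval mode in on_train_batch_start, which runs after Lightning's own model.train() call. The callback name FreezeInEvalMode and the choice to freeze the whole pl_module are assumptions for illustration, not MART's actual AttackInEvalMode implementation:

```python
import lightning.pytorch as pl


class FreezeInEvalMode(pl.Callback):
    """Keep the module in eval mode for the whole fit loop.

    Re-asserts eval() at the start of every training batch, because
    newer Lightning versions call model.train() after on_train_start,
    silently undoing a one-time eval() set there.
    """

    def on_train_batch_start(self, trainer, pl_module, batch, batch_idx):
        # By this point Lightning has already switched the module back
        # to train mode, so eval() set here actually sticks for the batch.
        pl_module.eval()
```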

That said, I do not like using eval mode, since many modules branch on self.training. I think a better option is to replace batch norm layers with frozen batch norm and to remove dropout layers, since that is the semantics one actually wants; eval mode is just abused to get that behavior.
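As a concrete sketch of that alternative (a hypothetical helper, not part of this PR), the function below recursively swaps BatchNorm2d for torchvision's FrozenBatchNorm2d and replaces Dropout with Identity, so the model's behavior no longer branches on self.training:

```python
import torch.nn as nn
from torchvision.ops import FrozenBatchNorm2d


def freeze_norm_and_dropout(module: nn.Module) -> nn.Module:
    """Recursively replace BatchNorm2d with FrozenBatchNorm2d and
    Dropout with Identity (assumes affine batch norm layers)."""
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            frozen = FrozenBatchNorm2d(child.num_features, eps=child.eps)
            # Carry over the learned affine parameters and the running
            # statistics so the frozen layer computes the same transform.
            frozen.weight.copy_(child.weight.data)
            frozen.bias.copy_(child.bias.data)
            frozen.running_mean.copy_(child.running_mean)
            frozen.running_var.copy_(child.running_var)
            setattr(module, name, frozen)
        elif isinstance(child, nn.Dropout):
            # Removing the layer entirely matches eval-mode semantics.
            setattr(module, name, nn.Identity())
        else:
            freeze_norm_and_dropout(child)
    return module
```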


dxoigmn commented 1 year ago

I should note that I'm not sure this works in multi-gpu mode.

dxoigmn commented 1 year ago

> I should note that I'm not sure this works in multi-gpu mode.

This does work, but be aware that BatchNorm modules get converted to SyncBatchNorm when using DDP: https://github.com/IntelLabs/MART/blob/ed89c722f8602885f738cd2765af3d3de97c10af/mart/configs/trainer/ddp.yaml#L8
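To make that pitfall concrete, here is an assumed minimal example of the conversion Lightning performs when sync_batchnorm is enabled: a frozen-BN swap keyed on nn.BatchNorm2d would miss the converted layers unless it runs before the conversion, or matches the common _BatchNorm base class instead:

```python
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))

# With sync_batchnorm enabled, Lightning applies this conversion before
# training starts, turning every BatchNorm layer into a SyncBatchNorm.
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

# A swap keyed on nn.BatchNorm2d no longer matches the converted layer...
assert not isinstance(model[1], nn.BatchNorm2d)
# ...but the shared base class still catches both variants.
assert isinstance(model[1], nn.modules.batchnorm._BatchNorm)
```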