Closed AlbinSou closed 3 months ago
| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| avalanche/models/dynamic_optimizers.py | 119 | 135 | 88.15% |
| tests/test_optimizer.py | 48 | 244 | 19.67% |
| Total: | 169 | 381 | 44.36% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| avalanche/models/dynamic_optimizers.py | 1 | 88.46% |
| Total: | 1 | |

| Totals | |
|---|---|
| Change from base Build 8098020118: | -0.06% |
| Covered Lines: | 14863 |
| Relevant Lines: | 28720 |
Commenting about this issue here so that I can keep track of it:
Right now the update_optimizer util works as follows: it takes a {name: param} dictionary of previously optimized parameters, as well as one for the new parameters. This works fine if we assume that a given parameter never changes name, but renaming can happen, for instance when a model is wrapped inside another module while keeping the same parameters.
When parameters do change name, the current update_optimizer treats them as new parameters and resets their state.
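To make the rename problem concrete, here is a minimal sketch (plain Python, not the actual Avalanche implementation; the `Param` class and the names are illustrative) of why a name-keyed diff mistakes a renamed parameter for a new one, while matching by object identity does not:

```python
class Param:
    """Stand-in for a torch.nn.Parameter."""
    def __init__(self, label):
        self.label = label

w = Param("weight")
old_params = {"weight": w}           # before wrapping the model
new_params = {"model.weight": w}     # same tensor, new name after wrapping

# Name-based diff: the parameter looks brand new, so its state gets reset.
added_by_name = set(new_params) - set(old_params)
assert added_by_name == {"model.weight"}

# Identity-based diff: the parameter is correctly recognized as unchanged.
old_ids = {id(p) for p in old_params.values()}
added_by_id = [n for n, p in new_params.items() if id(p) not in old_ids]
assert added_by_id == []
```

This identity-based matching is essentially what makes the approach described below robust to renames.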
The OptimizedParameterStructure handles this well because it identifies the parameters inside the current optimizer and matches each one to a name through its parameter id.
Maybe I could abandon the optimized_param_id dict and fully switch to OptimizedParameterStructure. That requires some more work, but I think it would work much better, and it would also remove the need to store the previously optimized params in the strategy, as is currently done.
@AntonioCarta I added a test in which I rename one parameter and add another. This is the typical case where it should not work. But I noticed an additional problem: when a parameter is both renamed and expanded, it can absorb the parameter group of the other parameters without any notice (because in that case the expanded parameter is considered a new parameter). This is quite an edge case, but it could be problematic, because I don't even know how to emit a warning when it happens: when both the name and the parameter id change, there is no way to tell.
> When some parameter is both renamed and expanded

Usually expanded parameters are not renamed, right?
I don't think we can really fix this (there is no correct behavior here), but maybe we can improve debugging these issues. For example, can we add a verbose mode that prints the parameters which have been adapted/expanded/renamed? Would this make it easier to find potential bugs?
No, usually they are not renamed, but if you rename them at the same time as they are expanded, it can be a problem. It should be fixed, however, if you call strategy.make_optimizer() manually after renaming and before the parameter expansion.
FYI, the example is failing right now:
ValueError: This function only supports single parameter groups.If you need to use multiple parameter groups, you can override `make_optimizer` in the Avalanche strategy.
Maybe you didn't change the templates to not use the old reset_optimizer?
I tried it and it works inside this pull request
yes, it was an issue in my env. Now it's working.
@AntonioCarta I think it's ready to merge. I ran a few checks on the continual learning baselines to verify that I obtain the same results with the old and new implementations. I also added a test where the user manually adds some parameters to the parameter group they want, and it works fine.
Thank you! Everything looks good now.
Hey, for now I've just put up a prototype of the kind of changes I want to bring.
First of all, a quick reminder of why it's not as easy as it seems to support multiple parameter groups:
1) In continual learning, it is common to add parameters to the model upon encountering new data. But when the user adds parameters, it is not easy to know which parameter group the new parameter should be assigned to: the parameter group information is contained inside the optimizer, whereas the parameters are added to the model without any indication of which group they should be in.
2) Although the default way to change model parameters in Avalanche is to use DynamicModules, it is technically possible to manually add parameters to nn.Modules, which makes tracking new parameters even harder.
3) To complicate things further, PyTorch optimizers store parameters in a very inconvenient data structure: a plain list of parameters per group, with no direct link from each parameter to its name.
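The point about PyTorch's data structure can be seen directly (a small sketch assuming PyTorch is installed; the model and learning rates are arbitrary). Each entry of `optimizer.param_groups` holds a bare list of tensors, so names have to be recovered by matching object identity against `model.named_parameters()`:

```python
import torch
import torch.nn as nn

# Small model with two user-defined parameter groups.
model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))
optimizer = torch.optim.SGD(
    [
        {"params": model[0].parameters(), "lr": 0.1},
        {"params": model[1].parameters(), "lr": 0.01},
    ]
)

# Each group stores an anonymous list of tensors: no names attached.
for group in optimizer.param_groups:
    assert isinstance(group["params"], list)

# Recover names by matching object identity against named_parameters().
id_to_name = {id(p): n for n, p in model.named_parameters()}
group_names = [
    [id_to_name[id(p)] for p in g["params"]] for g in optimizer.param_groups
]
assert group_names == [["0.weight", "0.bias"], ["1.weight", "1.bias"]]
```

This identity-based name recovery is the only reliable bridge between the optimizer's groups and the module hierarchy.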
How I plan to remedy this:
Ideally, we want native support for a normal use of multiple parameter groups. If the user puts different parameter groups into the optimizer, they should not have to worry about them being broken. Of course, that is not the case right now: we just reset the optimizer at the start of training to set up the (name, param_id) mapping later used for updates, and we add an assertion that fails if the optimizer has more than one param group, requiring the user to implement their own make_optimizer function (which is a pity, because using an existing method with different parameter groups then requires copying the strategy or creating a new plugin).
Since users can add parameters both through DynamicModules and manually, it is better to use a general method that works with any nn.Module object.
Provide an automatic and natural way, based on the module hierarchy, to assign parameter groups to new parameters, leaving more complex logic (which should not happen often) to custom implementations.
The idea is the following: we let the user define their optimizer with their own param groups. Then we go over the model's named_parameters() and store each name in a tree structure (a copy of the torch module tree). Each node (not necessarily a parameter; a parameter is a leaf node) contains the set of parameter groups that its children belong to. Then, when a new node (new parameter) is added, its group is inferred from the group of its parent node. If the parent node has multiple groups, the new node's parameter group cannot be inferred, and we raise an error asking the user to either change the module creation process or implement their own optimizer update logic.
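The inference idea above can be sketched in a few lines (a hypothetical illustration, not the planned implementation; function names, the group-tree encoding as dotted-prefix sets, and the example parameter names are all my own):

```python
from collections import defaultdict

def build_group_tree(named_params_to_group):
    """Map every module prefix (tree node) to the set of groups under it."""
    tree = defaultdict(set)
    for name, group in named_params_to_group.items():
        parts = name.split(".")
        for i in range(len(parts)):
            tree[".".join(parts[:i])].add(group)
    return tree

def infer_group(tree, new_param_name):
    """Walk up from the new parameter until a known ancestor is found."""
    parts = new_param_name.split(".")
    for i in range(len(parts) - 1, -1, -1):
        prefix = ".".join(parts[:i])
        if prefix in tree:
            groups = tree[prefix]
            if len(groups) == 1:
                return next(iter(groups))
            # Parent belongs to several groups: refuse to guess.
            raise ValueError(f"Ambiguous parent {prefix!r}: groups {sorted(groups)}")
    raise ValueError("No known ancestor")

# Existing assignment: backbone in group 0, classifier head in group 1.
known = {
    "backbone.layer1.weight": 0,
    "backbone.layer2.weight": 0,
    "head.weight": 1,
}
tree = build_group_tree(known)

# A unit added under the head unambiguously joins group 1.
assert infer_group(tree, "head.new_unit.weight") == 1

# A parameter added at the root is ambiguous (the root sees groups {0, 1}).
try:
    infer_group(tree, "brand_new.weight")
except ValueError:
    pass  # this is where the user would be asked to handle it themselves
```

The key design choice is that ambiguity is an error rather than a silent default, which directly addresses the "absorbed parameter group" edge case discussed earlier.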
@AntonioCarta For now I just provide a playground in test/test_optimizer.py that you can run as-is to see how it will work; I will implement the actual thing later. Let me know if you agree on the idea first.