facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
MIT License

How to use FSDP with some layers' weights frozen? #260

Closed: sakemin closed this issue 12 months ago

sakemin commented 12 months ago

Continuing from #258. When I freeze the weight and bias of output_proj in [ChromaStemConditioner](https://github.com/facebookresearch/audiocraft/blob/main/audiocraft/modules/conditioners.py#L509) and set fsdp.use = true, the following error is raised:

ValueError: FlatParameter requires uniform requires_grad

It seems that, by default, FSDP requires every parameter flattened into the same FlatParameter to have the same requires_grad value.

However, if use_orig_params=True is passed to the FSDP constructor, parameters with different requires_grad values are allowed.

Looking at the original audiocraft/optim/fsdp.py, though, I found that use_orig_params=True is already being passed to _FSDPFixStateDict.

Why is it still not possible to freeze only some of the layers' weights, even though use_orig_params=True is passed to the constructor?

sakemin commented 12 months ago

Oh, it seems I must set requires_grad after wrapping with FSDP, not before. Thanks!
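
For anyone landing here later, the order described above (wrap first, freeze second) could be sketched roughly as follows. This is a minimal illustration, not audiocraft's own code: `freeze_matching` and `wrap_then_freeze` are hypothetical helper names, and the sketch calls PyTorch's `FullyShardedDataParallel` directly rather than audiocraft's `_FSDPFixStateDict`. It assumes `torch.distributed` is already initialized before `wrap_then_freeze` is called. Parameters are matched by substring because parameter names on a wrapped module may carry an FSDP prefix.

```python
from typing import Iterable, List


def freeze_matching(module, name_fragments: Iterable[str]) -> List[str]:
    """Set requires_grad=False on every parameter whose name contains one of
    the given fragments, and return the names that were frozen.

    Hypothetical helper: works on any object exposing named_parameters(),
    which includes an FSDP-wrapped module.
    """
    fragments = tuple(name_fragments)
    frozen = []
    for name, param in module.named_parameters():
        # Substring match, so a wrapper prefix in front of the name is fine.
        if any(fragment in name for fragment in fragments):
            param.requires_grad = False
            frozen.append(name)
    return frozen


def wrap_then_freeze(model, name_fragments: Iterable[str], **fsdp_kwargs):
    """Wrap the model with FSDP first, then freeze the selected parameters.

    Hypothetical helper: assumes the process group is already initialized.
    """
    # Imported lazily so freeze_matching stays usable without a distributed setup.
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # use_orig_params=True keeps the original parameters visible after wrapping.
    wrapped = FSDP(model, use_orig_params=True, **fsdp_kwargs)
    frozen = freeze_matching(wrapped, name_fragments)
    return wrapped, frozen
```

Usage would look something like `wrapped, frozen = wrap_then_freeze(model, ["output_proj"])`; checking the returned `frozen` list is a cheap way to confirm the fragment matched the parameters you intended.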