adapter-hub / adapters

A Unified Library for Parameter-Efficient and Modular Transfer Learning
https://docs.adapterhub.ml
Apache License 2.0

AdapterConfig's leave_out not work well in EncoderDecoderModel #472

Open ZeguanXiao opened 1 year ago

ZeguanXiao commented 1 year ago

Environment info

Information

Model I am using (Bert, XLNet ...): EncoderDecoderModel

Language I am using the model on (English, Chinese ...): English

Adapter setup I am using (if any): AdapterConfig

The problem arises when using:

The tasks I am working on are:

To reproduce

from transformers import EncoderDecoderModel, AdapterConfig
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")

When no layers are left out, adapters are added as expected:

### no leave_out
adapter_config = AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=4, non_linearity="gelu")
model.add_adapter("en", adapter_config)
model.add_adapter("de", adapter_config)
print(model.adapter_summary())

#### print result
================================================================================
Name                     Architecture         #Param      %Param  Active   Train
--------------------------------------------------------------------------------
en                       bottleneck        7,100,928       2.871       0       1
de                       bottleneck        7,100,928       2.871       0       1
--------------------------------------------------------------------------------
Full model                               247,363,386     100.000               1
================================================================================

When leaving out all 12 encoder layers, no adapters are added at all, even though the decoder layers are not listed in leave_out.

### leave_out first 12 layers of encoder
adapter_config = AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=4, non_linearity="gelu", leave_out=list(range(12)))
model.add_adapter("en", adapter_config, overwrite_ok=True)
model.add_adapter("de", adapter_config, overwrite_ok=True)
print(model.adapter_summary())
#### print result
================================================================================
Name                     Architecture         #Param      %Param  Active   Train
--------------------------------------------------------------------------------
en                       bottleneck                0       0.000       0       1
de                       bottleneck                0       0.000       0       1
--------------------------------------------------------------------------------
Full model                               247,363,386     100.000               1
================================================================================

When leaving out only the first 6 encoder layers, adapters are added to the remaining encoder layers (6-11) only; the decoder still gets none.

### leave_out first 6 layers of encoder
adapter_config = AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=4, non_linearity="gelu", leave_out=list(range(6)))
model.add_adapter("en", adapter_config, overwrite_ok=True)
model.add_adapter("de", adapter_config, overwrite_ok=True)
print(model.adapter_summary())

#### print result
================================================================================
Name                     Architecture         #Param      %Param  Active   Train
--------------------------------------------------------------------------------
en                       bottleneck        3,550,464       1.435       0       1
de                       bottleneck        3,550,464       1.435       0       1
--------------------------------------------------------------------------------
Full model                               247,363,386     100.000               1
================================================================================

#### check parameters
print([name for name, p in model.named_parameters() if "adapter" in name])

##### print result
['encoder.encoder.layer.6.attention.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.6.attention.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.6.attention.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.6.attention.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.6.attention.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.6.attention.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.6.attention.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.6.attention.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.6.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.6.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.6.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.6.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.6.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.6.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.6.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.6.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.7.attention.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.7.attention.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.7.attention.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.7.attention.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.7.attention.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.7.attention.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.7.attention.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.7.attention.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.7.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.7.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.7.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.7.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.7.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.7.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.7.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.7.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.8.attention.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.8.attention.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.8.attention.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.8.attention.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.8.attention.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.8.attention.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.8.attention.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.8.attention.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.8.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.8.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.8.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.8.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.8.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.8.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.8.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.8.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.9.attention.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.9.attention.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.9.attention.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.9.attention.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.9.attention.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.9.attention.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.9.attention.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.9.attention.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.9.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.9.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.9.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.9.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.9.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.9.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.9.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.9.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.10.attention.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.10.attention.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.10.attention.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.10.attention.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.10.attention.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.10.attention.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.10.attention.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.10.attention.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.10.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.10.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.10.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.10.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.10.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.10.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.10.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.10.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.11.attention.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.11.attention.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.11.attention.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.11.attention.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.11.attention.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.11.attention.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.11.attention.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.11.attention.output.adapters.de.adapter_up.bias',
 'encoder.encoder.layer.11.output.adapters.en.adapter_down.0.weight',
 'encoder.encoder.layer.11.output.adapters.en.adapter_down.0.bias',
 'encoder.encoder.layer.11.output.adapters.en.adapter_up.weight',
 'encoder.encoder.layer.11.output.adapters.en.adapter_up.bias',
 'encoder.encoder.layer.11.output.adapters.de.adapter_down.0.weight',
 'encoder.encoder.layer.11.output.adapters.de.adapter_down.0.bias',
 'encoder.encoder.layer.11.output.adapters.de.adapter_up.weight',
 'encoder.encoder.layer.11.output.adapters.de.adapter_up.bias']
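All of these parameters sit under encoder.encoder.layer.6-11; nothing under decoder.* receives an adapter. A quick way to confirm that from the names (a sketch reusing the snippet above):

# group adapter parameter names by their layer prefix (sketch)
layers_with_adapters = sorted({n.split(".adapters.")[0] for n, _ in model.named_parameters() if "adapter" in n})
print(layers_with_adapters)  # only encoder.encoder.layer.6 ... 11 appear; no decoder layers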

Expected behavior

The EncoderDecoderModel class should behave like BART-style models: encoder and decoder layers should be indexed consecutively for leave_out, so leaving out all 12 encoder layers would still add adapters to every decoder layer.
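For comparison, a sketch of the joint indexing this expectation refers to, using a BART model (facebook/bart-base has 6 encoder and 6 decoder layers; assuming this library indexes them consecutively, leave_out=list(range(6)) skips only the encoder and adapters still land in the decoder):

### sketch: BART-style joint indexing of encoder and decoder layers
from transformers import BartModel, AdapterConfig

bart = BartModel.from_pretrained("facebook/bart-base")
bart_config = AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=4, non_linearity="gelu", leave_out=list(range(6)))
bart.add_adapter("en", bart_config)
print(bart.adapter_summary())  # "en" should still report parameters, coming from the 6 decoder layers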

ZeguanXiao commented 1 year ago

Also, it seems EncoderDecoderModelAdaptersMixin.iter_layers should offset the decoder layer IDs by the number of encoder layers, like this:

    def iter_layers(self) -> Iterable[Tuple[int, nn.Module]]:
        # encoder layers keep their original indices (0 .. n_enc - 1)
        for i, layer in self.encoder.iter_layers():
            yield i, layer

        # decoder layers continue the numbering after the encoder (n_enc .. n_enc + n_dec - 1)
        encoder_layer_n = len(self.encoder.encoder.layer)
        for i, layer in self.decoder.iter_layers():
            yield i + encoder_layer_n, layer
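
With that offset, a quick check (a sketch; it assumes iter_layers is exposed on the combined model as discussed here) would show the joint indices running 0-11 for the encoder and 12-23 for the decoder, so leave_out=list(range(12)) would skip only the encoder:

# hypothetical check after patching iter_layers as above
for idx, layer in model.iter_layers():
    print(idx, type(layer).__name__)
# expected: 0-11 -> encoder BertLayer, 12-23 -> decoder BertLayer
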
hSterz commented 1 year ago

Hey @ZeguanXiao, I see why this is unexpected behavior. Unfortunately, it is not as easy as changing the iter_layers indices. I will look into this.

ZeguanXiao commented 1 year ago

@hSterz My current workaround is setting model.decoder.base_model.config.adapters = model.encoder.base_model.config.adapters and changing iter_layers as above. It seems to work fine.
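
For reference, the workaround in code form (a sketch; it relies on library internals and the attribute paths reported above, so it may not survive future versions):

# share the encoder's adapter config container with the decoder (workaround, relies on internals)
model.decoder.base_model.config.adapters = model.encoder.base_model.config.adapters
# with iter_layers also patched as above, re-adding the adapters should place them in the decoder too
model.add_adapter("en", adapter_config, overwrite_ok=True)
print(model.adapter_summary())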

adapter-hub-bert commented 1 year ago

This issue has been automatically marked as stale because it has been without activity for 90 days. This issue will be closed in 14 days unless you comment or remove the stale label.