apostolikas / Language-Specific-Subnetworks


Investigating the cross-lingual sharing mechanism of multilingual models through their subnetworks

[Figure: stitching results]

It has been shown that different, equally good subnetworks exist in a transformer model after fine-tuning and pruning [1]. In this project, we investigate the similarity of unilingual subnetworks obtained by structured pruning of multilingual transformer models. By comparing subnetworks in terms of (i) mask similarity, (ii) representation similarity, and (iii) functional similarity, we demonstrate that unilingual subnetworks can effectively solve the same task in different languages, and that their early layers can even solve other tasks, even with shuffled masks. The last layers of the subnetworks, however, are task-specific and do not generalize to other tasks. Our research also provides insight into the mutual information shared between cross-lingual subnetworks.

Nikolaos Apostolikas, Gergely Papp, Panagiotis Tsakas, Vasileios Vythoulkas
March 2023


This is the official repository of Investigating the cross-lingual sharing mechanism of multilingual models through their subnetworks. Please find instructions to reproduce the results below.

 

Preparation

Download the repository

Clone this repository and enter its directory:

git clone https://github.com/apostolikas/Language-Specific-Subnetworks.git
cd Language-Specific-Subnetworks

Install environment

Conda:

conda env create -f env.yml
conda activate atcs

Pip:

pip install -r requirements.txt

Download finetuned models and masks

Either run our provided downloading script:

source download.sh

Or download the finetuned models and masks manually.

Experiments

Jaccard

python plot/load_masks.py
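
Here, mask similarity is the Jaccard index between binary head masks, |A ∩ B| / |A ∪ B|. A minimal sketch of that computation (assuming the .pkl files hold binary head masks; the function below is illustrative, not the script's API):

import pickle
import numpy as np

def mask_jaccard(path_a, path_b):
    # Load the two pickled binary head masks
    with open(path_a, 'rb') as f:
        a = np.asarray(pickle.load(f), dtype=bool)
    with open(path_b, 'rb') as f:
        b = np.asarray(pickle.load(f), dtype=bool)
    # Jaccard index: intersection over union of kept heads
    return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()

# e.g. mask_jaccard('results/pruned_masks/marc/en_0.pkl',
#                   'results/pruned_masks/xnli/en_0.pkl')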

CKA

Syntax: python cka.py model1 model2 mask1 mask2
Example:

python cka.py results/models/marc/best results/models/paws-x/best results/pruned_masks/marc/zh_0.pkl results/pruned_masks/paws-x/zh_0.pkl

This script saves its results under the results/cka folder.
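
For reference, cka.py compares representations with CKA [2]. Linear CKA between two activation matrices X and Y (rows are the same inputs, columns are features) can be computed as below; this is a sketch of the metric itself, not of the script's internals:

import numpy as np

def linear_cka(X, Y):
    # Column-center both representation matrices of shape (n_samples, dim)
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F), per Kornblith et al. [2]
    nom = np.linalg.norm(Y.T @ X, 'fro') ** 2
    denom = np.linalg.norm(X.T @ X, 'fro') * np.linalg.norm(Y.T @ Y, 'fro')
    return nom / denom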

Stitching

Syntax: python stitch.py model1 model2 mask1 mask2 layer_index target_dataset target_lang
Example:

python stitch.py results/models/marc/best results/models/marc/best results/pruned_masks/marc/en_0.pkl results/pruned_masks/marc/en_0.pkl 6 marc en

This script saves its results as a CSV under the results/stitch folder.
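
Stitching [3, 4] trains a small affine layer that maps activations from the lower layers of one subnetwork into the upper layers of another, with both networks kept frozen; the stitched model's accuracy on target_dataset then measures functional similarity at layer_index. A schematic PyTorch sketch (class and argument names are illustrative, not the actual stitch.py implementation):

import torch
import torch.nn as nn

class StitchedModel(nn.Module):
    def __init__(self, front, back, hidden_dim=768):
        super().__init__()
        self.front = front            # layers 1..k of model1 (frozen)
        self.back = back              # layers k+1..L of model2 + head (frozen)
        self.stitch = nn.Linear(hidden_dim, hidden_dim)  # only trainable part

    def forward(self, x):
        with torch.no_grad():
            h = self.front(x)         # hidden states at the stitching layer
        return self.back(self.stitch(h))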

Plotting

Some of the plots can be found under plot/notebooks/.
The t-SNE plot can be made via

python plot/head_importance_analysis.py

And the mask-overlap plot via

python plot/stats_script.py
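
For intuition, the t-SNE plot embeds one flattened head-importance matrix per (task, language) mask into 2-D. A rough scikit-learn sketch (array shapes and contents are illustrative only):

import numpy as np
from sklearn.manifold import TSNE

# One flattened head-importance matrix per (task, language) mask,
# e.g. 20 masks with 12 layers x 12 heads = 144 scores each.
scores = np.random.rand(20, 144)
points = TSNE(n_components=2, perplexity=5).fit_transform(scores)
# points has shape (20, 2) and can be scattered with matplotlib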

Finetuning + masking

To finetune a RoBERTa model on one of the four tasks, run:

python finetune.py [xnli|paws-x|marc|wikiann]

Then, you can create the masks for the subnetworks with

python mask.py results/models/YOUR_MODEL/best --seed 0

This script will save 5 masks, one for each of the 5 languages, for the given finetuned model.
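
Once created, a mask can be applied at inference time through the head_mask argument that Hugging Face transformers models accept, which zeroes out the pruned attention heads. A minimal sketch (paths reuse the examples above; the exact pickle contents are an assumption):

import pickle
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Mask format is assumed to be a (num_layers, num_heads) binary array,
# where 1 keeps an attention head and 0 prunes it.
model = AutoModelForSequenceClassification.from_pretrained('results/models/marc/best')
tokenizer = AutoTokenizer.from_pretrained('results/models/marc/best')

with open('results/pruned_masks/marc/en_0.pkl', 'rb') as f:
    head_mask = torch.as_tensor(pickle.load(f), dtype=torch.float)

inputs = tokenizer('This product is great!', return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs, head_mask=head_mask).logits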

 

 

References

[1] Sai Prasanna, Anna Rogers, and Anna Rumshisky. 2020. When BERT plays the lottery, all tickets are winning. arXiv preprint arXiv:2005.00561.

[2] Simon Kornblith, Mohammad Norouzi, Honglak Lee, and Geoffrey Hinton. 2019. Similarity of neural network representations revisited. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 3519–3529. PMLR.

[3] Adrián Csiszárik, Péter Kőrösi-Szabó, Ákos Matszangosz, Gergely Papp, and Dániel Varga. 2021. Similarity and matching of neural network representations. In Advances in Neural Information Processing Systems, volume 34, pages 5656–5668. Curran Associates, Inc.

[4] Yamini Bansal, Preetum Nakkiran, and Boaz Barak. 2021. Revisiting model stitching to compare neural representations. In Advances in Neural Information Processing Systems, volume 34, pages 225–236. Curran Associates, Inc.