abenton / wgcca

Python Implementation of Weighted Generalized Canonical Correlation Analysis
MIT License
34 stars 11 forks source link

wgcca

Python Implementation of Weighted Generalized Canonical Correlation Analysis as described in "Learning Multiview Embeddings of Twitter Users". Benton A, Arora R, and Dredze M. ACL 2016.

Tested with

Test suite:

python src/wgccaTest.py

Sample call to learn 5-dimensional WGCCA model (first two views weighted twice as much as second two):

python src/wgcca.py --input resources/sample_wgcca_input.tsv.gz --output wgcca_embeddings.npz --model wgcca_model.pickle --k 5 --kept_views 0 1 2 3 --weights 1.0 1.0 0.5 0.5 --reg 1.e-8 1.e-8 1.e-8 1.e-8    

WeightedGCCA methods

The input views used in "Learning Multiview Embeddings of Twitter Users" can be found at http://www.cs.jhu.edu/~mdredze/datasets/multiview_embeddings/ -- in the same format as resources/sample_wgcca_input.tsv.

If you use this code please cite:

Adrian Benton, Raman Arora, and Mark Dredze. Learning Multiview Representations of Twitter Users. Association for Computational Linguistics (ACL), 2016.

Please contact adrian dot author1_surname at gmail dot com if you have any questions/suggestions/concerns/comments.