AMAZING code - few questions

Hi,

I am interested in your thoughts about the objective function, however. Is such a complicated formulation really necessary, or could you just take the dot product of the projections and sum the diagonal, e.g. Trace(f(X)^T \cdot f(Y))? Does my question make sense?

As far as I know, no, it is not necessary; many people just use the dot product of the projections, and it works. But the two are not the same. The dot product can be close to CCA with a single component, but definitely not when there is more than one component (projection). Some may argue that CCA can perform better because it finds several mutually orthogonal projections that map the data into new feature spaces, where the two views are highly correlated within each space and uncorrelated across spaces. If your work is not specifically about CCA, I suggest starting with the dot product, because CCA on top of a NN was sometimes unstable in my experiments. You can try CCA later.
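To make the dot-product alternative concrete, here is a minimal NumPy sketch, not code from this repository. The names `trace_objective` and `standardize` and the random data are hypothetical, and standardizing the outputs per dimension is my own assumption. Under that assumption, Trace(f(X)^T f(Y)) divided by the number of samples is the sum of the per-dimension Pearson correlations, so it only compares dimension k of one view with dimension k of the other and ignores correlations across dimensions.

```python
import numpy as np

def trace_objective(H1, H2):
    # Hypothetical helper: Trace(H1^T H2), i.e. the sum over samples and
    # dimensions of H1[i, k] * H2[i, k].
    return float(np.trace(H1.T @ H2))

def standardize(H):
    # Assumption for this sketch: zero mean and unit (population) variance per column.
    return (H - H.mean(axis=0)) / H.std(axis=0)

# Made-up stand-ins for the outputs of the two networks.
rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 4))
H1 = shared + 0.5 * rng.normal(size=(200, 4))
H2 = shared + 0.5 * rng.normal(size=(200, 4))

# Trace objective on standardized outputs, averaged over samples.
score = trace_objective(standardize(H1), standardize(H2)) / len(H1)

# Pearson correlation between matching dimensions of the two views.
per_dim = [np.corrcoef(H1[:, k], H2[:, k])[0, 1] for k in range(H1.shape[1])]

print(score, sum(per_dim))  # the two values agree up to floating-point error
```

With a single output dimension this matches the one-component case mentioned above, up to sign, since a canonical correlation is non-negative.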

What does the objective value represent? Is it the sum of the correlations?

It is the sum of the correlations in the new spaces learned by the projections, not the correlations in the space of the NN outputs. You can think of the NN outputs as being projected into new spaces by a set of linear projectors, after which the correlations are calculated and summed. These linear projections are orthogonal to each other. That is why they apply a linear CCA to the output of the NN after training.
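As an illustration of what that sum means, here is a minimal NumPy sketch of linear CCA via whitening and an SVD. It is not the code in this repository; `linear_cca`, the `reg` value, and the random data are assumptions made for the example. The singular values are the correlations in the projected spaces, and the objective value corresponds to their sum.

```python
import numpy as np

def linear_cca(H1, H2, reg=1e-8):
    # Hypothetical helper: returns the canonical correlations and the
    # linear projections A (for H1) and B (for H2).
    n = H1.shape[0]
    H1c = H1 - H1.mean(axis=0)
    H2c = H2 - H2.mean(axis=0)
    # Covariance matrices; reg is a small ridge term assumed here for stability.
    S11 = H1c.T @ H1c / (n - 1) + reg * np.eye(H1.shape[1])
    S22 = H2c.T @ H2c / (n - 1) + reg * np.eye(H2.shape[1])
    S12 = H1c.T @ H2c / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    S11_is, S22_is = inv_sqrt(S11), inv_sqrt(S22)
    U, corrs, Vt = np.linalg.svd(S11_is @ S12 @ S22_is)
    return corrs, S11_is @ U, S22_is @ Vt.T

# Made-up stand-ins for the NN outputs of the two views.
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 3))
H1 = Z @ rng.normal(size=(3, 3)) + 0.1 * rng.normal(size=(500, 3))
H2 = Z @ rng.normal(size=(3, 3)) + 0.1 * rng.normal(size=(500, 3))

corrs, A, B = linear_cca(H1, H2)
P1, P2 = H1 @ A, H2 @ B  # outputs projected into the new spaces

# Correlation between matching projected components.
projected_corrs = np.array(
    [np.corrcoef(P1[:, k], P2[:, k])[0, 1] for k in range(len(corrs))]
)
print(corrs.sum(), projected_corrs.sum())
```

In this sketch the correlations measured on the projected components match the singular values, which is why the value is a sum of correlations in the projected spaces rather than in the raw output space.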

Do you get similar results on the MNIST-split dataset (i.e. a sum of ~39 correlation coefficients over 50 components)?

I did not test that, because my focus was on their next paper, in which they evaluated on classification tasks. You can easily run it, but as I mentioned in my reply to another issue (https://github.com/VahidooX/DeepCCA/issues/1), you may not see the exact numbers.
