VahidooX / DeepCCA

An implementation of Deep Canonical Correlation Analysis (DCCA or Deep CCA) with Keras.
MIT License

a few questions #1

Open sdahan12 opened 7 years ago

sdahan12 commented 7 years ago

Hi, thank you for this implementation.

I was wondering: I saw that in your code you used 10 dimensions for MNIST, while in the paper they used 50 dimensions.

What value do you think the correlation will reach for 10 dimensions?

Another question: I saw that the loss is computed as (-corr) over the 10 dimensions, and the validation loss is computed the same way. For me it converges to 3, which I think is not as expected. What do you think?

And lastly, I had a problem using the function T.ones: it raised an exception saying it can't be used with an iterator, so I changed it to T.ones_like and it worked. Do you think this is the right fix?

Thanks,

VahidooX commented 7 years ago

There was a bug, but it is fixed now. Download the code again; it should work. I am not sure how you used T.ones_like, but I solved it with T.ones.
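For readers hitting the same error: `T.ones` takes an explicit shape, while `T.ones_like` infers the shape and dtype from an existing tensor, so the two are interchangeable only when the shape argument really is a plain shape. A minimal NumPy sketch of the same distinction (NumPy stands in for Theano here, since the two APIs mirror each other on this point):

```python
import numpy as np

# np.ones takes an explicit shape tuple, while np.ones_like infers the
# shape and dtype from an existing array -- the same distinction as
# Theano's T.ones vs T.ones_like.
x = np.zeros((3, 2))
a = np.ones((3, 2))    # shape passed explicitly
b = np.ones_like(x)    # shape and dtype taken from x
assert a.shape == b.shape == (3, 2)
assert np.array_equal(a, b)
```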

In the original Deep CCA paper it was 50, but they did not evaluate it on a classification task. In their next paper, where they tested Deep CCA on classification tasks, they used 10 for the output dimension. That's why I used 10. I implemented it two years ago, and as I remember, 10 gave significantly better accuracy on noisy MNIST. A higher output dimension on an easy dataset with 10 classes like MNIST may hurt performance. Here is the paper: http://proceedings.mlr.press/v37/wangb15.pdf

The value of the correlation loss in my implementation will not necessarily match the values reported in the original paper, because the authors implemented it in C++ and computed the gradient of the loss manually, whereas I used Theano's symbolic differentiation. So the correlation values produced by this code are not easily comparable with other implementations. There are also some other small differences, like the activation function and pre-training. That is why I don't think knowing the loss value for 10 dimensions in the original implementation would help you. Anyway, you should be able to run the original code with 50 to see the results. Why do you think that a loss value of 3 is not expected?
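To put the value 3 in context: the DCCA objective is (minus) the sum of the canonical correlations between the two output views, and each canonical correlation is at most 1, so with a 10-dimensional output the loss is bounded below by -10. A rough NumPy sketch of that objective follows; the ridge constant `r` and the exact centering are assumptions for numerical stability, not necessarily what this repo uses:

```python
import numpy as np

def cca_corr(H1, H2, r=1e-4):
    """Sum of canonical correlations between two views (rows = samples).

    DCCA trains the networks to minimise -cca_corr(H1, H2), so with
    d-dimensional outputs the loss lies in [-d, 0]. The ridge term r
    (an assumption here) keeps the covariance inverses well-defined.
    """
    n = H1.shape[0]
    H1c = H1 - H1.mean(axis=0)
    H2c = H2 - H2.mean(axis=0)
    S11 = H1c.T @ H1c / (n - 1) + r * np.eye(H1.shape[1])
    S22 = H2c.T @ H2c / (n - 1) + r * np.eye(H2.shape[1])
    S12 = H1c.T @ H2c / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root via eigendecomposition (S is symmetric PD).
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    # Singular values of T are the canonical correlations.
    T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    return np.linalg.svd(T, compute_uv=False).sum()
```

Under this reading, a converged correlation of about 3 with 10-dimensional outputs means roughly 3 of the 10 canonical directions are strongly correlated, which is plausible rather than obviously wrong.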