CPJKU / deep_lda

Implementation of Deep Linear Discriminant Analysis (DeepLDA)

The multi-class classification problem #2

Open zorrocai opened 6 years ago

zorrocai commented 6 years ago

I have written DeepLDA in PyTorch. When it comes to the test module, the classification method confused me a lot. I wanted to use a one-vs-rest approach for multi-class classification. It seems your classification is a new and interesting method. Could you explain more about it, and what is the theory behind it?

zorrocai commented 6 years ago

Are your per-class mean hidden representations computed from a single mini-batch of samples? And is your accuracy computed that way as well?

dmatte commented 6 years ago

> I have written DeepLDA in PyTorch.

That's great! I have seen that you also opened an issue about getting the gradients of the eigenvalue decomposition in PyTorch. Hopefully this will be available soon.

> When it comes to the test module, the classification method confused me a lot. I wanted to use a one-vs-rest approach for multi-class classification. It seems your classification is a new and interesting method. Could you explain more about it, and what is the theory behind it?

One-vs-rest is not required, as we build on top of multiclass LDA.

Computing the individual class probabilities is also nothing new; in fact, it is identical to what you do in the classical (non-deep) version of LDA. You can also find this in the sklearn implementation of LDA, which in turn builds on sklearn's LinearClassifierMixin.
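For reference, here is a minimal NumPy sketch of that classical computation: per-class means, a pooled covariance, linear discriminant scores, and a softmax over the scores, roughly what sklearn's `LinearDiscriminantAnalysis.predict_proba` does. This is an illustration of the standard construction, not code from this repository; the variable names are assumptions, and `X_*` are meant to be the (deep) feature representations.

```python
import numpy as np

def lda_predict_proba(X_train, y_train, X_test, eps=1e-4):
    classes = np.unique(y_train)
    # per-class means and a pooled (shared) within-class covariance
    means = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    centered = np.concatenate([X_train[y_train == c] - means[i]
                               for i, c in enumerate(classes)])
    cov = centered.T @ centered / len(X_train) + eps * np.eye(X_train.shape[1])
    priors = np.array([(y_train == c).mean() for c in classes])

    # linear discriminant score per class:
    # x @ inv(cov) @ mu_c - 0.5 * mu_c @ inv(cov) @ mu_c + log prior_c
    sigma_inv_mu = np.linalg.solve(cov, means.T)            # (d, n_classes)
    scores = (X_test @ sigma_inv_mu
              - 0.5 * np.sum(means.T * sigma_inv_mu, axis=0)
              + np.log(priors))

    scores -= scores.max(axis=1, keepdims=True)             # stable softmax
    probs = np.exp(scores)
    return probs / probs.sum(axis=1, keepdims=True)
```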

> Are your per-class mean hidden representations computed from a single mini-batch of samples? And is your accuracy computed that way as well?

For computing the updates we use batch statistics (you could also use running averages, similar to what is done in batch normalization, for example). For evaluation and testing, the statistics are re-computed on the entire training set to get more reliable estimates for both the means and the covariances.
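A hypothetical sketch of the running-average variant mentioned above, analogous to batch normalization's running statistics; names like `running_means` and `momentum` are illustrative, not from the repository:

```python
import torch

def update_running_means(running_means, feats, labels, n_classes, momentum=0.1):
    """Exponential moving average of the per-class mean hidden representations."""
    for c in range(n_classes):
        mask = labels == c
        if mask.any():  # only update classes present in this mini-batch
            batch_mean = feats[mask].mean(dim=0)
            running_means[c] = (1 - momentum) * running_means[c] + momentum * batch_mean
    return running_means
```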

zorrocai commented 6 years ago

Thanks for your guidance. I have been preparing for the IELTS test these days. Actually, instead of using the eigenvalue decomposition in LDA first, I tried to learn the projection matrix A directly with backpropagation, and the results seemed just as good.
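For reference, a rough PyTorch sketch of this "learn A directly" idea; the loss below (maximize between-class over within-class scatter in the projected space) is one plausible reading, not the commenter's actual code:

```python
import torch
import torch.nn as nn

class LearnedProjection(nn.Module):
    """Treat the LDA projection matrix A as a trainable parameter."""
    def __init__(self, feat_dim, n_components):
        super().__init__()
        self.A = nn.Parameter(torch.randn(feat_dim, n_components) * 0.01)

    def forward(self, feats):
        return feats @ self.A

def scatter_ratio_loss(z, labels, n_classes, eps=1e-4):
    # within- and between-class scatter of the projected features z
    overall_mean = z.mean(dim=0)
    s_w, s_b = 0.0, 0.0
    for c in range(n_classes):
        zc = z[labels == c]
        if len(zc) == 0:
            continue
        mu_c = zc.mean(dim=0)
        s_w = s_w + ((zc - mu_c) ** 2).sum()
        s_b = s_b + len(zc) * ((mu_c - overall_mean) ** 2).sum()
    # minimizing -s_b / s_w maximizes class separation
    return -s_b / (s_w + eps)
```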

zorrocai commented 6 years ago

I have trained AlexNet with the LDA-eigenvalues loss, but there wasn't any real improvement in train accuracy. Sample output of my code is listed below:

```
epoch 13 avg_train_loss: -0.999989 ('LDA-Eigenvalues (Train):', '[0. 0. 0. 0. 0. 0. 0. 0. 0.]') Ratio min(eigval)/max(eigval): 0.005, Mean(eigvals): 0.000 train accuracy: 12.085143
epoch 14 avg_train_loss: -0.999989 ('LDA-Eigenvalues (Train):', '[0. 0. 0. 0. 0. 0. 0. 0. 0.]') Ratio min(eigval)/max(eigval): 0.006, Mean(eigvals): 0.000 train accuracy: 12.023333
epoch 15 avg_train_loss: -0.999990 ('LDA-Eigenvalues (Train):', '[0. 0. 0. 0. 0. 0. 0. 0. 0.]') Ratio min(eigval)/max(eigval): 0.007, Mean(eigvals): 0.000 train accuracy: 11.959375
```

I wonder whether the poor performance is caused by the bad LDA eigenvalues, or are there some other tricks I am missing?

zorrocai commented 6 years ago

Actually, I ran into the following situation: S_B/S_W becomes a minus identity matrix during training, so the eigenvalues are all 1. This condition may prevent further training. [screenshot] I really don't know the reason...

dmatte commented 6 years ago

Did you try this with our Theano version or with your PyTorch implementation?

Did you train AlexNet on ImageNet? This won't work, as you have 1000 classes. This implies that the covariances in the objective function have a size of 1000 x 1000. You would need a huge mini-batch size (1001 * 1000) to get stable estimates for the covariance matrices, and the model would not fit into your GPU memory.
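A back-of-the-envelope check of that batch-size argument (assuming float32 inputs of size 3 x 224 x 224; the numbers are purely illustrative):

```python
batch  = 1001 * 1000                  # samples suggested above for stable estimates
sample = 3 * 224 * 224 * 4            # bytes per float32 input image
print(batch * sample / 2**30, "GiB")  # ~561 GiB for the input batch alone
```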

zorrocai commented 6 years ago

With my PyTorch implementation, and I only trained on the MNIST dataset.

zorrocai commented 6 years ago

I found that the problem was caused by the wrong position of the line S_W += lambdaI: I had added it before S_B = S_T - S_W. After I moved S_W += lambdaI behind S_B = S_T - S_W, the problem was gone. But the eigenvalues are still poor after one epoch, like this:

```
('LDA-Eigenvalues (Train):', '[-0.01 -0.01 -0.01 -0.01 -0.01 -0.01 -0.01 -0.01 1.68]') Ratio min(eigval)/max(eigval): -0.007, Mean(eigvals): 0.176
```

Unlike your Theano code, which reaches much bigger eigenvalues:

```
LDA-Eigenvalues (Train): [ 5.83 7.17 7.45 8.01 8.67 11.22 11.81 14.82 18.66] Ratio min(eigval)/max(eigval): 0.312, Mean(eigvals): 10.403
```
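For illustration, a minimal PyTorch sketch of the ordering fix described above; the normalization conventions here are assumptions and may differ from both implementations:

```python
import torch

def scatter_matrices(feats, labels, n_classes, lam=1e-3):
    N, d = feats.shape
    centered = feats - feats.mean(dim=0, keepdim=True)
    S_T = centered.T @ centered / (N - 1)            # total scatter

    S_W = feats.new_zeros(d, d)
    for c in range(n_classes):
        Xc = feats[labels == c]
        Xc = Xc - Xc.mean(dim=0, keepdim=True)
        S_W = S_W + Xc.T @ Xc / (N - 1)              # pooled within-class scatter

    S_B = S_T - S_W                                  # compute S_B first ...
    S_W = S_W + lam * torch.eye(d, dtype=feats.dtype)  # ... then regularize S_W only
    return S_W, S_B
```

Adding lambda * I before the subtraction means -lambda * I leaks into S_B, which distorts the eigenvalue spectrum; regularizing afterwards keeps the shrinkage confined to S_W.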

xuanhanyu commented 4 years ago

> Thanks for your guidance. I have been preparing for the IELTS test these days. Actually, instead of using the eigenvalue decomposition in LDA first, I tried to learn the projection matrix A directly with backpropagation, and the results seemed just as good.

Eigenvalue decomposition is non-differentiable in PyTorch. What should I do?
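As an aside: newer PyTorch versions do support backpropagation through the symmetric eigendecomposition via torch.linalg.eigh (gradients are defined when the input is symmetric and its eigenvalues are distinct). A minimal check:

```python
import torch

M = torch.randn(5, 5, dtype=torch.float64)
S = M @ M.T + 1e-3 * torch.eye(5, dtype=torch.float64)  # symmetric positive definite
S.requires_grad_(True)

eigvals = torch.linalg.eigh(S).eigenvalues  # differentiable for symmetric input
loss = -eigvals.min()                       # e.g. push up the smallest eigenvalue
loss.backward()
print(S.grad)                               # gradient w.r.t. the input matrix
```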

webzerg commented 4 years ago

@zorrocai Do you mind sharing your PyTorch implementation code? Thanks!

zjyLibra commented 3 years ago

Why can't Theano use the GPU on my server? I have tried many ways to configure the environment. I would also like your PyTorch implementation code, thanks. Can you share it? @zorrocai