CUFCTL / face-recognition

A GPU-accelerated real-time face recognition system based on classical machine learning algorithms
MIT License

Improve accuracy of LDA #6

Closed bentsherman closed 8 years ago

bentsherman commented 8 years ago

Our C implementation of LDA appears to be consistent with the MATLAB code, and it even has somewhat higher accuracy (for some reason). However, our C code is still running at only 50-70% accuracy with the ORL database when it should be getting at least 90%. There are two items to address here:

  1. There are a few TODOs left in lda.c that pertain to some optimizations that can be done in LDA. Refer to the LDA papers and see if these optimizations can be implemented.
  2. If these optimizations don't increase accuracy, then our MATLAB implementation of LDA might be inaccurate, in which case we should look for another LDA implementation.

arlindohall commented 8 years ago

The MATLAB implementation itself is only getting between 5% and 40% accuracy using the MATLAB cross-validation script. I don't think this is enough to say that any problems with the C code are problems of translation (although there are still fatal bugs in the C code).

Would it be better to continue work on the C code, or to go through the theory behind the MATLAB code to be sure it matches the actual algorithm? With a training set this large (10 images), there's no reason for the LDA code to be performing this poorly.

In the meantime, while we decide this, I'll be working on the C implementation, but I am worried that it is for naught. Below are the results for running the cross-validation script as...

$ ./cross-validate-matlab.sh orl_faces_ppm/ 1 10 --lda | grep matched > lda.log

14 / 40 matched, 35.00%
5 / 40 matched, 12.50%
9 / 40 matched, 22.50%
6 / 40 matched, 15.00%
5 / 40 matched, 12.50%
7 / 40 matched, 17.50%
7 / 40 matched, 17.50%
7 / 40 matched, 17.50%
9 / 40 matched, 22.50%
2 / 40 matched, 5.00%
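To put the ten runs above into a single number, here is a minimal sketch in C that pools the match counts and computes the chance-level baseline. The function names `pooled_accuracy` and `chance_accuracy` are hypothetical and are not part of the project's code. The 40-class baseline assumes one test image per ORL subject in each run, which is an inference from the "/ 40" in the log rather than something stated in the thread.

```c
#include <stddef.h>

/* Pooled accuracy in percent: total matches over total test images.
 * Hypothetical helper, not taken from the project's source. */
double pooled_accuracy(const int *matched, size_t runs, int per_run)
{
    long total = 0;

    for (size_t i = 0; i < runs; i++)
        total += matched[i];

    return 100.0 * (double)total / ((double)runs * (double)per_run);
}

/* Accuracy in percent of guessing uniformly among num_classes labels.
 * Assumes every class is equally likely in the test set. */
double chance_accuracy(int num_classes)
{
    return 100.0 / (double)num_classes;
}
```

Called with the ten match counts above and 40 images per run, `pooled_accuracy` gives 71 / 400 = 17.75%, against a chance level of 2.5% from `chance_accuracy(40)`.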

ctargon commented 8 years ago

It sounds to me like our best option is one of the two:

  1. to systematically go through the LDA MATLAB code to understand what is fundamentally wrong, because there is definitely something crucial that is incorrect
  2. similar to ICA, we could search for a new implementation in MATLAB and see if we can get better results.

I think the translation to C is easier than figuring out what's wrong with the LDA algorithm. If we have all our MATLAB algorithms working, we know that the theory is correct and that it is our fault if the C code won't work.

arlindohall commented 8 years ago

I'll get started going through the MATLAB code then, comparing it to the algorithm as described in the papers. If it's not an obvious error, I'll start looking for the source we have, either to compare against it and see if some error crept in over the years, or to find a replacement.

(In reply to ctargon's comment of Thu, Sep 8, 2016 at 3:07 PM.)

Miller Arlindo Hall Computer Engineering Clemson University 2016

ctargon commented 8 years ago

I think that's your best bet. If the results are this poor, I feel like there is going to be something noticeable missing. I am sure we can find another implementation if you are unable to fix LDA that way.

jtetrea commented 8 years ago

I’ll be around in the lab tomorrow after the Palmetto tour to talk about this if needed. Miller (or whoever is going to go through the MATLAB code), if you print out the LDA paper (or just keep it open on a laptop, either way), we can focus on stepping through the MATLAB code together and verifying it against the paper. If that time doesn’t work and you feel my help would be beneficial, we can meet sometime Monday. Hopefully we don’t need a whole new implementation and can find the error.

Jesse Tetreault CpE Graduate Student Future Computing Technologies Laboratory Clemson University 803-331-6152


arlindohall commented 8 years ago

I have been going through the MATLAB code, and I found that the code we had was discarding the eigenvectors with the largest eigenvalues rather than those with the smallest. A simple fliplr() call fixed this problem and greatly increased the accuracy of the LDA code. The results now are:

20 / 40 matched, 50.00%
12 / 40 matched, 30.00%
10 / 40 matched, 25.00%
28 / 40 matched, 70.00%
28 / 40 matched, 70.00%
29 / 40 matched, 72.50%
18 / 40 matched, 45.00%
13 / 40 matched, 32.50%
26 / 40 matched, 65.00%
23 / 40 matched, 57.50%

These results aren't stellar, but they are an order of magnitude better than random guessing, so I would consider it working. I'll continue on the C code as planned, being sure to avoid the same problem there.
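For the C side, one way to avoid depending on the order in which an eigensolver returns its results is to sort by eigenvalue explicitly before truncating. The following is a rough sketch only, not the code in lda.c: the function name `select_top_eigenvectors`, the `eigen_pair_t` type, and the column-major layout are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* An eigenvalue paired with the index of its eigenvector column. */
typedef struct {
    double value;
    int index;
} eigen_pair_t;

/* qsort comparator: larger eigenvalues sort first. */
static int by_value_desc(const void *a, const void *b)
{
    double x = ((const eigen_pair_t *)a)->value;
    double y = ((const eigen_pair_t *)b)->value;

    return (x < y) - (x > y);
}

/*
 * Copy the k eigenvectors with the largest eigenvalues into out.
 * w holds n eigenvalues; V is rows x n and out is rows x k, both
 * stored column-major (an assumption of this sketch).
 * Returns 0 on success, -1 if memory allocation fails.
 */
int select_top_eigenvectors(const double *w, const double *V,
                            int rows, int n, int k, double *out)
{
    assert(n > 0 && k >= 0 && k <= n);

    eigen_pair_t *pairs = malloc((size_t)n * sizeof *pairs);
    if (pairs == NULL)
        return -1;

    for (int j = 0; j < n; j++) {
        pairs[j].value = w[j];
        pairs[j].index = j;
    }
    qsort(pairs, (size_t)n, sizeof *pairs, by_value_desc);

    /* Column j of out is the eigenvector with the j-th largest eigenvalue. */
    for (int j = 0; j < k; j++) {
        const double *src = V + (size_t)pairs[j].index * (size_t)rows;
        for (int i = 0; i < rows; i++)
            out[(size_t)j * (size_t)rows + i] = src[i];
    }

    free(pairs);
    return 0;
}
```

Sorting explicitly costs a little more than reversing the columns, but it keeps the selection correct whether the solver returns eigenvalues in ascending, descending, or no particular order.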


bentsherman commented 8 years ago

Good catch, @arlindohall. Those new results look very similar to what I got from the C implementation. You should run the C code and see if you get identical results. I got identical results with PCA, so if LDA matches too, I think we can be confident that our C implementation is consistent and rule out errors from translation.

arlindohall commented 8 years ago

I'm making a new issue to rephrase the current work to be done on LDA.