Improve accuracy of ICA C implementation

ctargon commented 8 years ago

The ICA C implementation compiles and runs, but its accuracy ranges from roughly 25% to 70%. There are plenty of optimizations that could be incorporated, and I may have missed a line or two when porting from the MATLAB code. It could use another pair of eyes to go through it and see if I missed anything.

bentsherman commented 8 years ago

Some notes on what Colin and I did today:

We implemented the full formula for icasig from the MATLAB code, because the C code only implemented part of it. However, accuracy did not improve in the test run that we used (1-5 removed from each class): 27.5% in both cases.
The MATLAB ICA code actually calls remmean twice: once in run_ica, where it removes the mean image, and again in fastica, where it removes the mean "row" rather than the mean image because mixedsig = X'.
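
To make the difference between the two remmean calls concrete, here is a minimal sketch in C. It assumes X is stored row-major with one image per column (pixels by images), and that remmean subtracts from each row its mean across the columns. center_rows and center_cols are hypothetical names, not functions from ica.c or the matrix library. Under those assumptions, center_rows(X) removes the mean image, while running the same operation on X' is equivalent to center_cols(X), which removes a different quantity.

```c
#include <assert.h>
#include <stddef.h>

/* Subtract from each row its own mean, taken across the columns.
 * With one image per column, this removes the mean image.
 * (Assumed behavior of remmean; names here are hypothetical.) */
void center_rows(float *m, size_t rows, size_t cols)
{
    assert(m != NULL && rows > 0 && cols > 0);
    for (size_t i = 0; i < rows; i++) {
        float sum = 0.0f;
        for (size_t j = 0; j < cols; j++)
            sum += m[i * cols + j];
        float mean = sum / (float)cols;
        for (size_t j = 0; j < cols; j++)
            m[i * cols + j] -= mean;
    }
}

/* Subtract from each column its own mean, taken across the rows.
 * This is what center_rows does when it is handed the transpose. */
void center_cols(float *m, size_t rows, size_t cols)
{
    assert(m != NULL && rows > 0 && cols > 0);
    for (size_t j = 0; j < cols; j++) {
        float sum = 0.0f;
        for (size_t i = 0; i < rows; i++)
            sum += m[i * cols + j];
        float mean = sum / (float)rows;
        for (size_t i = 0; i < rows; i++)
            m[i * cols + j] -= mean;
    }
}
```

Calling both in sequence on X would reproduce the double centering described above, so a C port needs both steps to match the MATLAB intermediate values.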

Some things that I plan to do once I have time:

move matrix functions from ica.c to the matrix library
remove memory leaks
clean up the code (uniform naming conventions, etc.)
combine temp_PCA with the main PCA function

Once I have done my clean-up, we should run cross-validation so that we can see how the accuracy of the C code changes as the training set shrinks.

Meanwhile, it might be useful for some people to play around with the parameters in the MATLAB ICA code that were not included in the C code and document their findings so that we know which parameters are worth keeping.

bentsherman commented 7 years ago

Okay, after a few debug sessions throughout last night and today, I was able to fix the issue we had with NaNs in our matrices. A few points:

The PCA_alt function in ica.c was returning an extra eigenvalue with value 0. After the mean is removed, the N-by-N symmetric matrix built from N samples has rank at most N-1, so it has at most N-1 eigenvalues greater than 0. The eigenvalues are later placed on the diagonal of a matrix and inverted, which introduced some very large numbers. I changed PCA_alt to remove the extra eigenvalue, but I may modify m_eigen in the future to automatically remove these zero columns. Anyway, this change got rid of the NaNs.
In the MATLAB version of fpica, the inner loop actually breaks when the w vector converges, but this break statement was never added to the C code because the MATLAB indentation is so bad. Anyway, I added the break and fixed up the convergence code so our C code is now considerably faster.
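
Both fixes can be sketched in a few lines of C. This is an illustration under stated assumptions, not the code in ica.c: drop_small_eigenvalues and has_converged are hypothetical names, eigenvectors are assumed to be stored contiguously (one vector of length dim after another), and the convergence test assumes the usual FastICA convention that w and -w count as the same direction.

```c
#include <assert.h>
#include <stddef.h>

/* Compact the eigenpairs in place, keeping only eigenvalues above eps.
 * Returns the number of pairs kept. Dropping the zero eigenvalue here
 * avoids dividing by it when the eigenvalues are inverted later. */
size_t drop_small_eigenvalues(float *values, float *vectors,
                              size_t count, size_t dim, float eps)
{
    assert(values != NULL && vectors != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        if (values[i] <= eps)
            continue;
        values[kept] = values[i];
        for (size_t k = 0; k < dim; k++)
            vectors[kept * dim + k] = vectors[i * dim + k];
        kept++;
    }
    return kept;
}

/* Return 1 when w has stopped moving, up to a sign flip:
 * ||w - w_old|| < eps or ||w + w_old|| < eps.
 * Squared norms are compared so no square root is needed. */
int has_converged(const float *w, const float *w_old, size_t n, float eps)
{
    assert(w != NULL && w_old != NULL);
    float diff = 0.0f, sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float d = w[i] - w_old[i];
        float s = w[i] + w_old[i];
        diff += d * d;
        sum += s * s;
    }
    return diff < eps * eps || sum < eps * eps;
}
```

Inside the inner fixed-point loop, the check would sit right after w is renormalized, for example `if (has_converged(w, w_old, n, eps)) break;`.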

In summary, with my changes the ICA C code seems to produce very similar results to the MATLAB code, including intermediate values. However, even with the break statement, the C code is still significantly slower than the MATLAB code. I would assume that many of the operations in the MATLAB code are multi-threaded, but there may be other differences. We'll need to add timing information to the ICA code so that we can identify any bottlenecks.
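
As a starting point for the timing information, something like the following could wrap each stage of the ICA code. This is only a sketch using the standard clock() function. The stage_timer type and its two functions are hypothetical names, not part of the existing code. Note that clock() reports CPU time for the process, so a wall-clock source may be a better fit when comparing against multi-threaded MATLAB operations.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* A labeled start time for one stage of the pipeline (hypothetical helper). */
typedef struct {
    const char *label;
    clock_t start;
} stage_timer;

/* Record the current CPU time under the given label. */
stage_timer stage_timer_start(const char *label)
{
    assert(label != NULL);
    stage_timer t = { label, clock() };
    return t;
}

/* Print and return the CPU seconds elapsed since stage_timer_start. */
double stage_timer_stop(const stage_timer *t)
{
    assert(t != NULL);
    double seconds = (double)(clock() - t->start) / CLOCKS_PER_SEC;
    printf("%-16s %8.3f s\n", t->label, seconds);
    return seconds;
}
```

Wrapping the PCA step, the whitening step, and the fpica loop separately should be enough to show which one dominates the run time.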

bentsherman commented 7 years ago

Converting the matrix library to single precision seems to have improved the performance of ICA somewhat.

ctargon commented 7 years ago

Accuracy is now on par with MATLAB.

CUFCTL / face-recognition

Improve accuracy of ICA C implementation #16