Comparing PrincipleComponentAnalysisEJML.java in revision01 with PrincipleComponentAnalysis.java in rc01, the revision01 version appears to have a superfluous samples field. This field stores the entire genomic signature information a second time, which can exhaust the Java VM heap and end in an out-of-memory error, e.g. with -Xmx3g and around 88k points.
The samples field is currently not used in PrincipleComponentAnalysisEJML.java: only double[] sampleToEigenSpace(double[] sampleData) is called, never double[] sampleToEigenSpace(int sample), which is the method that actually reads the samples field. Should we want this functionality later, getting A[i] and adding mean[] back is probably the better approach, as it avoids keeping a second copy of the data and so saves quite a bit of memory.
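A minimal sketch of that alternative, for illustration only. It assumes that A still holds the mean-centered samples row by row at the time of the call (which may not hold if the decomposition modifies its input), and it uses plain double[][] in place of the actual matrix type. The class and method names (SampleReconstructionSketch, reconstructSample) are hypothetical and not part of either PCA class.

```java
import java.util.Arrays;

public class SampleReconstructionSketch {

    /**
     * Rebuilds the original sample from its mean-centered row instead of
     * reading it from a separate samples copy.
     * Assumption: centered[i][j] == original[i][j] - mean[j].
     */
    static double[] reconstructSample(double[][] centered, double[] mean, int sample) {
        double[] row = centered[sample];
        double[] original = new double[row.length];
        for (int j = 0; j < row.length; j++) {
            // Undo the centering step for this one row only.
            original[j] = row[j] + mean[j];
        }
        return original;
    }

    public static void main(String[] args) {
        // Two samples {1, 4} and {3, 8}; their column means are {2, 6}.
        double[] mean = {2.0, 6.0};
        double[][] centered = {{-1.0, -2.0}, {1.0, 2.0}};

        // prints "[3.0, 8.0]"
        System.out.println(Arrays.toString(reconstructSample(centered, mean, 1)));
    }
}
```

The reconstructed array could then be handed to sampleToEigenSpace(double[] sampleData), at the cost of one temporary row rather than a full second copy of all samples.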
Removed this field in 9ded686ca87a55cc0c16d4c0d0b90ab053f7b885.
This allows MTJ to run faster and with less memory than EJML now. The results differ only slightly.