Closed — kelvinAI closed this issue 9 years ago
I think the issue is that the input is only three documents, but the command is trying to reduce the dimensionality to 300, which isn't possible: there isn't enough data, since the SVD of a term-document matrix can yield at most as many dimensions as there are documents. If you either reduce to two dimensions or increase the number of terms/documents in the input corpus, the command should work.
Thanks, David
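To make the constraint concrete, here is a minimal sketch (not S-Space API — the class and method names are illustrative) of the rank bound behind the "3 versus 300" error: an m-term, n-document matrix has rank at most min(m, n), so SVDLIBC can only produce that many singular vectors.

```java
public class RankBound {
    // The SVD of an m-by-n term-document matrix yields at most
    // min(m, n) singular values, so that is the largest number of
    // dimensions LSA can reduce the space to.
    static int maxDimensions(int terms, int documents) {
        return Math.min(terms, documents);
    }

    public static void main(String[] args) {
        // The corpus in this issue has 3 documents (and ~11 unique
        // terms overall), so at most 3 dimensions are possible.
        int max = maxDimensions(11, 3);
        System.out.println("SVD can produce at most " + max + " dimensions");
        // Requesting 300 exceeds this bound, hence the RuntimeException
        // "SVDLIBC generated the incorrect number of dimensions: 3 versus 300".
    }
}
```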
On Sat, Jan 24, 2015 at 12:46 PM, fingorn notifications@github.com wrote:
— Reply to this email directly or view it on GitHub https://github.com/fozziethebeat/S-Space/issues/58.
Reducing the number of dimensions to 2 solved the issue for the small input corpus. Thank you.
Hi, I'm getting the above error when running LSAMain with the following arguments: -d data/input2.txt data/output/my_lsa_output.sspace
input2.txt is just a very simple text file (for testing) and it contains: The man walked the dog. The man took the dog to the park. The dog went to the park.
System output:
Saving matrix using edu.ucla.sspace.matrix.SvdlibcSparseBinaryMatrixBuilder@5e2de80c
Saw 8 terms, 7 unique
Saw 5 terms, 5 unique
Saw 6 terms, 6 unique
edu.ucla.sspace.lsa.LatentSemanticAnalysis@406a31db processing doc edu.ucla.sspace.util.SparseIntHashArray@2fae8f9
edu.ucla.sspace.lsa.LatentSemanticAnalysis@406a31db processing doc edu.ucla.sspace.util.SparseIntHashArray@3553305b
edu.ucla.sspace.lsa.LatentSemanticAnalysis@406a31db processing doc edu.ucla.sspace.util.SparseIntHashArray@390b4f54
Jan 25, 2015 1:33:24 AM edu.ucla.sspace.common.GenericTermDocumentVectorSpace processSpace
INFO: performing log-entropy transform
Jan 25, 2015 1:33:24 AM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the total row counts
Jan 25, 2015 1:33:24 AM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the entropy of each row
Jan 25, 2015 1:33:24 AM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Scaling the entropy of the rows
Jan 25, 2015 1:33:24 AM edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: reducing to 300 dimensions
Exception in thread "main" java.lang.RuntimeException: SVDLIBC generated the incorrect number of dimensions: 3 versus 300
at edu.ucla.sspace.matrix.factorization.SingularValueDecompositionLibC.readSVDLIBCsingularVector(SingularValueDecompositionLibC.java:198)
at edu.ucla.sspace.matrix.factorization.SingularValueDecompositionLibC.factorize(SingularValueDecompositionLibC.java:161)
at edu.ucla.sspace.lsa.LatentSemanticAnalysis.processSpace(LatentSemanticAnalysis.java:463)
at edu.ucla.sspace.mains.GenericMain.processDocumentsAndSpace(GenericMain.java:514)
at edu.ucla.sspace.mains.GenericMain.run(GenericMain.java:443)
at edu.ucla.sspace.mains.LSAMain.main(LSAMain.java:167)
FYI, the environment is 64-bit Windows 7, with svdlibc compiled under Cygwin. Is this issue caused by the input file? I've also tried a wiki dump corpus, but the issue persists. Any help is greatly appreciated.
Thank You