fozziethebeat / S-Space

The S-Space repository, from the AIrhead-Research group
GNU General Public License v2.0
205 stars, 106 forks

NumberFormatException when attempting to run LatentSemanticAnalysis class #59

Open kelvinAI opened 9 years ago

kelvinAI commented 9 years ago

Hi, I'm facing an error while calling the LSA class as a library. The error was thrown during the processSpace() call.

void initialize() throws IOException {
    //....
    LatentSemanticAnalysis lsa = new LatentSemanticAnalysis(3);

    File input = new File("data/input2.txt");

    BufferedReader br = new BufferedReader(new FileReader(input));

    lsa.processDocument(br);

    lsa.processSpace(System.getProperties()); // <--- Error happens within this method
    //....
}

System Output:

Initializing MyLSAmain
Saving matrix using edu.ucla.sspace.matrix.SvdlibcSparseBinaryMatrixBuilder@60e53b93
Saw 19 terms, 8 unique
edu.ucla.sspace.lsa.LatentSemanticAnalysis@7adf9f5f processing doc edu.ucla.sspace.util.SparseIntHashArray@85ede7b
Jan 26, 2015 2:03:01 PM edu.ucla.sspace.common.GenericTermDocumentVectorSpace processSpace
INFO: performing log-entropy transform
Jan 26, 2015 2:03:01 PM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the total row counts
Jan 26, 2015 2:03:01 PM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the entropy of each row
Jan 26, 2015 2:03:01 PM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Scaling the entropy of the rows
Jan 26, 2015 2:03:01 PM edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: reducing to 3 dimensions
Exception in thread "main" java.lang.NumberFormatException: For input string: "nan"
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
    at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
    at java.lang.Double.parseDouble(Double.java:538)
    at edu.ucla.sspace.matrix.MatrixIO.readDenseSVDLIBCtext(MatrixIO.java:994)
    at edu.ucla.sspace.matrix.MatrixIO.readMatrix(MatrixIO.java:809)
    at edu.ucla.sspace.matrix.MatrixIO.readMatrix(MatrixIO.java:762)
    at edu.ucla.sspace.matrix.factorization.SingularValueDecompositionLibC.factorize(SingularValueDecompositionLibC.java:153)
    at edu.ucla.sspace.lsa.LatentSemanticAnalysis.processSpace(LatentSemanticAnalysis.java:463)
    at edu.ucla.sspace.mains.MyMain.initialize(MyMain.java:62)
    at edu.ucla.sspace.mains.MyMain.<init>(MyMain.java:23)
    at edu.ucla.sspace.mains.MyMain.main(MyMain.java:33)

This is a follow-up to #58, where I managed to run LSAMain successfully. Am I missing something? Thanks.
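A note on the exception itself: the trace shows Double.parseDouble being handed the lowercase token "nan" while the SVDLIBC output is read back in. Java's parser accepts only the exact spelling "NaN", so it rejects the lowercase form that C programs commonly print. The sketch below reproduces that behaviour and shows one possible tolerant parser. The class LenientDoubles and its parse method are hypothetical names made up for illustration; they are not part of S-Space and are not necessarily how the library handles this.

```java
import java.util.Locale;

// Hypothetical helper (not part of S-Space): parses doubles while tolerating
// the lowercase "nan"/"inf" spellings that C programs typically print.
public final class LenientDoubles {
    private LenientDoubles() {}

    /** Parses a double, also accepting C-style spellings such as "nan" and "inf". */
    public static double parse(String token) {
        String t = token.trim();
        switch (t.toLowerCase(Locale.ROOT)) {
            case "nan": case "+nan": case "-nan":
                return Double.NaN;
            case "inf": case "+inf": case "infinity": case "+infinity":
                return Double.POSITIVE_INFINITY;
            case "-inf": case "-infinity":
                return Double.NEGATIVE_INFINITY;
            default:
                // Everything else goes through the strict standard parser.
                return Double.parseDouble(t);
        }
    }

    public static void main(String[] args) {
        try {
            Double.parseDouble("nan"); // strict parser rejects lowercase "nan"
        } catch (NumberFormatException e) {
            System.out.println("strict parse failed: " + e.getMessage());
        }
        System.out.println("lenient parse is NaN: " + Double.isNaN(parse("nan")));
    }
}
```

Tolerating the token only gets past the parse step. The resulting vectors would still contain NaN values, so the input matrix is the thing to look at.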

davidjurgens commented 9 years ago

Could you please paste in the stack trace so we can see where in the LSA code is throwing the exception?

kelvinAI commented 9 years ago

I updated the issue on GitHub, but apparently the update wasn't sent out by email. Could you please check it out on GitHub?

davidjurgens commented 9 years ago

Ok, I've pushed a change that should fix this behavior. However, I just want to point out that you're seeing this error only because you're passing an extremely small matrix to the SVD, which causes it to hit a degenerate case and produce NaN values. If you can, I would really recommend expanding your testing to use a larger corpus with more than three documents and eight terms. (Though things should "just work" regardless ;) )
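To make the "too small" point concrete, here is a minimal sketch of a sanity check one could run before asking for a reduction. It rests on an assumption that is not an S-Space rule: as a rough heuristic, the requested number of dimensions should be strictly smaller than both the number of terms and the number of documents, since a matrix cannot have more non-trivial singular values than its smaller side. The class SvdInputCheck and the method isLikelyReducible are hypothetical names for illustration only.

```java
// Hypothetical pre-check (not part of S-Space): flags term-document matrices
// that are probably too small for the requested number of SVD dimensions.
public final class SvdInputCheck {
    private SvdInputCheck() {}

    /**
     * Heuristic only: returns true when the requested dimensionality is
     * strictly smaller than min(terms, documents), the upper bound on the
     * rank of a terms x documents matrix.
     */
    public static boolean isLikelyReducible(int terms, int documents, int dimensions) {
        if (terms <= 0 || documents <= 0 || dimensions <= 0) {
            return false;
        }
        return dimensions < Math.min(terms, documents);
    }

    public static void main(String[] args) {
        // Sizes mentioned in this thread: eight terms, three documents, 3 dimensions.
        System.out.println(isLikelyReducible(8, 3, 3));
        // An illustrative larger corpus (made-up sizes).
        System.out.println(isLikelyReducible(5000, 200, 100));
    }
}
```

Passing this check does not guarantee a well-behaved decomposition; it only catches the obviously undersized case described above.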