fozziethebeat / S-Space

The S-Space repository, from the AIrhead-Research group
GNU General Public License v2.0

Incorrect application of transform in LSA projection #52

Status: Closed (geospith closed this issue 10 years ago)

geospith commented 10 years ago

When you project a document to the LSA space, the original transform is not applied (the "transformed" variable is unused):

https://github.com/fozziethebeat/S-Space/blob/872aab010143509f1cf4d90ba5ce5225a121de36/src/main/java/edu/ucla/sspace/lsa/LatentSemanticAnalysis.java#L522-L527

davidjurgens commented 10 years ago

This is definitely a bug! Thanks for spotting this. Out of curiosity, if you fixed it on your side, did the lack of transformation end up dramatically affecting the projected vector's quality or usefulness in your applications?

I'll fix this bug now and push it to master.

geospith commented 10 years ago

The results were still quite useful (I have been using it for document retrieval). However, it failed some basic sanity checks, namely that the retained document vector and the projection of the same document should be identical. Their cosine similarity was still quite high (~0.9). After the fix, and once I rescale with the sigma values, I get a cosine similarity of 1.0. You might want to provide such rescaling functionality to make the projected and retained vectors directly comparable.
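
To make the rescaling point concrete, here is a minimal stand-alone sketch. It is not S-Space code: the class `FoldInSketch`, its methods, and the hand-built factors `U`, `SIGMA`, and `V` are all hypothetical. It assumes the retained document vectors are stored as rows of V·Σ and that the projection computes dᵀ·U·Σ⁻¹; if a library uses a different convention, the rescaling step changes accordingly. Note that `d` here stands for the already-transformed term vector, since A = U·Σ·Vᵀ is the decomposition of the transformed matrix. That is why skipping the transform breaks the identity.

```java
/** Hypothetical stand-alone sketch of LSA fold-in; not taken from S-Space. */
class FoldInSketch {

    // Hand-built rank-2 SVD factors (assumed values, chosen so the columns are orthonormal).
    static final double[][] U = {{0.6, -0.8}, {0.8, 0.6}, {0.0, 0.0}}; // terms x dims
    static final double[] SIGMA = {3.0, 1.0};                          // singular values
    static final double[][] V = {{0.6, -0.8}, {0.8, 0.6}};             // docs x dims

    /** Column j of A = U * diag(SIGMA) * V^T: the transformed term vector of document j. */
    static double[] transformedColumn(int j) {
        double[] col = new double[U.length];
        for (int i = 0; i < U.length; i++)
            for (int r = 0; r < SIGMA.length; r++)
                col[i] += U[i][r] * SIGMA[r] * V[j][r];
        return col;
    }

    /** Returns x^T * m for a vector x and a matrix m with x.length rows. */
    static double[] leftMultiply(double[] x, double[][] m) {
        double[] out = new double[m[0].length];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < out.length; j++)
                out[j] += x[i] * m[i][j];
        return out;
    }

    /** Multiplies each component by its singular value, or divides if invert is true. */
    static double[] scale(double[] v, double[] sigma, boolean invert) {
        double[] out = new double[v.length];
        for (int j = 0; j < v.length; j++)
            out[j] = invert ? v[j] / sigma[j] : v[j] * sigma[j];
        return out;
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / Math.sqrt(na * nb);
    }

    public static void main(String[] args) {
        double[] d = transformedColumn(0);
        // Assumed storage convention: retained document vector = row of V scaled by Sigma.
        double[] retained = scale(V[0], SIGMA, false);
        // Assumed projection: d^T * U * Sigma^-1, which recovers the unscaled row of V.
        double[] projected = scale(leftMultiply(d, U), SIGMA, true);
        // Rescaling by Sigma puts the projection in the same scale as the retained vector.
        double[] rescaled = scale(projected, SIGMA, false);
        System.out.println("cosine before rescaling: " + cosine(projected, retained));
        System.out.println("cosine after rescaling:  " + cosine(rescaled, retained));
    }
}
```

Under these assumptions the first cosine comes out below 1 and the second equals 1 up to floating-point error, because dᵀ·U = Σ·v for any column d of A, where v is the matching row of V.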