In this project we will perform latent semantic analysis of large document sets.
We first create a document-term matrix with tf-idf weighting, and then compute its singular value decomposition (SVD).
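As a rough illustration of these two steps, here is a minimal Python sketch using NumPy. It is not the code in scripts/. The function name tfidf_matrix, the whitespace tokenizer, the tf and idf formulas, the toy documents, and the choice of k are all assumptions made for the example. Terms are placed in rows and documents in columns.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Return (vocab, A), where A[i, j] is the tf-idf weight of term i in document j.

    Hypothetical helper: tf is the relative term frequency within a document
    and idf is log(N / df). Other tf-idf variants exist.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    index = {term: i for i, term in enumerate(vocab)}
    n_docs = len(docs)

    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    A = np.zeros((len(vocab), n_docs))
    for j, tokens in enumerate(tokenized):
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / df[term])
            A[index[term], j] = tf * idf
    return vocab, A


# Toy corpus, invented for this example.
docs = [
    "cats chase mice",
    "dogs chase cats",
    "stocks fell sharply",
]
vocab, A = tfidf_matrix(docs)

# Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep the k largest singular values to get the reduced "latent" space.
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
```

For a large corpus a dense matrix and a full SVD will not scale. A sparse matrix and a truncated SVD solver would be the usual substitutes.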
To run: set your cwd to scripts/ and run the file located there.
Notes to @rrish: the WORKERS variable sets how many worker processes to create. Feel free to experiment with it for performance. (I haven't yet.)

The SVD_using_LSA.m file is a MATLAB implementation of the latter half of the LSA algorithm, which runs once the document-term matrix has been constructed and its SVD has been calculated. It calculates the new word matrix and doc matrix, then takes a query and calculates the cosine distance between the query and each of the documents (the columns of the doc matrix, saved into a new array called "docs"). Finally, it ranks the documents by their relevance to the query word or words.
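The query and ranking step described above can be sketched in Python with NumPy. This is not a port of SVD_using_LSA.m. The function name rank_documents, the toy matrix, and the way the query is folded into the reduced space (projecting it with the transpose of U_k so that it is comparable with the columns of S_k V_k^T) are assumptions for illustration. The sketch ranks by cosine similarity, so a higher score means a more relevant document. Cosine distance is one minus that value.

```python
import numpy as np

# Hypothetical toy term-document matrix: rows are terms, columns are documents.
terms = ["cats", "chase", "dogs", "mice", "stocks", "fell"]
A = np.array([
    [1.0, 1.0, 0.0],  # cats
    [1.0, 1.0, 0.0],  # chase
    [0.0, 1.0, 0.0],  # dogs
    [1.0, 0.0, 0.0],  # mice
    [0.0, 0.0, 1.0],  # stocks
    [0.0, 0.0, 1.0],  # fell
])


def rank_documents(A, terms, query, k=2):
    """Return [(doc_index, cosine_similarity), ...], most relevant first."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

    # Doc matrix: each column is one document in the k-dimensional space.
    doc_matrix = s_k[:, None] * Vt_k

    # Build a term-count vector for the query, ignoring unknown words.
    q = np.zeros(len(terms))
    for word in query.lower().split():
        if word in terms:
            q[terms.index(word)] += 1.0

    # Project the query into the same space as the document columns.
    q_hat = U_k.T @ q

    # Cosine similarity between the query and every document column.
    norms = np.linalg.norm(doc_matrix, axis=0) * np.linalg.norm(q_hat)
    sims = (q_hat @ doc_matrix) / np.where(norms == 0, 1.0, norms)

    order = np.argsort(-sims)
    return [(int(j), float(sims[j])) for j in order]


ranking = rank_documents(A, terms, "stocks")
```

With this toy matrix, the query "stocks" should rank the third document (index 2) first, since it is the only one containing that term.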