TheDataLeek / Python-LSA

Performing Latent Semantic Analysis with Python on large datasets.

Latent Semantic Analysis in Python

In this project we will perform latent semantic analysis of large document sets.

We first create a document-term matrix, weighted with tf-idf, and then compute its singular value decomposition (SVD).
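As a rough illustration of these two steps, here is a minimal sketch using only NumPy and the standard library. It is not the code from this repository: the function name `tfidf_matrix`, the whitespace tokenizer, the specific tf-idf variant (relative term frequency times `log(N / df)`), and the terms-as-rows orientation are all assumptions made for the example.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Hypothetical helper: return (matrix, vocab).

    Rows are terms, columns are documents (an assumed orientation).
    """
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    matrix = np.zeros((len(vocab), n_docs))
    for j, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            tf = count / len(toks)             # relative term frequency
            idf = math.log(n_docs / df[term])  # one common idf variant
            matrix[index[term], j] = tf * idf
    return matrix, vocab


docs = ["the cat sat", "the dog sat", "the cat ran"]
A, vocab = tfidf_matrix(docs)

# Thin SVD: A == U @ diag(s) @ Vt, up to floating-point error.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
```

With this idf variant, a term that occurs in every document (here "the") gets weight zero. For genuinely large document sets a sparse matrix and a truncated SVD would be the more practical choice; the dense version above is only meant to show the shape of the computation.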

To run: set your current working directory to scripts/ and run the file located there.

Notes to @rrish:

The SVD_using_LSA.m file is a MATLAB implementation of the latter half of the LSA algorithm, which runs once the document-term matrix has been constructed and the SVD has been calculated. It calculates the new word matrix and doc matrix, then takes a query and calculates the cosine distance between the query and each of the documents (the columns of the doc matrix, saved into a new array called "docs"). Finally, it ranks the documents by their relevance to the query word or words.
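The query step described above can be sketched in Python as follows. This is not a port of SVD_using_LSA.m: it uses the standard LSA "fold-in" of the query (q_k = S_k^-1 U_k^T q) and ranks by cosine similarity, where higher means more relevant. The function name `rank_documents`, the toy count matrix, and the choice k = 2 are assumptions for the example, and k is assumed not to exceed the rank of the matrix.

```python
import numpy as np


def rank_documents(A, query_vec, k):
    """Hypothetical helper: rank the columns (documents) of A against a query.

    A         : term-by-document matrix (terms are rows)
    query_vec : query as a vector in term space
    k         : number of latent dimensions to keep (assumed <= rank of A)
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

    # Each column of Vt_k is one document in the k-dimensional space.
    doc_vecs = Vt_k

    # Fold the query into the same space: q_k = S_k^-1 U_k^T q.
    q = (U_k.T @ query_vec) / s_k

    # Cosine similarity between the query and every document column.
    norms = np.linalg.norm(doc_vecs, axis=0) * np.linalg.norm(q)
    sims = (q @ doc_vecs) / np.where(norms == 0, 1.0, norms)

    # Most similar documents first.
    order = np.argsort(-sims)
    return order, sims


# Toy term-document counts; rows are the terms: cat, dog, stock, market.
A = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
])
query = np.array([1.0, 0.0, 0.0, 0.0])  # the single-word query "cat"

order, sims = rank_documents(A, query, k=2)
```

In this toy example the first two documents only mention "cat" and "dog", so they rank ahead of the two "stock"/"market" documents for the query "cat".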