dileepajayakody / semanticvectors

Automatically exported from code.google.com/p/semanticvectors

Term-term similarity matrix #78

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Is there a command that produces a term-term similarity matrix, rather than 
the term vectors themselves, when using random indexing?

What is the expected output? What do you see instead?

What version of the product are you using? On what operating system?
I am using semanticvectors-4.0 on Ubuntu 12.04.

Please provide any additional information below.

Original issue reported on code.google.com by rohitdee...@gmail.com on 6 Mar 2014 at 11:12

GoogleCodeExporter commented 9 years ago
The simplest way is just to write an n-squared loop over a vector store in RAM 
to give the pairwise similarities.
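
A minimal sketch of that n-squared loop, written in plain Python rather than against the SemanticVectors Java API. It assumes the term vectors have already been loaded into a dict mapping each term to a list of floats (a hypothetical stand-in for an in-RAM vector store) and uses cosine similarity as the pairwise measure; SemanticVectors' own comparison for a given vector type may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length real vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def similarity_matrix(store):
    """Naive n-squared loop over an in-RAM store of term vectors.

    `store` is a hypothetical dict mapping term -> list of floats (not a
    SemanticVectors class). Returns (terms, matrix), where matrix[i][j] is the
    cosine similarity of terms[i] and terms[j].
    """
    terms = sorted(store)
    n = len(terms)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Similarity is symmetric: compute each pair once (j >= i) and mirror it.
        for j in range(i, n):
            score = cosine(store[terms[i]], store[terms[j]])
            matrix[i][j] = score
            matrix[j][i] = score
    return terms, matrix


terms, matrix = similarity_matrix({"cat": [1.0, 0.0], "dog": [1.0, 1.0], "car": [0.0, 1.0]})
for term, row in zip(terms, matrix):
    print(term, [round(x, 3) for x in row])
```

With the toy vectors above, `car` and `cat` are orthogonal (similarity 0.0) and each of them scores about 0.707 against `dog`. The full n x n matrix is kept in memory here, which is exactly what becomes expensive for large vocabularies.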

The problem is usually space. 10000 vectors of (say) 250 real dimensions, 
stored as 4-byte floats, take 10000 * 250 * 4 bytes = 10000 * 1000 bytes = 
10 MB, whereas the pairwise similarities for this many vectors would take 
10000 * 10000 * 4 bytes = 400 MB. Naturally, you could try to reduce space 
consumption by discarding small values and using a sparse matrix 
representation.
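
A sketch of the space arithmetic above, plus one way to discard small values and keep the rest in a sparse form. The plain-Python dict standing in for the vector store, the cosine measure, and the `threshold` parameter are all assumptions for illustration, not part of SemanticVectors; the sparse result is simply a dict keyed by term pairs (coordinate-style storage), with dropped pairs implicitly treated as 0.

```python
import math

BYTES_PER_FLOAT = 4  # 4-byte (32-bit) floats, as in the estimate above


def dense_store_bytes(num_terms, dims):
    """Space for num_terms real vectors of `dims` float components."""
    return num_terms * dims * BYTES_PER_FLOAT


def dense_matrix_bytes(num_terms):
    """Space for a full num_terms x num_terms matrix of float similarities."""
    return num_terms * num_terms * BYTES_PER_FLOAT


def cosine(u, v):
    """Cosine similarity of two equal-length real vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def sparse_similarities(store, threshold):
    """Pairwise similarities, keeping only pairs with |score| >= threshold.

    `store` is a hypothetical dict mapping term -> list of floats. Keys are
    (term_a, term_b) with term_a < term_b, so each unordered pair appears once.
    """
    terms = sorted(store)
    result = {}
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            score = cosine(store[a], store[b])
            if abs(score) >= threshold:
                result[(a, b)] = score
    return result


print(dense_store_bytes(10_000, 250))  # 10000000 bytes = 10 MB
print(dense_matrix_bytes(10_000))      # 400000000 bytes = 400 MB
```

Note that this still visits all n(n-1)/2 pairs, so it saves space but not time, and how much space it saves depends entirely on how many pairs fall below the chosen threshold.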

Original comment by dwidd...@gmail.com on 6 Mar 2014 at 5:20

GoogleCodeExporter commented 9 years ago
I'm going to close this for now, pending a clearer specification of what we 
mean by a matrix. (It's clear as a mathematical abstraction but not as an 
output format specification: there are many options, and some would not scale 
well.)

Original comment by dwidd...@gmail.com on 19 Nov 2014 at 7:37