Peratham / semanticvectors

Automatically exported from code.google.com/p/semanticvectors

Prog not terminating #54

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

(1) java -cp lib/lucene-core-3.5.0.jar:lib/lucene-demo-3.5.0.jar:build/semanticvectors-3.2.jar pitt.search.lucene.IndexFilePositions single_urdu_file

-- Builds a directory --> positional_index

(2) java -cp lib/lucene-core-3.5.0.jar:lib/lucene-demo-3.5.0.jar:build/semanticvectors-3.2.jar pitt.search.semanticvectors.BuildPositionalIndex -dimension 2000 -seedlength 5 -minfrequency 2 -maxnonalphabetchars 3 -windowradius 2 -positionalmethod permutation positional_index/

The second command has been running for the last 1000 minutes and is only using 360 MB of RAM. I guess it's not working; please help!

Original issue reported on code.google.com by manaal...@gmail.com on 14 Feb 2012 at 7:16

GoogleCodeExporter commented 9 years ago
How big is your single_urdu_file? If it's large, can you test this out with a much smaller file?
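
For instance, something like the following would carve off a smaller test corpus (the line count and output file name are just placeholders):

head -n 50000 single_urdu_file > small_urdu_file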

Only using 360 MB of RAM doesn't necessarily indicate a problem, because sparse elemental vectors and incremental reading from disk are designed to keep the memory footprint small. But unless it's a very large file, I'm surprised at 1000 minutes.
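
As a rough illustration of why memory stays low (a sketch of the general sparse ternary idea, not the actual SemanticVectors code; the assumption that -seedlength sets the number of nonzero entries of each sign is mine):

// Illustrative sketch only: a sparse ternary elemental vector keeps just the
// indices of its few +1/-1 entries, so even at -dimension 2000 each vector
// costs a handful of ints rather than 2000 floats.
import java.util.Arrays;
import java.util.Random;

public class SparseElementalSketch {
  public static void main(String[] args) {
    int dimension = 2000;   // mirrors -dimension 2000
    int seedLength = 5;     // mirrors -seedlength 5 (assumed: entries per sign)
    Random random = new Random(42);
    boolean[] used = new boolean[dimension];
    int[] plusOnes = new int[seedLength];   // indices holding +1
    int[] minusOnes = new int[seedLength];  // indices holding -1
    for (int i = 0; i < seedLength; i++) {
      plusOnes[i] = pickUnused(dimension, used, random);
      minusOnes[i] = pickUnused(dimension, used, random);
    }
    System.out.println("+1 positions: " + Arrays.toString(plusOnes));
    System.out.println("-1 positions: " + Arrays.toString(minusOnes));
  }

  // Draw a random index that has not been assigned a nonzero entry yet.
  private static int pickUnused(int dimension, boolean[] used, Random random) {
    int index;
    do {
      index = random.nextInt(dimension);
    } while (used[index]);
    used[index] = true;
    return index;
  }
}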

Original comment by dwidd...@gmail.com on 14 Feb 2012 at 8:55

GoogleCodeExporter commented 9 years ago
My file is only 350 MB. I'll try it with a smaller file now, but I guess 350 MB is pretty small anyway?

Original comment by manaal...@gmail.com on 14 Feb 2012 at 8:59

GoogleCodeExporter commented 9 years ago
OK, it ran to completion with a smaller corpus. I guess the full file just needs more time!

Original comment by manaal...@gmail.com on 14 Feb 2012 at 9:03

GoogleCodeExporter commented 9 years ago
Another way to speed it up would be to use fewer dimensions. This is a trade-off between computational performance and semantic performance, of course.
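
For example, rerunning your second command with a smaller -dimension (the value 500 below is just an illustration) should cut the per-term arithmetic roughly in proportion:

java -cp lib/lucene-core-3.5.0.jar:lib/lucene-demo-3.5.0.jar:build/semanticvectors-3.2.jar pitt.search.semanticvectors.BuildPositionalIndex -dimension 500 -seedlength 5 -minfrequency 2 -maxnonalphabetchars 3 -windowradius 2 -positionalmethod permutation positional_index/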

I'm going to mark this as "Done" for now. We'd like things to go faster, of course, but at least we don't think there's a non-terminating loop causing a bug somewhere.

Thanks for your patience!

Original comment by dwidd...@gmail.com on 15 Feb 2012 at 5:40