StanfordHCI / termite

(development moved to new repos)
BSD 3-Clause "New" or "Revised" License

MemoryError in computing term similarity #23

Closed afshinrahimi closed 10 years ago

afshinrahimi commented 11 years ago

Using Termite (with MALLET) for topic model visualisation, I encountered two errors:

1- In io_utils.py, the str(some string) calls should be changed (e.g. to unicode(some string)) so that people can work with non-ASCII text too.

2- I'm trying to visualise a topic model of more than 300,000 Persian documents, and I get the error below, which indicates that my 32 GB of RAM has been exhausted. The stack trace:

Computing term similarity...
data_path = output/example-project
sliding_window_size = 10
Connecting to data...
Reading data from disk...
Computing document co-occurrence...
Traceback (most recent call last):
  File "./execute.py", line 166, in <module>
    main()
  File "./execute.py", line 163, in main
    Execute( logging_level ).execute( corpus_format, corpus_path, model_library, model_path, data_path, num_topics, number_of_seriated_terms )
  File "./execute.py", line 88, in execute
    ComputeSimilarity( self.logger.level ).execute( data_path )
  File "/home/.../termite-master/pipeline/compute_similarity.py", line 50, in execute
    self.computeDocumentCooccurrence()
  File "/home/.../termite-master/pipeline/compute_similarity.py", line 93, in computeDocumentCooccurrence
    self.incrementCount( cooccurrence, (aToken, bToken) )
  File "/home/.../termite-master/pipeline/compute_similarity.py", line 76, in incrementCount
    occurrence[ key ] = 1
MemoryError
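The trace shows the co-occurrence table growing as a dictionary keyed by (aToken, bToken) pairs until memory runs out. One possible way to bound that growth is sketched below; it is not Termite's code. It assumes the documents are already tokenised and can be iterated twice, restricts counting to the most frequent terms, and stores each unordered pair once under a single integer key. The names `top_terms`, `document_cooccurrence`, `pair_count` and `max_terms` are hypothetical, and whether this is enough for a corpus of this size would depend on the vocabulary limit chosen.

```python
from collections import Counter
from itertools import combinations


def top_terms(docs, max_terms):
    """Map the max_terms terms with the highest document frequency to integer ids."""
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    # Sort by descending frequency, then alphabetically, so ids are deterministic.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term: i for i, (term, _) in enumerate(ranked[:max_terms])}


def document_cooccurrence(docs, vocab):
    """Count, for each unordered pair of vocabulary terms, the documents containing both."""
    size = len(vocab)
    counts = Counter()
    for tokens in docs:
        # Distinct ids of in-vocabulary terms, sorted so that a < b in every pair.
        ids = sorted({vocab[t] for t in tokens if t in vocab})
        for a, b in combinations(ids, 2):
            counts[a * size + b] += 1  # one int key instead of a tuple of strings
    return counts


def pair_count(counts, vocab, term_a, term_b):
    """Look up the co-occurrence count for two vocabulary terms, in either order."""
    a, b = sorted((vocab[term_a], vocab[term_b]))
    return counts[a * len(vocab) + b]
```

With a vocabulary of V terms, the table holds at most V * (V - 1) / 2 entries, so the limit passed as `max_terms` gives a direct handle on worst-case memory use, at the cost of ignoring rare terms.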

Running the code again doesn't help: it consumes all the memory I have and still needs more.
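For the first point, the contents of io_utils.py are not shown in this thread, so the following is only a sketch of the general idea: keep text as Unicode in memory and encode or decode explicitly at the file boundary, instead of calling str() on it. The helper names `write_lines` and `read_lines` are hypothetical. `io.open` with an `encoding` argument is available in both Python 2.6+ and Python 3.

```python
import io


def write_lines(path, lines):
    """Write Unicode strings to path as UTF-8, one per line."""
    with io.open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + u"\n")


def read_lines(path):
    """Read a UTF-8 file and return its lines as Unicode strings."""
    with io.open(path, "r", encoding="utf-8") as f:
        return [line.rstrip(u"\n") for line in f]
```

Because the encoding is stated explicitly, non-ASCII tokens such as Persian terms round-trip without depending on the platform's default encoding.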

jcchuang commented 10 years ago

Hi,

Thanks for the bug report. I've since split Termite into two components, Termite Data Server and Termite Visualizations, and so am no longer maintaining this repository.

Please see README.md for information about current development.

Thanks for checking out Termite!