MemoryError in computing term similarity

Using Termite (with Mallet) for topic model visualisation, I encountered two problems:

1- In io_utils.py, the str(some string) calls should be changed (e.g. to unicode(some string)) so that people can work with non-ASCII text too.

2- I'm trying to visualise a topic model of more than 300,000 Persian documents, and I get a MemoryError, which indicates that my 32 GB of RAM has been exhausted. The stack trace:

Computing term similarity...
data_path = output/example-project
sliding_window_size = 10
Connecting to data...
Reading data from disk...
Computing document co-occurrence...
Traceback (most recent call last):
  File "./execute.py", line 166, in <module>
    main()
  File "./execute.py", line 163, in main
    Execute( logging_level ).execute( corpus_format, corpus_path, model_library, model_path, data_path, num_topics, number_of_seriated_terms )
  File "./execute.py", line 88, in execute
    ComputeSimilarity( self.logger.level ).execute( data_path )
  File "/home/.../termite-master/pipeline/compute_similarity.py", line 50, in execute
    self.computeDocumentCooccurrence()
  File "/home/.../termite-master/pipeline/compute_similarity.py", line 93, in computeDocumentCooccurrence
    self.incrementCount( cooccurrence, (aToken, bToken) )
  File "/home/.../termite-master/pipeline/compute_similarity.py", line 76, in incrementCount
    occurrence[ key ] = 1
MemoryError
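The trace fails while adding a new (aToken, bToken) key, which suggests the pair dictionary grows roughly with the square of the vocabulary size. One possible workaround, sketched below, is to count co-occurrences only among the most frequent terms. This is not Termite's code: the function name document_cooccurrence and the parameter max_vocab are hypothetical, it counts unordered pairs (Termite may count ordered ones), and trimming the vocabulary changes the resulting similarity scores.

```python
from collections import Counter
from itertools import combinations


def document_cooccurrence(docs, max_vocab=5000):
    """Count unordered term pairs per document, restricted to frequent terms.

    docs: a re-iterable collection (e.g. a list) of token lists.
    max_vocab: hypothetical cap on how many terms are kept.
    """
    # Pass 1: document frequency of every term.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    # Keep only the max_vocab terms that appear in the most documents.
    keep = {term for term, _ in doc_freq.most_common(max_vocab)}

    # Pass 2: count each unordered pair once per document.
    counts = Counter()
    for tokens in docs:
        present = sorted(set(tokens) & keep)
        counts.update(combinations(present, 2))
    return counts
```

With at most max_vocab terms kept, the dictionary holds at most max_vocab * (max_vocab - 1) / 2 pairs, so the cap can be chosen to fit the available RAM.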

Running the code again doesn't help: it consumes all the memory I have and still needs more.
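For the first point (str versus unicode in io_utils.py), a minimal sketch of a conversion helper is shown below. The name to_text and the default UTF-8 encoding are assumptions for illustration, not part of Termite; the idea is just to decode byte strings explicitly instead of calling str() on them.

```python
import sys

# Pick the text type for the running interpreter.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (only defined on Python 2)


def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding byte strings explicitly."""
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):
        # On Python 2, plain str is bytes, so it is decoded here too.
        return value.decode(encoding)
    # Numbers and other objects: fall back to the text constructor.
    return text_type(value)
```

Replacing str(some string) with a helper like this would let Persian (or any other non-ASCII) tokens pass through unchanged, assuming the input files are UTF-8 encoded.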

