Closed ptamas88 closed 6 years ago
I solved this issue by replacing the np.vstack call with a primitive, list-based implementation.
I replaced this:
self.wordvecs = np.vstack((self.wordvecs, new_wv))
with this:
wv_new = []
for i in xrange(len(self.wordvecs)):
    wv_new.append(self.wordvecs[i])
wv_new.append(new_wv)
self.wordvecs = wv_new
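A possible alternative that keeps self.wordvecs as a NumPy array: np.vstack builds a new array and copies every existing row each time it is called, so one option is to collect the new vectors in a list and stack them once at the end. This is only a minimal sketch, not the repository's code; extend_wordvecs, pending and the 100-feature shape are assumed names and values for illustration.

```python
import numpy as np


def extend_wordvecs(wordvecs, new_vectors):
    """Append all new_vectors to wordvecs as rows with a single np.vstack call.

    extend_wordvecs is a hypothetical helper, not part of data_util.py.
    """
    # Make every new vector a (1, n_features) row with the same dtype.
    rows = [np.asarray(v, dtype=wordvecs.dtype).reshape(1, -1) for v in new_vectors]
    if not rows:
        return wordvecs
    # One allocation and one copy, instead of one per new vector.
    return np.vstack([wordvecs] + rows)


# Example: 3 existing vectors with 100 features, 5 new ones to add.
wordvecs = np.zeros((3, 100), dtype=np.float32)
pending = [np.random.uniform(-0.25, 0.25, 100) for _ in range(5)]
wordvecs = extend_wordvecs(wordvecs, pending)
print(wordvecs.shape)
```

In the loop that creates vectors for unknown words, each new_wv would be appended to the pending list and extend_wordvecs called once after the loop, assuming nothing needs the updated array while the loop is still running.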
Hi!
1.) I replaced your wordvec.txt with a new one (trained with Google word2vec: 2.5 M tokens, 100 features).
2.) I changed the hardcoded 300 values to 100.
3.) I replaced the annotated corpus with a new one containing 9576 words.
4.) I started ner_train.py.
I get a massive memory overflow at this line in ner_train.py:
reader = DataUtil(WORDVEC_FILEPATH, TAGGED_NEWS_FILEPATH)
In data_util.py, this line uses a huge amount of memory (after 4000 rows of raw_data it fills 120 GB of RAM, including swap):
self.wordvecs = np.vstack((self.wordvecs, new_wv))
I also get this error when I limit the word-vector file to 100000 or even 1000 rows. Can you help me?
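To narrow this down, it may help to print the shape, dtype and size of the arrays just before the np.vstack call and compare them with what is expected (rows x 100 features x bytes per value). The snippet below is only a diagnostic sketch; describe_array is a hypothetical helper, and the demo array stands in for self.wordvecs.

```python
import numpy as np


def describe_array(name, arr):
    """Return a one-line summary of an array's shape, dtype and memory size.

    describe_array is a hypothetical helper for debugging, not part of the repo.
    """
    arr = np.asarray(arr)
    size_mb = arr.nbytes / 1024.0 ** 2  # nbytes = number of elements * bytes per element
    return "%s: shape=%s dtype=%s size=%.1f MB" % (name, arr.shape, arr.dtype, size_mb)


# Stand-in for self.wordvecs: 1000 vectors with 100 float64 features.
demo = np.zeros((1000, 100), dtype=np.float64)
print(describe_array("wordvecs", demo))
```

An unexpected shape or dtype here, for example an object or string dtype, or a size far above rows x 100 x 8 bytes, would suggest that the problem is in how the vectors are loaded rather than in the stacking itself.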