aatkinson / deep-named-entity-recognition

Use RNNs to identify entities in news queries

Memory overflow #7

Closed. ptamas88 closed this issue 6 years ago

ptamas88 commented 6 years ago

Hi!

I did the following:

1.) Replaced your wordvec.txt with a new one (trained with Google's word2vec; 2.5 M tokens, 100 features).
2.) Changed the hardcoded 300 values to 100.
3.) Replaced the annotated corpus with a new one containing 9576 words.
4.) Started ner_train.py.

I get a massive memory overflow at this line in ner_train.py:

reader = DataUtil(WORDVEC_FILEPATH, TAGGED_NEWS_FILEPATH)

In data_util.py, this line consumes a huge amount of memory (after 4000 rows of raw_data it fills 120 GB of RAM, including swap):

self.wordvecs = np.vstack((self.wordvecs, new_wv))

I get the same error when I limit the word-vector file to 100000 rows, or even 1000 rows. Can you help me?

ptamas88 commented 6 years ago

I solved this issue by replacing the np.vstack call with a plain Python list. I replaced this:

self.wordvecs = np.vstack((self.wordvecs, new_wv))

with this:

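# Copy the existing rows into a plain Python list, then append the new vector.
# Note: self.wordvecs is now a list of row vectors, not a 2-D NumPy array.
wv_new = list(self.wordvecs)
wv_new.append(new_wv)
self.wordvecs = wv_new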
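
For anyone hitting the same problem: np.vstack allocates a new array and copies all of its inputs on every call, so calling it once per row inside a loop re-copies the whole matrix each time. A common alternative is to collect the rows in a list and stack them once at the end. Below is a minimal sketch of that pattern, not the repository's actual code. The function name stack_rows_once and the parameter dim are hypothetical, and the float32 dtype is an assumption.

```python
import numpy as np


def stack_rows_once(rows, dim):
    """Collect row vectors in a Python list and stack them a single time.

    `rows` is any iterable of length-`dim` sequences (hypothetical input;
    in the issue above it would be the vectors added during loading).
    """
    collected = []
    for row in rows:
        # reshape raises ValueError if the vector does not have `dim` entries,
        # which surfaces a size mismatch (e.g. 300 vs. 100) early.
        vec = np.asarray(row, dtype=np.float32).reshape(1, dim)
        collected.append(vec)  # appending to a list does not copy the matrix
    if not collected:
        return np.empty((0, dim), dtype=np.float32)
    return np.vstack(collected)  # one allocation and one copy at the end


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    rows = [rng.rand(100) for _ in range(1000)]
    wordvecs = stack_rows_once(rows, dim=100)
    print(wordvecs.shape)  # (1000, 100)
```

Unlike the list-only workaround, this returns a 2-D array, so code that indexes or multiplies the word-vector matrix should not need to change.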