ayota / ddl_nlp

Repo for DDL research lab project.

Performance refactor: streaming sentence from files #60

Closed lauralorenz closed 7 years ago

lauralorenz commented 7 years ago

There are cases where we read the entire corpus into memory, which is just going to give us more and more pain as the corpus gets larger.

  1. During corpus cleaning, all files are read into a single string. This may be fixed during the spaCy refactor (issue #57).
  2. While generating corpus folds, every sentence in the corpus is read into a single list. This may be fixed when we drop fold generation (issue #61).
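
For illustration, here is a minimal sketch of what streaming could look like for both points. It is not the project's actual code: the names `iter_sentences` and `assign_folds` are hypothetical, and it assumes the cleaned corpus is a directory of UTF-8 text files with one sentence per line. Under those assumptions, only one line is held in memory at a time, and folds are assigned round-robin without ever building a list of all sentences.

```python
import os
from typing import Iterable, Iterator, Tuple


def iter_sentences(corpus_dir: str) -> Iterator[str]:
    """Lazily yield sentences from every file under corpus_dir.

    Assumption: each non-empty line of each file is one sentence.
    """
    for root, dirs, files in os.walk(corpus_dir):
        dirs.sort()  # visit subdirectories in a deterministic order
        for name in sorted(files):
            path = os.path.join(root, name)
            with open(path, encoding="utf-8") as handle:
                # Iterating the file object reads one line at a time.
                for line in handle:
                    sentence = line.strip()
                    if sentence:
                        yield sentence


def assign_folds(sentences: Iterable[str], k: int) -> Iterator[Tuple[int, str]]:
    """Pair each sentence with a fold index in round-robin order."""
    for index, sentence in enumerate(sentences):
        yield index % k, sentence
```

A caller could then write each `(fold, sentence)` pair straight to a per-fold output file, so memory use stays flat no matter how large the corpus gets.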

lauralorenz commented 7 years ago

Point 2 was fixed in PR #64

lauralorenz commented 7 years ago

Closing this because the remaining work should now all be tracked in issue #57.