christianbuck / nlu


Preprocessing: speed up add_nombank? #32

Open nschneid opened 11 years ago

nschneid commented 11 years ago

I think we are loading all of NomBank for each sentence, which is the slowest part of the preprocessing pipeline. Is there an easy way to avoid this, e.g.

  1. splitting up the NomBank lines by document into .nom files, analogously to the .prop files, or
  2. using NLTK's NomBank interface (I doubt this is much faster), or
  3. adding NomBank annotations for all sentences at once (rather than looping through the sentence files in a shell script)?
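
For illustration, options 1 and 3 both amount to reading NomBank once and looking annotations up by document and sentence. A minimal sketch, assuming each NomBank line starts with whitespace-separated file ID and sentence number fields (the layout NLTK's reader expects); `index_nombank` and the sample lines are hypothetical and not taken from this repository:

```python
from collections import defaultdict


def index_nombank(lines):
    """Group NomBank lines by (file ID, sentence number) in a single pass.

    Assumption: the first two whitespace-separated fields of each line
    are the file ID and the zero-based sentence number.
    """
    index = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split()
        fileid, sentnum = fields[0], int(fields[1])
        index[(fileid, sentnum)].append(line)
    return index


# Hypothetical sample lines, for illustration only.
sample = [
    "wsj/00/wsj_0001.mrg 0 8 director 01 8:0-rel",
    "wsj/00/wsj_0001.mrg 1 10 chairman 01 10:0-rel",
    "wsj/00/wsj_0002.mrg 0 3 president 01 3:0-rel",
]
nombank_index = index_nombank(sample)
```

With the index built once up front, each sentence costs a dictionary lookup such as `nombank_index[("wsj/00/wsj_0001.mrg", 0)]` instead of a fresh scan of the whole file. The same grouping could also be written out as per-document .nom files.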