Closed nrjones8 closed 10 years ago
fixed this, sorry for it being stupid
This isn't quite as elegant as it could be, because right now we have a two-step process for using pos_tags in features. What we're doing is:
1) Tag the text and store (word, tag) tuples.
2) Build the dictionary of tags and sum tables.
These could probably be combined into one step, though it isn't too horrible to keep them apart. Currently, 2) needs to check that the text has already been tagged, so I've added a self.pos_tagged Boolean to keep track of whether it has been.
We could make this cleaner by merging 1) and 2); then we could just check the self.features dictionary to see whether the POS tag sum tables have been created.
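Here is a rough sketch of what the merged version might look like. Everything in it is hypothetical: the class name, the `pos_sum_tables` key, the `tag_count` helper, and the toy tagger standing in for the real one are made up for illustration and are not what featureextraction.py actually contains. I'm also assuming "sum tables" means per-tag prefix sums. The idea is just that the presence of a key in self.features replaces the self.pos_tagged flag, and tagging only happens the first time a POS feature is requested.

```python
from itertools import accumulate


def toy_tagger(words):
    """Hypothetical stand-in for a real POS tagger; returns (word, tag) tuples."""
    return [(w, "NN" if w[:1].isupper() else "OTHER") for w in words]


class FeatureExtractor:
    def __init__(self, words, tagger=toy_tagger):
        self.words = words
        self.tagger = tagger
        # Single dictionary holding every cached feature table.
        self.features = {}

    def _ensure_pos_sum_tables(self):
        # The key's presence replaces a separate self.pos_tagged flag.
        if "pos_sum_tables" in self.features:
            return
        # Steps 1) and 2) merged: tag the text, then build the sum tables.
        tagged = self.tagger(self.words)
        tags = [tag for _, tag in tagged]
        tables = {}
        for tag in set(tags):
            # tables[tag][i] = number of words with this tag among the first i words.
            tables[tag] = [0] + list(accumulate(int(t == tag) for t in tags))
        self.features["pos_tags"] = tagged
        self.features["pos_sum_tables"] = tables

    def tag_count(self, tag, start, end):
        """Count occurrences of `tag` in words[start:end], tagging lazily."""
        self._ensure_pos_sum_tables()
        table = self.features["pos_sum_tables"].get(tag)
        if table is None:
            return 0
        return table[end] - table[start]
```

With this shape, features that never call `tag_count` never trigger the tagger, so tests for non-POS features skip the slow tagging step.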
Thoughts?
When testing features that don't use POS tags, don't initialize them, since they take forever. I believe @zachwooddoughty is revamping featureextraction.py to just store a single dictionary anyway, so that should fix this issue.