NoahCarnahan / plagcomps

don't initialize pos tags! #15

Closed: nrjones8 closed this issue 10 years ago

nrjones8 commented 10 years ago

When testing features that don't use POS tags, don't initialize the tags, since tagging takes forever. I believe @zachwooddoughty is revamping featureextraction.py to store everything in a single dictionary anyway, so that should fix this issue.

zachwooddoughty commented 10 years ago

Fixed this, sorry it was so stupid.

This isn't as elegant as it could be, because right now using POS tags in features is a two-step process:

1) Tag the text and store (word, tag) tuples.
2) Build the dictionary of tags and sum tables.
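One plausible reading of "sum tables" is per-tag prefix sums over word positions, which let a feature count how often a tag occurs in any span in constant time. The following is a minimal sketch under that assumption; the function names `build_sum_tables` and `count_tag` are hypothetical and not taken from featureextraction.py, and the tagged input is written by hand rather than produced by a real tagger.

```python
from typing import Dict, List, Tuple

def build_sum_tables(tagged: List[Tuple[str, str]]) -> Dict[str, List[int]]:
    """Step 2 (assumed form): for each tag, table[i] = occurrences among the first i words."""
    tags = {tag for _, tag in tagged}
    tables = {tag: [0] * (len(tagged) + 1) for tag in tags}
    for i, (_, tag) in enumerate(tagged):
        for t in tags:
            tables[t][i + 1] = tables[t][i] + (1 if t == tag else 0)
    return tables

def count_tag(tables: Dict[str, List[int]], tag: str, start: int, end: int) -> int:
    """Occurrences of `tag` among words[start:end], via two table lookups."""
    if tag not in tables:
        return 0
    table = tables[tag]
    return table[end] - table[start]

# Step 1 output: (word, tag) tuples, hand-written here instead of calling a tagger.
tagged = [("The", "DT"), ("cat", "NN"), ("sat", "VBD"),
          ("on", "IN"), ("the", "DT"), ("mat", "NN")]
tables = build_sum_tables(tagged)
print(count_tag(tables, "NN", 0, 6))  # 2
print(count_tag(tables, "DT", 1, 5))  # 1
```

Building the tables costs one pass per tag over the text, but afterwards every span query (for example, one per sliding window of a passage) is a subtraction.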

Keeping the two steps separate isn't too horrible, but they could be merged into one. Currently, step 2) has to check that the text has already been tagged, so I've added a self.pos_tagged Boolean to keep track of whether it has been.
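The current two-step arrangement with the `self.pos_tagged` guard might look roughly like the sketch below. It is not the code in featureextraction.py: the class name `FeatureExtractor`, the methods `_tag_text` and `_build_pos_tables`, the `tag_calls` counter, and `toy_tagger` (a stand-in for the slow POS tagger) are all hypothetical.

```python
from typing import Dict, List, Tuple

def toy_tagger(words: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for a slow POS tagger: capitalized words -> NNP, others -> NN."""
    return [(w, "NNP" if w[:1].isupper() else "NN") for w in words]

class FeatureExtractor:
    def __init__(self, text: str):
        self.words = text.split()
        self.features: Dict[str, object] = {}
        self.pos_tagged = False                  # has step 1 run yet?
        self.tagged: List[Tuple[str, str]] = []
        self.tag_calls = 0                       # counts tagger runs, for illustration

    def _tag_text(self) -> None:
        """Step 1: tag the text and store (word, tag) tuples."""
        self.tagged = toy_tagger(self.words)
        self.tag_calls += 1
        self.pos_tagged = True

    def _build_pos_tables(self) -> None:
        """Step 2: build per-tag prefix-sum tables, tagging first only if needed."""
        if not self.pos_tagged:
            self._tag_text()
        tags = {tag for _, tag in self.tagged}
        tables = {tag: [0] for tag in tags}
        for _, tag in self.tagged:
            for t in tags:
                tables[t].append(tables[t][-1] + int(t == tag))
        self.features["pos_sum_tables"] = tables

fe = FeatureExtractor("Alice met Bob")
# A feature that ignores POS tags never triggers the tagger.
avg_len = sum(len(w) for w in fe.words) / len(fe.words)
print(fe.tag_calls)  # 0
fe._build_pos_tables()
fe._build_pos_tables()  # rebuilds the tables, but the flag prevents retagging
print(fe.tag_calls)  # 1
```

Note that the flag only protects step 1: a second call to step 2 still rebuilds the tables, which is part of why the two steps feel awkward apart.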

Merging 1) and 2) would make this cleaner: we could then just check the self.features dictionary to see whether the POS tag sum tables have been created.
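A sketch of the merged alternative, again with hypothetical names (`FeatureExtractor`, `_pos_sum_tables`, `tag_count`, `tag_calls`) and a toy tagger standing in for the real one. Tagging and table building happen in one method, and the only guard is whether the sum tables already exist in `self.features`, so no separate `pos_tagged` flag is needed.

```python
from typing import Dict, List, Tuple

def toy_tagger(words: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for a slow POS tagger: capitalized words -> NNP, others -> NN."""
    return [(w, "NNP" if w[:1].isupper() else "NN") for w in words]

class FeatureExtractor:
    def __init__(self, text: str):
        self.words = text.split()
        self.features: Dict[str, object] = {}
        self.tag_calls = 0  # counts tagger runs, for illustration

    def _pos_sum_tables(self) -> Dict[str, List[int]]:
        """Tag the text and build the per-tag prefix-sum tables in one step, at most once."""
        if "pos_sum_tables" not in self.features:
            tagged = toy_tagger(self.words)
            self.tag_calls += 1
            tags = {tag for _, tag in tagged}
            tables = {tag: [0] for tag in tags}
            for _, tag in tagged:
                for t in tags:
                    tables[t].append(tables[t][-1] + int(t == tag))
            self.features["pos_sum_tables"] = tables
        return self.features["pos_sum_tables"]

    def tag_count(self, tag: str, start: int, end: int) -> int:
        """Hypothetical POS feature: occurrences of `tag` among words[start:end]."""
        table = self._pos_sum_tables().get(tag)
        return 0 if table is None else table[end] - table[start]

fe = FeatureExtractor("Alice met Bob in Paris")
print(fe.tag_count("NNP", 0, 5))  # 3
print(fe.tag_count("NN", 0, 5))   # 2
print(fe.tag_calls)               # 1
```

One trade-off: this version discards the (word, tag) tuples once the tables are built; if another feature needs the raw tags, they could be cached in `self.features` under their own key.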

Thoughts?