dhmit / gender_analysis_web


Add padding to word windows #49

Closed joshfeli closed 3 years ago

joshfeli commented 3 years ago

This PR adds padding to every sequence that a function passes to more_itertools.windowed. Padding ensures that the first and last window_size words or tokens are included in any search that uses windowed. The padding is built with the itertools.chain function, as suggested in the more_itertools documentation.
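
As a rough illustration of why padding matters, here is a minimal sketch, not the code in this PR. It assumes that each window is 2 * window_size + 1 tokens wide with the token of interest in the middle, and that None is used as the filler; both are assumptions made for the example. The `windowed` function below is a standard-library stand-in for more_itertools.windowed (step of 1 only), and `centered_windows` is a hypothetical helper name.

```python
from collections import deque
from itertools import chain, islice


def windowed(iterable, n):
    """Stand-in for more_itertools.windowed with step=1.

    Unlike the real function, this yields nothing when the
    iterable is shorter than n (the real one yields one
    window filled out with a fill value).
    """
    it = iter(iterable)
    window = deque(islice(it, n), maxlen=n)
    if len(window) == n:
        yield tuple(window)
    for item in it:
        window.append(item)
        yield tuple(window)


def centered_windows(tokens, window_size, pad=None):
    """Hypothetical helper: pad both ends so every token gets to be a center."""
    padding = [pad] * window_size
    # chain() glues the filler onto both ends without copying the tokens.
    padded = chain(padding, tokens, padding)
    return list(windowed(padded, 2 * window_size + 1))


tokens = ["she", "went", "to", "the", "store"]

# Without padding, "she" and "store" never sit at the center of a window.
unpadded = list(windowed(tokens, 3))
print([w[1] for w in unpadded])  # ['went', 'to', 'the']

# With padding, every token is the center of exactly one window.
padded = centered_windows(tokens, window_size=1)
print([w[1] for w in padded])    # ['she', 'went', 'to', 'the', 'store']
print(padded[0])                 # (None, 'she', 'went')
```

Under these assumptions, a search that inspects the middle element of each window would miss the first and last window_size tokens unless the sequence is padded.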

At this stage, windowed is used only in the Document.get_word_windows method. This PR pads the sequence there and adds two self.assertEqual checks to the DocumentTestCase.test_get_word_windows method to test the padding.

The need for padding arose while debugging a TestCase that will appear in PR #38. We therefore plan to add the same padding to the generate_token_counter function in proximity.py as part of that PR.