This PR proposes to add padding to any sequence with which a function calls the more_itertools.windowed function. This ensures that the first and last window_size words or tokens are included in any searches that use the windowed function. The padding makes use of the itertools.chain method as suggested in the more_itertools documentation.
At this stage, the only use of the windowed function is in the Document.get_word_windows method. This PR includes padding there, and it also adds two self.assertEqual checks in the DocumentTestCase.test_get_word_windows method to test the padding.
The need for padding arose when debugging a TestCase that will appear in PR #38. As such, we plan to incorporate padding in the generate_token_counter function in proximity.py in that PR.
This PR proposes to add padding to any sequence with which a function calls the
more_itertools.windowed
function. This ensures that the first and lastwindow_size
words or tokens are included in any searches that use thewindowed
function. The padding makes use of theitertools.chain
method as suggested in themore_itertools
documentation.At this stage, the only use of the
windowed
function is in theDocument.get_word_windows
method. This PR includes padding there, and it also adds twoself.assertEqual
checks in theDocumentTestCase.test_get_word_windows
method to test the padding.The need for padding arose when debugging a
TestCase
that will appear in PR #38. As such, we plan to incorporate padding in thegenerate_token_counter
function inproximity.py
in that PR.