remove_punc and remove_punc2 concatenate some words. For example, "Woodhouse.--Dear" becomes "WoodhouseDear". This makes the results of the later tests arguably questionable. For example, with an implementation of remove_punc that replaces each punctuation character with " " and then collapses runs of spaces into a single " ", the count of "Woodhouse" is 314. Similarly, the frequency count of "the" is 5204 rather than 5146.
You are probably aware of this, but a cautionary note in the documentation would be warranted in my opinion.
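To make the difference concrete, here is a minimal sketch. I have not reproduced the actual remove_punc code: remove_punc_delete is only my assumption about the current behaviour (punctuation deleted outright), and remove_punc_space is a hypothetical alternative that substitutes a space and then collapses repeated spaces.

```python
import re
import string

# Assumed stand-in for the current behaviour: every punctuation
# character is deleted, so neighbouring words can fuse together.
_DELETE_TABLE = str.maketrans("", "", string.punctuation)

# Hypothetical alternative: map every punctuation character to a space.
_SPACE_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def remove_punc_delete(text):
    """Delete punctuation outright (illustrative name, not the original code)."""
    return text.translate(_DELETE_TABLE)


def remove_punc_space(text):
    """Replace punctuation with spaces, then collapse runs of spaces."""
    spaced = text.translate(_SPACE_TABLE)
    return re.sub(r" {2,}", " ", spaced)


sample = "Miss Woodhouse.--Dear Emma"
print(remove_punc_delete(sample))  # Miss WoodhouseDear Emma
print(remove_punc_space(sample))   # Miss Woodhouse Dear Emma
```

One trade-off of the space-based variant is that apostrophes are also replaced, so a contraction such as "don't" is split into "don" and "t". Whether that is acceptable depends on what the later tests are meant to show.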