fbkarsdorp / python-course

Tutorial and introduction into programming with Python for the humanities and social sciences
http://www.karsdorp.io/python-course/

remove_punc concatenates words #15

Open xjlc opened 9 years ago

xjlc commented 9 years ago

remove_punc and remove_punc2 concatenate some words. For example, "Woodhouse.--Dear" becomes "WoodhouseDear". This leads to arguably questionable results in the later tests. For example, an implementation of remove_punc that replaces punctuation with " " and then collapses repeated spaces into a single " " gives a count of 314 for Woodhouse. Similarly, the frequency count of "the" is 5204 rather than 5146. You are probably aware of this, but in my opinion a cautionary note in the documentation would be warranted.
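A minimal sketch of the difference described above. The course's own remove_punc is not reproduced here: `remove_punc_joined` is a hypothetical stand-in that simply deletes punctuation (which reproduces the concatenation), and `remove_punc_spaced` is a hypothetical variant that replaces punctuation with spaces and then collapses whitespace. Both names are assumptions for illustration only.

```python
import string


def remove_punc_joined(text):
    """Hypothetical stand-in: delete every punctuation character.

    Words separated only by punctuation end up glued together.
    """
    return "".join(ch for ch in text if ch not in string.punctuation)


def remove_punc_spaced(text):
    """Hypothetical variant: replace punctuation with spaces,
    then collapse runs of whitespace into a single space.
    """
    # Map every ASCII punctuation character to a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    # split() with no arguments drops runs of whitespace; join with one space.
    return " ".join(text.translate(table).split())


sample = "Mr. Woodhouse.--Dear Emma"
print(remove_punc_joined(sample))  # Mr WoodhouseDear Emma
print(remove_punc_spaced(sample))  # Mr Woodhouse Dear Emma
```

One trade-off of the spaced variant: it also splits contractions and hyphenated words (for example, "don't" becomes "don t"), so word counts can shift in the other direction as well.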

fbkarsdorp commented 9 years ago

Hi! Thanks for your comments. I'll have a look at this.