StevenCHowell / type_token_ratio

Application for calculating the Type-Token Ratio from a speech sample
GNU General Public License v3.0

Consider using `str.maketrans()` to replace punctuation. #3

Open StevenCHowell opened 7 years ago

StevenCHowell commented 7 years ago

This example shows a more compact way to remove punctuation: build a translation table once with `str.maketrans()` and apply it to each string with `str.translate()`.

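import string

# Translation table that deletes every punctuation character.
transtable = str.maketrans('', '', string.punctuation)
for i in df.index:
    # df.index holds labels, not positions, so look rows up with .loc.
    description = df.loc[i, col_of_interest].lower()
    description = description.translate(transtable).split()

StevenCHowell commented 7 years ago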

To replace each punctuation character with a space instead of deleting it, use: `transtable = str.maketrans(string.punctuation, ' ' * len(string.punctuation))`
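
As a rough illustration of how the two tables differ, the sketch below applies both to a made-up sentence. The sample text and the names `delete_table` and `space_table` are assumptions for illustration only and do not come from the repository. Deleting punctuation fuses hyphenated words into one token, while replacing it with spaces splits them apart, which changes the token count.

```python
import string

# Hypothetical sample text, chosen only for illustration.
sample = "Well-known words, e.g. type-token ratios!"

# Three-argument form: every character in the third argument is deleted.
delete_table = str.maketrans('', '', string.punctuation)

# Two-argument form: each punctuation character maps to a space.
space_table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))

deleted = sample.lower().translate(delete_table).split()
spaced = sample.lower().translate(space_table).split()

print(deleted)  # ['wellknown', 'words', 'eg', 'typetoken', 'ratios']
print(spaced)   # ['well', 'known', 'words', 'e', 'g', 'type', 'token', 'ratios']
```

Which behavior is preferable probably depends on how hyphenated words and abbreviations should be counted in the speech sample.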

StevenCHowell commented 7 years ago

import string


def main():
    # Map every punctuation character to a space.
    transtable = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    with open('alice30.txt') as data:
        # Drop apostrophes first so contractions such as "don't" stay one word.
        text = data.read().replace("'", "").translate(transtable).lower()
    word_list = text.split()

    # Count how many times each word occurs.
    count = {}
    for w in word_list:
        count[w] = count.get(w, 0) + 1

    # Print the counts in alphabetical order.
    for k in sorted(count):
        print("%-20s occurred %4d times" % (k, count[k]))


if __name__ == '__main__':
    main()
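
Since the repository's goal is the Type-Token Ratio, the same cleaning steps could feed directly into it: the number of distinct words (types) divided by the total number of words (tokens). This is a minimal sketch under that definition. The function name `type_token_ratio` and the sample sentence are hypothetical and are not taken from the repository's code.

```python
import string


def type_token_ratio(text):
    """Return distinct words / total words, or 0.0 for empty input."""
    # Same cleaning as above: drop apostrophes, turn punctuation into spaces.
    table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    tokens = text.replace("'", "").translate(table).lower().split()
    if not tokens:
        return 0.0  # assumption: treat an empty sample as ratio 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical sample: 11 tokens, 6 distinct types.
print(type_token_ratio("The cat saw the dog. The dog didn't see the cat!"))
```

Using `set()` skips the per-word counts entirely; if the counts are needed as well, `len(count)` and `sum(count.values())` from the dictionary above give the same two numbers.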