delip / PyTorchNLPBook

Code and data accompanying Natural Language Processing with PyTorch published by O'Reilly Media https://amzn.to/3JUgR2L
Apache License 2.0

Function preprocess_text does not seem to strip punctuations #15

Open govindgnair23 opened 5 years ago

govindgnair23 commented 5 years ago
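import re

def preprocess_text(text):
    # Lowercase every space-separated token
    text = ' '.join(word.lower() for word in text.split(" "))
    # Pad . , ! ? with spaces so each mark becomes its own token
    text = re.sub(r"([.,!?])", r" \1 ", text)
    # Collapse each run of other non-letter characters into one space
    text = re.sub(r"[^a-zA-Z.,!?]+", r" ", text)
    return text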

Calling preprocess_text('Are you a, boy or a girl?') returns:

'are you a , boy or a girl ? '
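
The comma and question mark survive because both regular expressions explicitly keep `. , ! ?`: the first pads them with spaces and the second excludes them from the characters it replaces. If the goal really is to drop punctuation, a minimal sketch could look like the following. The name `preprocess_text_strip_punct` is hypothetical and not part of the repository, and this assumes plain ASCII English text where nothing but letters should be kept.

```python
import re

def preprocess_text_strip_punct(text):
    # Hypothetical variant, not from the book's code: removes all punctuation.
    text = text.lower()
    # Replace every run of non-letter characters (punctuation, digits,
    # whitespace) with a single space.
    text = re.sub(r"[^a-z]+", " ", text)
    # Drop the leading/trailing space left behind by edge punctuation.
    return text.strip()

print(preprocess_text_strip_punct('Are you a, boy or a girl?'))
# prints: are you a boy or a girl
```

Note that this also discards digits and any non-ASCII letters, so it may be too aggressive for some datasets.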