dlab-berkeley / Python-Text-Analysis-Legacy-2023

D-Lab's 12-hour introduction to text analysis with Python. Learn how to perform bag-of-words, sentiment analysis, topic modeling, word embeddings, and more, using scikit-learn, NLTK, Gensim, and spaCy in Python.
Creative Commons Attribution 4.0 International

01_preprocessing Challenge 6 #13

Closed kazutoki-m closed 2 years ago

kazutoki-m commented 2 years ago

Challenge 6 also needs an empty function stub for participants to fill in, such as:


import re

def preprocess(text):
    """Preprocesses a string."""
    # Lowercase
    text = text.lower()
    # Replace URLs
    url_pattern = r'https?:\/\/.*[\r\n]*'
    url_repl = ' URL '
    text = re.sub(url_pattern, url_repl, text)
    # Replace digits
    digit_pattern = r'\d+'
    digit_repl = ' DIGIT '
    text = re.sub(digit_pattern, digit_repl, text)
    # Remove blank spaces
    blankspace_pattern = r'\s+'
    blankspace_repl = ' '
    text = re.sub(blankspace_pattern, blankspace_repl, text)
    text = text.strip()
    # YOUR CODE HERE

    return text
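To show what the existing steps of the stub do before anything is added at the placeholder, here is a minimal, self-contained sketch. The sample string is made up for illustration and is not from the workshop materials, and the `# YOUR CODE HERE` step is left out because the issue does not say what Challenge 6 asks for.

```python
import re


def preprocess(text):
    """Preprocesses a string (the steps quoted in the stub, minus the placeholder)."""
    # Lowercase
    text = text.lower()
    # Replace URLs; note that `.*` runs to the end of the line
    text = re.sub(r'https?:\/\/.*[\r\n]*', ' URL ', text)
    # Replace runs of digits
    text = re.sub(r'\d+', ' DIGIT ', text)
    # Collapse whitespace and trim the ends
    text = re.sub(r'\s+', ' ', text)
    return text.strip()


# Hypothetical sample input, chosen only to exercise each step.
sample = "Check https://example.com NOW!\nI have 2 cats   and 10 dogs."
result = preprocess(sample)
print(result)  # check URL i have DIGIT cats and DIGIT dogs.
```

One thing the sample makes visible: because the URL pattern ends in `.*`, everything after the URL on the same line (here, "now!") is replaced along with it. If that is not the intended behavior, a narrower pattern such as `r'https?://\S+'` might be worth considering, though that would be a change to the workshop code rather than part of this request.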