NCSU-Libraries / data-science-cookbook

MIT License

Natural Language Processing #6

Open rhuang44 opened 3 years ago

rhuang44 commented 3 years ago

Provide guidance on processing text data with Python, including text cleaning, feature engineering, and label classification.

rhuang44 commented 3 years ago
  1. Text Data Cleaning

    • Remove punctuation
    • Remove stopwords
    • Tokenization
    • Lemmatization/stemming
  2. Feature Engineering

    • Bag-of-words
    • TF-IDF
    • Word2Vec
    • BERT
  3. ML models (classification/regression)
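
To make the outline concrete, here is a rough, standard-library-only sketch of steps 1 and 2 (cleaning, bag-of-words, TF-IDF). It is an illustration, not the cookbook's eventual code. The stopword list, the suffix-stripping `stem` helper, and the sample sentences are all made up for the example. In practice a library stemmer or lemmatizer and a library vectorizer would likely replace these hand-rolled pieces, and Word2Vec, BERT, and the modeling step are not covered here.

```python
import math
import string
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "is", "are", "and", "of", "to", "in", "on"}


def stem(token):
    """Naive suffix stripping; a stand-in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def clean(text):
    """Lowercase, remove ASCII punctuation, tokenize on whitespace,
    drop stopwords, then apply the naive stemmer."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return [stem(t) for t in tokens]


def bag_of_words(tokens):
    """Bag-of-words: raw term counts for one document."""
    return Counter(tokens)


def tf_idf(docs_tokens):
    """TF-IDF with tf = count / doc length and idf = ln(N / document frequency).
    Libraries often use smoothed variants, so their numbers will differ."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        vectors.append(
            {
                term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return vectors


# Made-up sample documents for illustration.
docs = ["The cats are sleeping on the mat.", "Dogs chased the cats!", "A dog barked."]
cleaned = [clean(d) for d in docs]
counts = [bag_of_words(tokens) for tokens in cleaned]
weights = tf_idf(cleaned)
```

The resulting `counts` or `weights` dictionaries are the kind of features a classifier or regressor in step 3 would consume. Terms that appear in fewer documents get a higher TF-IDF weight than terms shared across documents.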

mikenutt commented 3 years ago

Documenting what we just discussed: pointers to existing code that may be helpful for users can be added to https://go.ncsu.edu/resourcelist. Code contributed to this cookbook should be original (or modified open source) code. @rhuang44, you can close this comment once you've had a chance to add these ideas to go.ncsu.edu/resourcelist/