Andyccs / sport-news-retrieval

Review classification #8

Closed Andyccs closed 8 years ago

Andyccs commented 8 years ago

Refer to issue #7

  1. Rename classify.py to sentiment_api.py
  2. espn_data_result.json can now be reproduced by running sentiment_api.py, which also generates label_api.csv. All classifiers now train on label_api.csv; the required changes have been made in data_source.py
  3. Change classification to only work on espn data for now
  4. Move the kappa calculation out of preprocess.py into calculate_kappa.py. This generates two files, label_1.csv and label_2.csv
  5. Move common functions to common.py
  6. Create classify_data.py to classify all data (espn and NBACentral for now) using linear SVC
  7. Update the Solr configuration for the additional label field
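
For reviewers unfamiliar with the kappa step in item 4, here is a minimal sketch of an inter-annotator agreement score. It assumes the statistic is Cohen's kappa over two annotators' label lists (the description only says "kappa"), and the function name `cohen_kappa` is hypothetical, not taken from calculate_kappa.py.

```python
from collections import Counter


def cohen_kappa(labels_1, labels_2):
    """Cohen's kappa for two equally long sequences of labels.

    Hypothetical helper for illustration; the real calculate_kappa.py
    may read label_1.csv / label_2.csv and compute this differently.
    """
    if len(labels_1) != len(labels_2) or not labels_1:
        raise ValueError("label lists must be non-empty and the same length")

    n = len(labels_1)

    # Observed agreement: fraction of items both annotators labelled the same.
    observed = sum(a == b for a, b in zip(labels_1, labels_2)) / n

    # Expected agreement by chance, from each annotator's label frequencies.
    counts_1 = Counter(labels_1)
    counts_2 = Counter(labels_2)
    expected = sum(counts_1[c] * counts_2[c] for c in counts_1) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

A value near 1 means the two label files agree far more often than chance would predict, while a value near 0 means the agreement is roughly what chance alone would give.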
Andyccs commented 8 years ago

More preprocessing steps:

  1. Remove links
  2. Remove mentions
  3. Remove hashtags
  4. Lemmatization
  5. Remove punctuation
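
The five steps above could be sketched roughly as follows. This is an illustration under assumptions, not the code in this pull request: the regular expressions, the `preprocess` name, and the pluggable `lemmatize` argument are all hypothetical, and the lemmatizer is left as a parameter because the comment does not say which library is used.

```python
import re

# Assumed patterns for tweet-style text; the real ones may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
WORD_RE = re.compile(r"\w+")


def preprocess(text, lemmatize=None):
    """Apply the listed steps in order and return a cleaned string.

    `lemmatize` is an optional callable mapping one token to its lemma.
    """
    text = URL_RE.sub(" ", text)       # 1. remove links
    text = MENTION_RE.sub(" ", text)   # 2. remove mentions
    text = HASHTAG_RE.sub(" ", text)   # 3. remove hashtags

    # Split punctuation into its own tokens so words lemmatize cleanly.
    tokens = TOKEN_RE.findall(text)

    if lemmatize is not None:          # 4. lemmatization
        tokens = [lemmatize(t) for t in tokens]

    # 5. remove punctuation: keep only tokens made of word characters.
    return " ".join(t for t in tokens if WORD_RE.fullmatch(t))


print(preprocess("Great win by @warriors tonight! #NBA http://t.co/abc"))
```

Note that this sketch removes the whole hashtag, including its word. If the hashtag text should be kept as a feature, only the `#` character would need to be stripped instead.
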
Andyccs commented 8 years ago

@kklw I changed lots of things in this pull request, please take your time to review these changes.