audrism opened 5 years ago
https://github.com/DisasterMasters/TweetAnalysis/blob/master/src/results/Relevance%20Preprocessing.ipynb

The best setup for Doc2vec was simply the distributed bag of words (DBOW) model plus punctuation removal.

Tried combinations of:
- distributed memory (DM)
- distributed bag of words (DBOW)
- lowercasing
- stop-word removal
- rare-word removal
- spelling correction
- punctuation removal
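For reference, here is a minimal sketch of how toggleable preprocessing steps like these could be written, with the defaults set to the reported best option (punctuation removal only). This is not the notebook's code. The function names `preprocess` and `drop_rare`, the tiny `STOP_WORDS` set, and the `min_count` threshold are hypothetical, and spelling correction is left out.

```python
import string
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only; a real run
# would use a fuller list (assumption, not taken from the notebook).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for"}

# Translation table that maps every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text, remove_punct=True, lowercase=False, remove_stop=False):
    """Tokenize one tweet, applying only the enabled preprocessing steps."""
    if remove_punct:
        text = text.translate(PUNCT_TABLE)
    if lowercase:
        text = text.lower()
    tokens = text.split()
    if remove_stop:
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    return tokens


def drop_rare(corpus, min_count=2):
    """Drop tokens that occur fewer than min_count times across the corpus."""
    counts = Counter(t for doc in corpus for t in doc)
    return [[t for t in doc if counts[t] >= min_count] for doc in corpus]


tweets = ["Stay safe, Florida! #Irma", "The storm is here."]
docs = [preprocess(t) for t in tweets]  # punctuation removal only
```

The resulting token lists could then be wrapped as gensim `TaggedDocument(words=tokens, tags=[i])` objects and passed to `Doc2Vec(..., dm=0)`, where `dm=0` selects the DBOW training mode. Keeping each step as a flag makes it easy to run the combinations listed above.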
@abhidya, which datasets did you use to train the relevant/irrelevant tweet classifier for Irma? Also, is the code link above the right one? @nwest13