bsacash / Introduction-to-NLP

Lectures for Udemy - INLP

Sentiment Analysis #2

Open rajarameshmamidi opened 4 years ago

rajarameshmamidi commented 4 years ago

Hi Brian,

I went through your course 'Introduction to Natural Language' on Udemy. It was very helpful, and your explanations are very interesting. I have started learning NLP, and I have a few questions about sentiment analysis in one of the projects you explain in the course, the one on positive and negative word analysis. Is it possible to analyze positive and negative words without word_positive.csv and word_negative.csv?

Your help is much appreciated. Thanks, Raja

bsacash commented 4 years ago

Absolutely! The version I presented is VERY basic and leaves many edge cases unhandled. It works in general cases and serves as a good culmination of basic NLP tasks, but it is more an example of what one can do with NLTK than a robust sentiment analysis tool.

You are referring to machine learning. The approach in the course was rule-based. A machine learning approach takes many more dimensions into account and requires training. At a basic level, a naive Bayes approach could work. At an advanced level, fine-tuning a language model on your own data would give a state-of-the-art sentiment classifier with near-human performance.
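As a rough illustration of the naive Bayes idea, here is a minimal sketch using scikit-learn. The sentences and labels are invented toy data and are not from the course; a real classifier would need far more labeled examples.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labeled data, invented for illustration only (1 = positive, 0 = negative).
texts = [
    "I love this movie",
    "What a great and wonderful day",
    "This is a terrible product",
    "I hate this awful weather",
]
labels = [1, 1, 0, 0]

# Bag-of-words counts feed a multinomial naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

# Classify a new sentence; no word lists (CSV files) are involved,
# the model learns word/label associations from the training data.
prediction = model.predict(["a great movie"])
print(prediction)
```

Unlike the rule-based version, nothing here says which words are positive or negative; the classifier estimates that from the labeled sentences, which is why training data is required.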

rajarameshmamidi commented 4 years ago

Thank you, Brian.

I have built the code below with the help of the Udemy course and some googling, and I understand things up to bag-of-words using sklearn's CountVectorizer. Now I am confused about how to split the data into train and test sets. The file I am working with doesn't have any label information, and I want to perform sentiment analysis on it. Could you please let me know how to split the data into a train set and a test set so I can perform sentiment analysis with various ML algorithms?

In the code below I have included the CountVectorizer step, since I pass in the data after performing the pre-processing steps.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
cv = CountVectorizer(stop_words = 'english')
dtm = cv.fit_transform(data)

Here is the file on which I am trying to perform sentiment analysis: modi_speech.txt
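For reference, a minimal sketch of how train_test_split is commonly used once every text has a label. The sentences and labels below are hypothetical placeholders and are not taken from modi_speech.txt; since that file has no labels, they would have to be assigned first (by hand, for example) before a supervised split like this makes sense.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical sentences with hand-assigned labels (1 = positive, 0 = negative).
sentences = [
    "We are proud of this progress",
    "The delay was disappointing",
    "People welcomed the new program",
    "The results were poor and frustrating",
    "This success brings hope",
    "The failure caused serious problems",
    "Farmers are happy with the support",
    "Many families suffered losses",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Split the raw text first, keeping the class balance in both parts.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    sentences, labels, test_size=0.25, random_state=42, stratify=labels
)

# Fit the vocabulary on the training text only, then reuse it for the test text.
cv = CountVectorizer(stop_words="english")
X_train = cv.fit_transform(X_train_text)
X_test = cv.transform(X_test_text)

print(X_train.shape[0], X_test.shape[0])
```

Splitting before vectorizing keeps test-set words out of the vocabulary, so the test score reflects how the model handles text it has not seen.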