Monireh2 commented 3 years ago

New Notebook for Sentiment Analysis use case

This pull request adds a new notebook for the sentiment analysis use case, using Watson NLU, Text Extensions for Pandas, Pandas, and scikit-learn.

Change summary:

Added a new IPython notebook for the use case
Added a new Car Reviews dataset from Kaggle

Signed-off-by: Monireh Ebrahimi <monireh@Monirehs-MacBook-Pro.local>

Monireh2 commented 3 years ago

@frreiss @BryanCutler Please review!

frreiss commented 3 years ago

Added @BryanCutler and myself as reviewers on this PR.

review-notebook-app[bot] commented 3 years ago

View / edit / reply to this conversation on ReviewNB

frreiss commented on 2021-04-20T17:01:41Z ----------------------------------------------------------------

I don't think we can redistribute this data set. I recommend that you add instructions for the user to download the file archive.zip from Kaggle and place that file in notebooks/outputs.

Then the code here should read all the records directly out of the zip file. You can use the zipfile package to read from a zip archive without unpacking it; see https://docs.python.org/3/library/zipfile.html
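A minimal sketch of that suggestion, using only the standard library. The member name `reviews.csv` and the small in-memory demo archive are hypothetical stand-ins, since the thread does not say what archive.zip contains. A file-like handle such as the one opened here can also be passed to `pandas.read_csv`.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_source, member_name):
    """Return the rows of one CSV member of a zip archive as a list of dicts.

    zip_source may be a path (for example notebooks/outputs/archive.zip)
    or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # ZipFile.open() streams the member; nothing is extracted to disk.
        with archive.open(member_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Demo with a hypothetical in-memory archive instead of the Kaggle download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo_archive:
    demo_archive.writestr(
        "reviews.csv", "rating,review\n5,Great car\n2,Noisy engine\n"
    )

rows = read_csv_from_zip(buffer, "reviews.csv")
```

To find out which members a real archive holds, `zipfile.ZipFile(path).namelist()` lists them without extracting anything.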

frreiss commented on 2021-04-20T17:01:42Z ----------------------------------------------------------------

Pandas has a built-in function for this kind of sampling operation; see https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.sample.html

Monireh2 commented on 2021-04-20T23:16:22Z ----------------------------------------------------------------

The reason I have not used the pandas built-in DataFrameGroupBy.sample is that it raises an error for groups smaller than the specified n. To sample min(n, len(group)) rows from each group, I had to use this approach.
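One way to cap the sample size per group can be sketched as follows. This is an illustration, not the notebook's actual code; the function name `sample_up_to_n` and the `make` and `rating` columns are hypothetical.

```python
import pandas as pd


def sample_up_to_n(df, by, n, seed=0):
    """Sample at most n rows from each group, keeping small groups whole."""
    parts = [
        # min() avoids the error raised when a group has fewer than n rows.
        group.sample(n=min(n, len(group)), random_state=seed)
        for _, group in df.groupby(by)
    ]
    return pd.concat(parts)


# Hypothetical data: make "A" has five reviews, make "B" only two.
reviews = pd.DataFrame({
    "make": ["A"] * 5 + ["B"] * 2,
    "rating": [5, 4, 3, 2, 1, 5, 1],
})
sampled = sample_up_to_n(reviews, by="make", n=3)
```

An alternative that stays within built-in calls is to shuffle first and then take the head of each group: `df.sample(frac=1, random_state=seed).groupby(by).head(n)`.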

frreiss commented on 2021-04-20T17:01:42Z ----------------------------------------------------------------

I don't think you need to run all of these different types of analysis. The only one you should need here is keywords.

frreiss commented on 2021-04-20T17:01:43Z ----------------------------------------------------------------

This is a very large output. I don't think it's necessary to print all these DataFrames as text. You can just show one example DataFrame, and use Jupyter to display the DataFrame as an HTML table.

Monireh2 commented on 2021-04-22T18:09:21Z ----------------------------------------------------------------

Done. Since we had already shown the output for one review, I just removed the output here.

frreiss commented on 2021-04-20T17:01:43Z ----------------------------------------------------------------

You should add an explanation before this cell of what this cell computes and why that output is interesting.

frreiss commented on 2021-04-20T17:01:44Z ----------------------------------------------------------------

I recommend you put this part before the multivariate linear regression. You can say something like, "since the sentiment.score field shows a relatively high correlation with the rating, let's try a regression based on just that value". Then you can segue into the multivariate linear regression by saying something like, "now let's try adding the fine-grained sentiment scores from Watson to our model and see if the coefficient of determination (r^2) goes up"
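The suggested ordering can be sketched with scikit-learn as below. The data is synthetic and the `joy` and `anger` features are assumed stand-ins for Watson's fine-grained scores; only the sentiment score and the rating come from the discussion above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the review data (all values are made up).
rng = np.random.default_rng(0)
n = 200
sentiment = rng.uniform(-1, 1, n)   # overall sentiment score
joy = rng.uniform(0, 1, n)          # assumed fine-grained score
anger = rng.uniform(0, 1, n)        # assumed fine-grained score
rating = 3 + 1.5 * sentiment + 0.8 * joy - 0.6 * anger + rng.normal(0, 0.3, n)

# Step 1: regression on the sentiment score alone.
X_single = sentiment.reshape(-1, 1)
r2_single = LinearRegression().fit(X_single, rating).score(X_single, rating)

# Step 2: add the fine-grained scores and compare r^2.
X_multi = np.column_stack([sentiment, joy, anger])
r2_multi = LinearRegression().fit(X_multi, rating).score(X_multi, rating)

print(f"r^2, sentiment only:       {r2_single:.3f}")
print(f"r^2, with fine-grained:    {r2_multi:.3f}")
```

Note that r^2 measured on the training data cannot go down when features are added to an ordinary least-squares model, so comparing the two models on a held-out split gives a more honest picture of whether the extra scores help.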

frreiss commented on 2021-04-20T17:01:44Z ----------------------------------------------------------------

There should be a conclusion here. You should be able to say something pretty positive about the results you show in this graph. This scatterplot clearly shows more correlation than the ones up above. I recommend that you add a linear trendline.
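A linear trendline can be fitted with `numpy.polyfit` and drawn over the scatterplot. The sketch below is an assumption about how this might look, not the notebook's code; the function names and axis labels are hypothetical, and matplotlib is imported only inside the plotting helper.

```python
import numpy as np


def linear_trendline(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


def plot_with_trendline(x, y, xlabel="predicted rating", ylabel="actual rating"):
    """Scatter the points and overlay the fitted line (labels are placeholders)."""
    import matplotlib.pyplot as plt

    slope, intercept = linear_trendline(x, y)
    xs = np.array([np.min(x), np.max(x)])
    fig, ax = plt.subplots()
    ax.scatter(x, y, alpha=0.5)
    ax.plot(xs, slope * xs + intercept, color="red", label="linear trend")
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.legend()
    return fig


# Points that lie exactly on y = 2x + 1, to check the fit.
slope, intercept = linear_trendline([0, 1, 2, 3], [1, 3, 5, 7])
```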

Monireh2 commented on 2021-04-23T23:42:44Z ----------------------------------------------------------------

Addressed!

frreiss commented 3 years ago

Looking good! Some comments inline via ReviewNB. There are two issues that really need to be cleared up before we can merge this:

Fix the dataset issues (see https://github.com/CODAIT/text-extensions-for-pandas/pull/189#issuecomment-823447848). We can't have data with uncertain IP rights in our repository.
Remove the really large output (see https://github.com/CODAIT/text-extensions-for-pandas/pull/189#issuecomment-823447866). It makes the notebook viewer freeze for a long time.

Monireh2 commented 3 years ago

Thanks, Fred. I addressed all your comments and pushed to the repository. Please check!

frreiss commented 3 years ago

Looks good to me.

CODAIT / text-extensions-for-pandas

added sentiment analysis use case #189
