Closed Monireh2 closed 3 years ago
@frreiss @BryanCutler Please review!
Added @BryanCutler and myself as reviewers on this PR.
frreiss commented on 2021-04-20T17:01:41Z ----------------------------------------------------------------
I don't think we can redistribute this data set. I recommend that you add instructions for the user to download the file archive.zip from Kaggle and place that file in notebooks/outputs.
Then the code here can read all the records directly out of the zip file. You can use the zipfile package to read from a zip archive without unpacking the archive; see https://docs.python.org/3/library/zipfile.html
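For reference, a minimal sketch of reading CSV records straight out of a zip archive with the standard-library zipfile module. The member name reviews.csv, its columns, and the helper name read_csv_from_zip are placeholders chosen for illustration; the actual contents of the Kaggle archive.zip are not stated in this thread.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_source, member_name):
    """Return the rows of one CSV member of a zip archive as a list of dicts.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # ZipFile.open() yields a binary stream; wrap it to decode text.
        with archive.open(member_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Demo on a small in-memory archive (hypothetical file name and columns).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("reviews.csv", "rating,text\n5,Great stay\n2,Noisy room\n")
buffer.seek(0)

rows = read_csv_from_zip(buffer, "reviews.csv")
print(len(rows), "records read without unpacking the archive")
```

In a notebook that already uses Pandas, the stream returned by archive.open() can likely be passed to pandas.read_csv instead of csv.DictReader.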
frreiss commented on 2021-04-20T17:01:42Z ----------------------------------------------------------------
Pandas has a built-in function for this kind of sampling operation; see https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.sample.html
Monireh2 commented on 2021-04-20T23:16:22Z ----------------------------------------------------------------
The reason I have not used the pandas built-in DataFrameGroupBy.sample is that it raises an error for groups smaller than the specified n; to use min(n, len(group)) I had to use this
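One way the min(n, len(group)) approach described above might look, as a sketch only. The notebook's actual code is not shown in this thread, so the column names, the toy data, and the helper name sample_up_to_n are assumptions.

```python
import pandas as pd


def sample_up_to_n(df, by, n, seed=0):
    """Sample at most n rows per group; groups smaller than n are kept whole."""
    parts = [
        # min() caps the request so small groups do not trigger an error.
        group.sample(n=min(n, len(group)), random_state=seed)
        for _, group in df.groupby(by)
    ]
    return pd.concat(parts)


# Toy data (hypothetical): the rating-2 group has a single row.
reviews = pd.DataFrame({
    "rating": [1, 1, 1, 2, 5, 5],
    "text": ["a", "b", "c", "d", "e", "f"],
})

sampled = sample_up_to_n(reviews, by="rating", n=2)
print(sampled.groupby("rating").size())
```

Passing replace=True to DataFrameGroupBy.sample would also avoid the error, but it can return duplicate rows, which is usually not what a per-group subsample wants.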
frreiss commented on 2021-04-20T17:01:42Z ----------------------------------------------------------------
I don't think you need to run all of these different types of analysis. The only one you should need here is keywords.
frreiss commented on 2021-04-20T17:01:43Z ----------------------------------------------------------------
This is a very large output. I don't think it's necessary to print all these DataFrames as text. You can just show one example DataFrame, and use Jupyter to display the DataFrame as an HTML table.
Monireh2 commented on 2021-04-22T18:09:21Z ----------------------------------------------------------------
Done. Since we had already shown the output for one review, I just removed the output here.
frreiss commented on 2021-04-20T17:01:43Z ----------------------------------------------------------------
You should add an explanation before this cell of what this cell computes and why that output is interesting.
frreiss commented on 2021-04-20T17:01:44Z ----------------------------------------------------------------
I recommend you put this part before the multivariate linear regression. You can say something like, "since the sentiment.score field shows a relatively high correlation with the rating, let's try a regression based on just that value". Then you can segue into the multivariate linear regression by saying something like, "now let's try adding the fine-grained sentiment scores from Watson to our model and see if the coefficient of determination (r^2) goes up".
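The single-feature-then-multivariate progression suggested above can be sketched as follows. This uses NumPy least squares on synthetic data rather than the notebook's scikit-learn model and Watson output; the feature names, coefficients, and noise level are made up for illustration and are not results from the notebook.

```python
import numpy as np


def r_squared(features, target):
    """Fit ordinary least squares with an intercept; return in-sample r^2."""
    X = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residuals = target - X @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-ins (assumptions): one overall sentiment score and two
# fine-grained scores, with a rating that depends on all three plus noise.
rng = np.random.default_rng(0)
n = 200
sentiment_score = rng.uniform(-1, 1, n)
fine_grained = rng.uniform(0, 1, (n, 2))
rating = (3 + 1.5 * sentiment_score + 0.8 * fine_grained[:, 0]
          - 0.6 * fine_grained[:, 1] + rng.normal(0, 0.5, n))

# Step 1: regression on the sentiment score alone.
r2_single = r_squared(sentiment_score, rating)
# Step 2: add the fine-grained scores and compare r^2.
r2_multi = r_squared(np.column_stack([sentiment_score, fine_grained]), rating)

print(f"r^2 with sentiment score only:    {r2_single:.3f}")
print(f"r^2 with fine-grained scores too: {r2_multi:.3f}")
```

Note that in-sample r^2 cannot decrease when features are added to an ordinary least squares model, so a held-out test split gives a fairer read on whether the extra scores actually help.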
frreiss commented on 2021-04-20T17:01:44Z ----------------------------------------------------------------
There should be a conclusion here. You should be able to say something pretty positive about the results you show in this graph. This scatterplot clearly shows more correlation than the ones up above. I recommend that you add a linear trendline.
Monireh2 commented on 2021-04-23T23:42:44Z ----------------------------------------------------------------
Addressed!
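For reference, a linear trendline like the one recommended above can be computed with numpy.polyfit. This is a sketch on placeholder points, not the notebook's data; the helper name linear_trendline is hypothetical.

```python
import numpy as np


def linear_trendline(x, y):
    """Return the two endpoints of the least-squares line through (x, y)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    x_line = np.array([np.min(x), np.max(x)])
    return x_line, slope * x_line + intercept


# Placeholder points (assumption) lying exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

x_line, y_line = linear_trendline(x, y)
print(x_line, y_line)
```

The returned endpoints can then be drawn over the existing scatterplot, for example with Matplotlib's Axes.plot(x_line, y_line).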
Looking good! Some comments inline via ReviewNB. There are two issues that really need to be cleared up before we can merge this:
Thanks Fred. I addressed all your comments and pushed to the repository. Please check!
Looks good to me.
New Notebook for Sentiment Analysis use case
Pull request to add a new notebook for a sentiment analysis use case using Watson NLU, Text Extensions for Pandas, Pandas, and scikit-learn.
Change summary:
Signed-off-by: Monireh Ebrahimi <monireh@Monirehs-MacBook-Pro.local>