CODAIT / text-extensions-for-pandas

Natural language processing support for Pandas dataframes.
Apache License 2.0
215 stars 34 forks

added sentiment analysis use case #189

Closed Monireh2 closed 3 years ago

Monireh2 commented 3 years ago

New Notebook for Sentiment Analysis use case

Pull request to add a new notebook for a sentiment analysis use case using Watson NLU, Text Extensions for Pandas, pandas, and scikit-learn.

Change summary:

Signed-off-by: Monireh Ebrahimi <monireh@Monirehs-MacBook-Pro.local>

review-notebook-app[bot] commented 3 years ago

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Monireh2 commented 3 years ago

@frreiss @BryanCutler Please review!

frreiss commented 3 years ago

Added @BryanCutler and myself as reviewers on this PR.

review-notebook-app[bot] commented 3 years ago

View / edit / reply to this conversation on ReviewNB

frreiss commented on 2021-04-20T17:01:41Z ----------------------------------------------------------------

I don't think we can redistribute this data set. I recommend that you add instructions for the user to download the file archive.zip from Kaggle and place that file in notebooks/outputs.

Then the code here should read all the records directly out of the zip file. You can use the zipfile package to read from a zip archive without unpacking it; see https://docs.python.org/3/library/zipfile.html
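
A minimal sketch of reading records straight out of a zip archive, for illustration only. The helper name read_csv_from_zip, the member name reviews.csv, and the column names are hypothetical; the actual file names inside the Kaggle archive are not stated in this thread. The demo builds a tiny in-memory archive so that it runs without the real data set.

```python
import io
import zipfile

import pandas as pd


def read_csv_from_zip(zip_source, member_name):
    """Read one CSV member of a zip archive into a DataFrame without unpacking.

    zip_source may be a path or a binary file-like object.
    member_name is the (hypothetical) name of the CSV file inside the archive.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # ZipFile.open returns a binary file-like object that pandas can parse.
        with archive.open(member_name) as member:
            return pd.read_csv(member)


# Tiny in-memory archive standing in for the downloaded archive.zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("reviews.csv", "text,label\ngood,1\nbad,0\n")
buffer.seek(0)

demo_df = read_csv_from_zip(buffer, "reviews.csv")
print(demo_df)
```

In the notebook, zip_source would presumably be the path where the user placed archive.zip, and ZipFile.namelist() can be used to discover the member names.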


frreiss commented on 2021-04-20T17:01:42Z ----------------------------------------------------------------

Pandas has a built-in function for this kind of sampling operation; see https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.sample.html


Monireh2 commented on 2021-04-20T23:16:22Z ----------------------------------------------------------------

I did not use the pandas built-in DataFrameGroupBy.sample because it raises an error for groups whose size is less than the specified n. To sample min(n, len(group)) rows from each group, I had to use this code instead.
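
For illustration, one way to cap the sample size per group is sketched below. This is not the code from the notebook; the helper name sample_up_to_n and the column names label and text are assumptions made for the example.

```python
import pandas as pd


def sample_up_to_n(df, by, n, seed=0):
    """Sample at most n rows from each group; groups smaller than n are kept whole."""
    parts = [
        # min(n, len(group)) avoids asking for more rows than the group has.
        group.sample(n=min(n, len(group)), random_state=seed)
        for _, group in df.groupby(by)
    ]
    return pd.concat(parts)


# Hypothetical demo data: one group of 5 rows and one group of 2 rows.
demo = pd.DataFrame(
    {"label": ["pos"] * 5 + ["neg"] * 2, "text": list("abcdefg")}
)
sampled = sample_up_to_n(demo, "label", n=3)
print(sampled["label"].value_counts())
```

An alternative that stays within built-in pandas calls is to shuffle first and then take the head of each group: demo.sample(frac=1, random_state=0).groupby("label").head(3). Both variants return every row of a group that has fewer than n rows.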