Analyse Emailing Sending and Receiving Behavior and Sentiment towards…

priyankaiitg commented 1 year ago

… Male and Female Genders.

codecov-commenter commented 1 year ago

Codecov Report

Patch and project coverage have no change.

Comparison is base (7dc30d2) 73.20% compared to head (e87b79a) 73.20%.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #594   +/-   ##
=======================================
  Coverage   73.20%   73.20%
=======================================
  Files          31       31
  Lines        3702     3702
=======================================
  Hits         2710     2710
  Misses        992      992
```

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `73.20% <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=datactive#carryforward-flags-in-the-pull-request-comment) to find out more.

sbenthall commented 1 year ago

This is great progress!

A few comments:

Some qualitative text framing the research problem being addressed, and the interpretation of the results, would be helpful. Maybe break up the big blocks of print statements into sections, and explain them? As is, it's not clear how the counts connect to the research questions that we've discussed.
Similarly, printed numbers are not as nice as plots, and Jupyter notebooks make plotting very easy! See other notebooks in the examples/ directory for how we've done this elsewhere.
Once the code for the analysis of a single mailing list has been worked out, it would be good to encapsulate it in a function. That way it can be applied to many mailing lists to compare them.
I believe for political correctness reasons, it is best to change the terms "male/female" to "men/women". I think in one of the other notebooks we did this change programmatically. It would also be good to include some disclaimer text like the following:

"BigBang uses a library that guesses the gender of a person based on their first name and census records. We understand that this method is prone to error. Only names with very high correlation with a particular gender are so identified. Because of these and other errors, we consider gender in statistical aggregates only. Please do not take these results as attributing gender to any particular individual on the mailing list."
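
The function-encapsulation and relabeling suggestions above could be sketched roughly as follows. This is only an illustration: it assumes each mailing list is already loaded into a pandas DataFrame with a `From` column and a `gender` column holding "male"/"female"/"unknown" labels. The column names, the `summarize_by_gender` helper, and the toy data are all hypothetical and are not part of BigBang's API.

```python
import pandas as pd

# Hypothetical label mapping; the labels used in the actual notebook may differ.
GENDER_LABELS = {"male": "men", "female": "women"}


def summarize_by_gender(messages: pd.DataFrame) -> pd.DataFrame:
    """Count messages and unique senders per gender label for one mailing list.

    Assumes columns "From" and "gender" (hypothetical names).
    """
    # Relabel "male"/"female" to "men"/"women"; other labels pass through.
    relabeled = messages.assign(
        gender=messages["gender"].replace(GENDER_LABELS)
    )
    return relabeled.groupby("gender").agg(
        messages=("From", "size"),
        unique_senders=("From", "nunique"),
    )


# Toy data standing in for two mailing lists.
lists = {
    "list-a": pd.DataFrame(
        {
            "From": ["a@x.org", "a@x.org", "b@x.org"],
            "gender": ["female", "female", "male"],
        }
    ),
    "list-b": pd.DataFrame(
        {
            "From": ["c@x.org", "d@x.org"],
            "gender": ["male", "unknown"],
        }
    ),
}

# Apply the same analysis to every list and stack the results for comparison.
comparison = pd.concat(
    {name: summarize_by_gender(df) for name, df in lists.items()},
    names=["mailing_list"],
)
print(comparison)
```

Keeping the per-list logic in one function means the aggregate table (or a plot built from it) can be regenerated for any set of lists without copying cells.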

sbenthall commented 1 year ago

Huge progress! In general, I love the plots. This will be by far one of our best notebooks.

One technical issue:

In cell [9], I'm getting this error: https://gist.github.com/sbenthall/15af4d7edb5774303d71f56b42dfcd04 Looks connected to this: https://stackoverflow.com/questions/76158147/pandas-groupby-valueerror-cannot-subset-columns-with-a-tuple-with-more-than-o Which suggests that you may have been using an older version of Pandas. Can you update Pandas and figure out how to correct this?
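
For context, the linked Stack Overflow question is about the `ValueError: Cannot subset columns with a tuple with more than one element. Use a list instead.` that pandas 2.x raises when several columns are selected from a groupby using a bare tuple. A minimal sketch of that kind of fix, assuming cell [9] does something similar (the DataFrame and column names here are hypothetical, not taken from the notebook):

```python
import pandas as pd

# Hypothetical data; the notebook's real columns are not shown in this thread.
df = pd.DataFrame(
    {
        "gender": ["women", "men", "women"],
        "sent": [3, 1, 2],
        "received": [4, 0, 1],
    }
)

# Older pandas accepted a bare tuple of column names:
#     df.groupby("gender")["sent", "received"].sum()
# pandas 2.x raises ValueError for that form. Selecting with a list
# (note the double brackets) works on both older and newer versions.
totals = df.groupby("gender")[["sent", "received"]].sum()
print(totals)
```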

Two nitpicks on presentation -- not necessary to fix...

In cell [10] (the first bar plot), I am a little confused by the plot since only one column has the darker blue bar. Is that because the number of unique senders is negligible in those categories? I assume this is a stacked bar plot, but it is hard to tell.

Could you add text explaining what you mean by "Response or interaction ratio"?

sbenthall commented 1 year ago

Great work. Thanks @priyankaiitg !

datactive / bigbang

Analyse Emailing Sending and Receiving Behavior and Sentiment towards… #594
