abjer / isds2020

Introduction to Social Data Science 2020 - a summer school course abjer.github.io/isds2020
Ex. 3.2.1 #27

Open theaiuel opened 4 years ago

theaiuel commented 4 years ago

When we make a box plot of the probability of survival for men and women within each passenger class, it does not turn out well. In our attempt, no 'second class' shows up and the plot is not informative. Our code is:

sns.boxplot(x='class', y='survived', hue='sex', data=titanic, ax=ax[1])

jsr-p commented 4 years ago

Hi @theaiuel, your plot turns out weird because the survived variable is a dummy variable: it only takes the values 0 and 1. As you can see in the screenshot below, the descriptive statistics (quartiles, minimum, maximum) of survived conditioned on class and sex are not that useful. The boxplot essentially visualizes these measures, so the plot is not very informative.
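A table like the one in the screenshot can be reproduced roughly as follows. This is only a minimal sketch on a small made-up DataFrame (`toy` and its values are hypothetical, not the course's `titanic` data); with the real data you would group `titanic` the same way. It shows that for a 0/1 variable the quartiles mostly land on 0 or 1, and when more than 75% of a group share one value the whole box collapses to a flat line, which may be why a class seems to vanish from the boxplot.

```python
import pandas as pd

# Hypothetical toy data standing in for the titanic DataFrame
toy = pd.DataFrame({
    "class": ["First", "First", "First", "First",
              "Third", "Third", "Third", "Third"],
    "sex": ["female", "female", "male", "male",
            "female", "female", "male", "male"],
    "survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

# The measures a boxplot draws: min, quartiles (25%, 50%, 75%) and max
stats = toy.groupby(["class", "sex"])["survived"].describe()
print(stats)

# The mean of a 0/1 variable is the share of ones, i.e. the survival rate
means = toy.groupby(["class", "sex"])["survived"].mean()
print(means)
```

For the groups where everyone has the same outcome, min, all quartiles and max coincide, so there is no box to draw.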

The barplot is more useful in this case.

"A bar plot represents an estimate of central tendency for a numeric variable with the height of each rectangle and provides some indication of the uncertainty around that estimate using error bars." https://seaborn.pydata.org/generated/seaborn.barplot.html

The mean of a dummy variable is simply the share of observations equal to 1 (here, the survival rate), which is quite informative, so the barplot is the plot type to use here :)
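As a rough sketch of the barplot version, assuming the same column names as in the question: the example below uses a small made-up DataFrame (`toy` is hypothetical) so that it runs on its own. With the course data you would pass `data=titanic` and your own `ax[1]` instead.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical toy data standing in for the titanic DataFrame
toy = pd.DataFrame({
    "class": ["First", "First", "First", "First",
              "Third", "Third", "Third", "Third"],
    "sex": ["female", "female", "male", "male",
            "female", "female", "male", "male"],
    "survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

fig, ax = plt.subplots()

# Same arguments as the boxplot call in the question, only the plot type changes.
# Each bar's height is the group mean of survived, i.e. the survival rate.
sns.barplot(x="class", y="survived", hue="sex", data=toy, ax=ax)
ax.set_ylabel("Share survived")
```

The error bars that seaborn draws by default give a sense of the uncertainty around each survival rate, as described in the quoted documentation.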

image