QuantEcon / lecture-python-intro

An Undergraduate Lecture Series for the Foundations of Computational Economics
https://intro.quantecon.org/

[inequality] Incorporate a new exercise on vectorizing the gini_coefficient function #410

Closed mmcky closed 3 months ago

mmcky commented 7 months ago

We can write a new exercise in the inequality lecture to teach the difference between Python loops and vectorization.
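
As a small illustration of that contrast (a sketch only, not code from the lecture), the two functions below compute the same quantity, the sum of absolute differences over all pairs of observations, first with nested Python loops and then with NumPy broadcasting. The names `abs_diff_sum_loop` and `abs_diff_sum_vectorized` are hypothetical.

```python
import numpy as np


def abs_diff_sum_loop(y):
    """Sum of |y_i - y_j| over all pairs (i, j), using nested Python loops."""
    total = 0.0
    for a in y:
        for b in y:
            total += abs(a - b)
    return total


def abs_diff_sum_vectorized(y):
    """Same sum, using broadcasting to build the n x n matrix of differences."""
    y = np.asarray(y, dtype=float)
    # y[:, None] has shape (n, 1) and y[None, :] has shape (1, n),
    # so their difference broadcasts to shape (n, n).
    return np.sum(np.abs(y[:, None] - y[None, :]))


y = np.array([1.0, 2.0, 4.0])
print(abs_diff_sum_loop(y), abs_diff_sum_vectorized(y))
```

For the three values above both versions return 12.0. The vectorized version trades memory for speed, since it materializes an n x n array.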

Here is a starting point for the exercise.

```{exercise}
:label: inequality_ex3

The {ref}`code to compute the Gini coefficient is listed in the lecture above <code:gini-coefficient>`.

This code uses loops to calculate the coefficient based on income or wealth data.

This function can be rewritten using vectorization, which will greatly improve computational efficiency in Python.

Rewrite the function `gini_coefficient` using `numpy` and vectorized code.

You can compare the output of this new function with the one above, and note the speed differences.
```

```{solution-start} inequality_ex3
:class: dropdown
```

Let's take a look at some raw data for the US that is stored in `df_income_wealth`.

df_income_wealth.describe()
df_income_wealth.head(n=4)

We will focus on the wealth variable `n_wealth` to compute a Gini coefficient for the year 2016.

data = df_income_wealth[df_income_wealth.year == 2016]
data.head(n=2)

We can first compute the Gini coefficient using the function defined in the lecture above.

gini_coefficient(data.n_wealth.values)

Now we can write a vectorized version using `numpy`.

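def gini(y):
    n = len(y)
    # Column and row views of the data, so the subtraction broadcasts to an n x n matrix
    y_1 = np.reshape(y, (n, 1))
    y_2 = np.reshape(y, (1, n))
    # Sum of |y_i - y_j| over all pairs (i, j)
    g_sum = np.sum(np.abs(y_1 - y_2))
    return g_sum / (2 * n * np.sum(y))

gini(data.n_wealth.values)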


However, this uses a long run time series, so it would be better to switch to simulated data that we can generate in the lecture and whose size we can control.

longye-tian commented 4 months ago

Hi Matt @mmcky,

Maybe we can add the following paragraph to illustrate this vectorized function using simulated data?

Let's simulate five populations by drawing from a lognormal distribution as before:

```{code-cell} ipython3
k = 5
σ_vals = np.linspace(0.2, 4, k)
n = 2_000
σ_vals = σ_vals.reshape((k, 1))
μ_vals = -σ_vals**2 / 2
# y_vals has shape (k, n): each row holds n lognormal draws for one value of σ
y_vals = np.exp(μ_vals + σ_vals * np.random.randn(n))
```

We can compute the Gini coefficient for these five populations using the vectorized function as follows:

gini_coefficients = []
for i in range(k):
    gini_coefficients.append(gini(y_vals[i]))

This gives us the Gini coefficients for these five populations.

gini_coefficients
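
As a rough sanity check on these numbers, the sample values can be compared with the closed-form Gini coefficient of a lognormal distribution, erf(σ/2). The sketch below is self-contained and rests on some assumptions: it redefines `gini` with the same formula as above, uses a seeded generator for reproducibility, and draws independent samples for each σ. The helper `lognormal_gini_check` and the ASCII name `sigma_vals` are hypothetical.

```python
import math

import numpy as np


def gini(y):
    """Vectorized Gini coefficient: sum |y_i - y_j| / (2 n sum y)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    g_sum = np.sum(np.abs(y[:, None] - y[None, :]))
    return g_sum / (2 * n * np.sum(y))


def lognormal_gini_check(sigma_vals, n=2_000, seed=0):
    """Return (sigma, sample Gini, theoretical Gini) for each sigma."""
    rng = np.random.default_rng(seed)  # seed is an arbitrary choice
    results = []
    for sigma in sigma_vals:
        # mu = -sigma^2 / 2 gives the lognormal distribution a mean of one
        y = np.exp(-sigma**2 / 2 + sigma * rng.standard_normal(n))
        results.append((sigma, gini(y), math.erf(sigma / 2)))
    return results


for sigma, sample, theory in lognormal_gini_check(np.linspace(0.2, 4, 5)):
    print(f"sigma = {sigma:.2f}: sample = {sample:.3f}, theoretical = {theory:.3f}")
```

For small σ the two columns should agree closely. For large σ the distribution is very heavy-tailed, so the sample value at n = 2,000 can differ noticeably from the theoretical one.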


Best,
Longye

mmcky commented 4 months ago

thanks @longye-tian -- if you can prepare a PR that sounds great. We can work on this together in that branch.

We can add this as an exercise to this lecture.

longye-tian commented 3 months ago

Hi Matt, I think this issue is closed by pull request #498.

Best, Longye