Open rickecon opened 6 years ago
Should `BQkde_scaled = BQkde / BQkde.sum()` be `BQkde_scaled = BQkde / np.sum(BQkde)`?
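For a NumPy array the two forms should give the same result, since `np.sum(arr)` and `arr.sum()` perform the same reduction. A quick check, using a small made-up array as a stand-in for `BQkde` (the values are illustrative only):

```python
import numpy as np

# Made-up stand-in for the KDE values; not the real BQkde
BQkde = np.array([[0.2, 0.4], [0.1, 0.3]])

scaled_method = BQkde / BQkde.sum()      # array method
scaled_func = BQkde / np.sum(BQkde)      # NumPy function

# Both versions are elementwise equal and sum to one
same = np.allclose(scaled_method, scaled_func)
total = scaled_method.sum()
```

The method form only fails if `BQkde` is not a NumPy array (for example a plain list), in which case `np.sum()` would still work.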
Hmm, my KDE surface plot looked weird and didn't seem to match the plot from part (a), so I think `for abil, num_j in zip(abils_vec, income_probs):` should be `for abil, num_j in zip(abils_midpt, income_probs):`. Courtesy of @Sun-Kev.
When plotting my KDE surface plot, it appears to be flipped along the income group dimension relative to my plot in part A – did anyone else have this issue?
Exercise 1 of Problem Set 2 asks you to use some data on U.S. bequests (inheritances), plot the raw data, and fit a kernel density estimator. The following helps respond to @jgdenby's question in Issue #21.
Part (a). The first part of the exercise requires you to plot a 2D histogram of the original data, `BQmat_orig`. If you use the code suggested in the text of the exercise, you will get a NumPy matrix of dimension `78 x 7`. This matrix represents three dimensions of data: the rows represent ages, the columns represent lifetime income groups, and each element represents the bequests percentage for each age-income pair. Use the `plot_surface()` command to make a 2D histogram (3D plot) of this original data, as shown in Section 3.1 of the KDE.ipynb notebook.

The input arguments to the `plot_surface(x_mat, y_mat, data3D)` function are the following. Let `data3D` be an `m x n` matrix of data to be plotted in which the `m` rows represent the `m` values of the `x` variable, and the `n` columns represent the `n` values of the `y` variable. Let the `m` values of the `x` variable be listed in a vector `x_vec` of shape `(m,)`, and let the `n` values of the `y` variable be listed in a vector `y_vec` of shape `(n,)`. Then the input `x_mat` to the `plot_surface()` function is the column vector `x_vec.reshape((m, 1))` copied across `n` columns such that the resulting matrix `x_mat` has shape `(m, n)`. The input `y_mat` to the `plot_surface()` function is the row vector `y_vec.reshape((1, n))` copied down `m` rows such that the resulting matrix `y_mat` has shape `(m, n)`. This operation to create `x_mat` and `y_mat` from `x_vec` and `y_vec` can be easily done using the `np.meshgrid()` function (with `indexing='ij'`, since the default `'xy'` indexing returns matrices of the transposed shape `(n, m)`).

Part (b). The `gaussian_kde()` function used in Section 3.1 of the KDE.ipynb notebook takes two arguments and returns a KDE object.

The `bw_method` argument sets the bandwidth of the KDE, which is referred to as `lambda` in exercise 1. (When `bw_method` is a scalar, SciPy uses it as a scaling factor applied to the spread of the data rather than as a bandwidth in the data's own units.) The `data` input to the function should be `2 x N` for bivariate data, where `N` is the number of data points. The `data` input is two stacked vectors of `x` values and `y` values. For example, in exercise 1, the `x` values are ages and the `y` values are the midpoints of the lifetime ability group percentiles described in exercise 1.
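As a rough sketch of the grid construction from part (a) and the `2 x N` stacking from part (b): the specific ages (18 through 95) and the seven midpoint values below are assumptions chosen only to match the `78 x 7` shape, and the name `grid_pts` is hypothetical. Substitute the vectors from the exercise.

```python
import numpy as np

# ASSUMED values, for illustration only: 78 ages and 7 income-group midpoints
ages_vec = np.arange(18, 96)                                   # shape (78,)
abils_midpt = np.array([0.125, 0.375, 0.60, 0.75, 0.85, 0.945, 0.995])  # shape (7,)

# indexing="ij" puts the first vector down the rows and the second across
# the columns, so both outputs have shape (78, 7)
x_mat, y_mat = np.meshgrid(ages_vec, abils_midpt, indexing="ij")

# The same matrices built by hand, as described in part (a)
x_manual = np.tile(ages_vec.reshape((78, 1)), (1, 7))
y_manual = np.tile(abils_midpt.reshape((1, 7)), (78, 1))
assert np.array_equal(x_mat, x_manual)
assert np.array_equal(y_mat, y_manual)

# Flatten the grid into pairwise (age, ability) points: shape (2, 78 * 7)
grid_pts = np.vstack([x_mat.ravel(), y_mat.ravel()])
```

Calling `np.meshgrid(abils_midpt, ages_vec)` with the default indexing and unpacking the result as `y_mat, x_mat` gives the same two matrices.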
The `gaussian_kde()` function needs data on the age and income variables that reflect the percentages from the raw data `bq_data` from part (a). The matrix `bq_data` has the percent of bequests received by each age-income pair. You can generate data for `N=70000` observations from this distribution by including each age-income pair in proportion to its percentage.

You can then estimate the value of the KDE smoothed bequests distribution by inputting data for the exact points of the `ages_vec` and `abils_vec`, expanded to matrices like `x_mat` and `y_mat` above, and then flattened out into pairwise data.

Now you can plot your result using `plot_surface()`, for which the inputs are matrices, and you can experiment with different bandwidths to see which resulting KDE looks most like the original data with the noise smoothed out.
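The whole part (b) workflow might look something like the sketch below. Several things here are assumptions rather than the exercise's own code: the random matrix standing in for `bq_data`, the age and midpoint values, the bandwidth of `0.3`, and the use of `rng.choice` to draw observations (a swapped-in sampling approach, where the loop quoted in the comments suggests the notebook builds the sample differently).

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# ASSUMED grid values, chosen only to match the 78 x 7 shape
ages_vec = np.arange(18, 96)
abils_midpt = np.array([0.125, 0.375, 0.60, 0.75, 0.85, 0.945, 0.995])
x_mat, y_mat = np.meshgrid(ages_vec, abils_midpt, indexing="ij")

# Stand-in for bq_data: random positive shares that sum to one.
# Replace with the real 78 x 7 matrix from part (a).
bq_data = rng.random((78, 7))
bq_data /= bq_data.sum()

# Draw N (age, ability) pairs, each cell chosen with probability equal to
# its share of bequests
N = 70000
idx = rng.choice(bq_data.size, size=N, p=bq_data.ravel())
data = np.vstack([x_mat.ravel()[idx], y_mat.ravel()[idx]])  # shape (2, N)

# Fit the KDE; 0.3 is an arbitrary bandwidth to experiment with
bandwidth = 0.3
kde = gaussian_kde(data, bw_method=bandwidth)

# Evaluate at every grid point, reshape to 78 x 7, and rescale to sum to one
grid_pts = np.vstack([x_mat.ravel(), y_mat.ravel()])
BQkde = kde(grid_pts).reshape(x_mat.shape)
BQkde_scaled = BQkde / BQkde.sum()
```

`BQkde_scaled` has the same `(78, 7)` shape as `x_mat` and `y_mat`, so the three can be passed together to `plot_surface()` and compared against the plot from part (a).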