Authors: Young Jun Lee and Daniel Wilhelm
This project provides the Stata command dgmtest, which implements the test for significance by Delgado and Manteiga (2001) and can be used to test for the presence of measurement error as described in Wilhelm (2018) and Lee and Wilhelm (2018).
Files contained in this package:
dgmtest.ado contains the dgmtest command.
dgmtest.sthlp contains the Stata helpfile for the dgmtest command.
example_DGM2001.ado and simul_DGM2001.do contain the code to replicate the simulations in Delgado and Manteiga (2001).
example_Wilhelm2018.ado and simul_Wilhelm2018.do contain the code to replicate the simulations in Wilhelm (2018).
example.do contains the simple simulation example shown below.

Use the command dgmtest as described below.

The command dgmtest tests the null hypothesis
H0: E[Y | X, W, Z] = E[Y | X, W]
against the alternative that the null does not hold.
The vector of explanatory variables, W, may contain elements that enter the conditional expectation in a linear, additively separable fashion. For example, decompose W=(W1,W2) where W1 enters nonseparably and W2 enters in a linear, additively separable fashion,
E[Y | X, W, Z] = f(X,W1,Z) + pi*W2
where f is some function and pi is a row vector of the same dimension as W2. In the presence of variables W2, we apply the test in Delgado and Manteiga (2001) after replacing Y with (Y - pihat*W2), where pihat is the estimator of pi proposed by Robinson (1988).
Syntax:
dgmtest depvar expvar [if] [in] [, qz(integer) qw2(integer) teststat(string) kernel(string) bootdist(string) bw(real) bootnum(integer) ngrid(integer) qgrid(real)]
where
depvar is the outcome variable Y.
expvar is a list of variables containing all elements of X, W, and Z. The order of variables in the list should be: X, W, Z.

The options are as follows:
qz is the dimension of Z (default = 1).
qw2 is the dimension of W2 (default = 0).
teststat is the type of test statistic to be used: Cramer-von Mises (CvM, default) or Kolmogorov-Smirnov (KS).
kernel is the kernel function: biweight, epanechnikov (default), epan2, epan4, normal, rectangle, triangular.
bw is the bandwidth (default = n^(-1/3q), a rule of thumb, where n is the sample size and q is the dimension of X1).
bootnum is the number of bootstrap samples for the computation of the test's critical value (default = 500).
bootdist is the distribution of the bootstrap multiplier variable: mammen (default), rademacher, uniform.
ngrid is the number of equally spaced grid points used to compute the supremum of the KS statistic, if that statistic is chosen via the option teststat. The default is 0, which means that the sample serves as the grid.
qgrid is a number between 0 and 1 that defines the min and max values of the grid in the previous option. The min value is the qgrid-quantile and the max value is the (1-qgrid)-quantile. The default is 0, in which case the grid ranges from the min to the max value in the sample.

If options are left unspecified, the command runs on the default settings.
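As a rough sketch of how qz and qw2 relate to the ordering of the variable list, the following generates a small artificial dataset and calls dgmtest with a two-dimensional Z and a one-dimensional W2. The variable names (y, x, w1, w2, z1, z2) and the data-generating process are hypothetical and chosen only for illustration. The sketch also assumes that W is listed as (W1, W2), so that the qw2 variables immediately precede the qz variables of Z at the end of the list.

```stata
// WARNING: clear drops any data currently in memory
clear
set obs 200
// hypothetical variables, for illustration only
generate x  = runiform()
generate w1 = runiform()
generate w2 = runiform()
generate z1 = runiform()
generate z2 = runiform()
// outcome does not depend on (z1, z2), so the null hypothesis holds here
generate y  = x^2 + w1 + 0.5*w2 + 0.5*runiform()
// list order X, W1, W2, Z: the last qz(2) variables are Z,
// and the qw2(1) variable before them enters linearly (assumed ordering)
dgmtest y x w1 w2 z1 z2, qz(2) qw2(1)
```

All other options are left at their defaults in this call.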
Wilhelm (2018) shows that, under some conditions, the null hypothesis H0 is equivalent to the hypothesis of no measurement error in X. In this context, the variable Z must be excluded from the outcome equation. For example, it could be a second measurement or an instrumental variable. See Wilhelm (2018), Lee and Wilhelm (2018), and the examples below for more details.
set obs 200
// true regressor
generate Xstar = runiform()
// measurement error in X
generate etaX = runiform()
// mismeasured regressor
generate X1 = Xstar + 0.5*etaX
// additively linear control variable
generate X2 = runiform()
// measurement error in Z
generate etaZ = runiform()
// second measurement of true regressor
generate Z = Xstar + 0.5*etaZ
// regression error
generate epsilon = runiform()
We generate an outcome in two different ways, in a regression with and without additively separable, linear controls:
// outcome equation without controls
generate Y1 = Xstar^2 + 0.2*Xstar + 0.5*epsilon
// outcome equation with controls
generate Y2 = Xstar^2 + 0.2*Xstar + 0.5*X2 + 0.5*epsilon
Perform the test using default options:
// perform the test of the hypothesis of no measurement error in X1
dgmtest Y1 X1 Z
dgmtest Y2 X1 X2 Z, qw2(1)
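The bandwidth can also be supplied explicitly through the bw option. The sketch below assumes the simulated data above are in memory and that X is scalar (q = 1), in which case the rule of thumb from the options list reduces to n^(-1/3). The local macro names and the factor 0.5 are arbitrary illustrative choices, not recommendations from the authors.

```stata
// rule-of-thumb bandwidth for scalar X (q = 1), using all observations in memory
local h = _N^(-1/3)
display "rule-of-thumb bandwidth: " `h'
// pass the bandwidth explicitly
dgmtest Y1 X1 Z, bw(`h')
// rerun with half the bandwidth to see how sensitive the result is
local hsmall = 0.5*`h'
dgmtest Y1 X1 Z, bw(`hsmall')
```

Because the critical value is computed by bootstrap, repeated calls may give slightly different results unless the random-number seed is fixed beforehand with set seed.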
Perform the test, choosing the triangular kernel function:
// perform the test of the hypothesis of no measurement error in X1
dgmtest Y1 X1 Z, kernel(triangular)
dgmtest Y2 X1 X2 Z, qw2(1) kernel(triangular)
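Other options can be combined in the same way. As an illustrative sketch, again assuming the simulated data above are in memory, the following call requests the Kolmogorov-Smirnov statistic on a trimmed grid together with Rademacher bootstrap multipliers. The particular values (20 grid points, 0.05 trimming, 1000 bootstrap samples) are arbitrary and only demonstrate the syntax.

```stata
// Kolmogorov-Smirnov statistic, supremum taken over 20 equally spaced grid points
// between the 0.05-quantile and the 0.95-quantile,
// with 1000 bootstrap samples using Rademacher multipliers
dgmtest Y1 X1 Z, teststat(KS) ngrid(20) qgrid(0.05) bootdist(rademacher) bootnum(1000)
```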
Wilhelm, D. (2018), "Testing for the Presence of Measurement Error", CeMMAP Working Paper CWP45/18