GMMTools

Summary

Preliminary/In progress.

A toolbox for generalized method of moments (GMM) estimation, with functionality aimed at streamlining the estimation of models with long runtimes using parallel computing.

This package takes care of several "menial" tasks: saving and loading estimation results from files, seamlessly continuing long-running jobs that failed mid-estimation, and creating publication-ready tables using RegressionTables.jl. The package gives special attention to the optimizer "backend" used to minimize the GMM objective function, and several options are available.

Key Features

Core estimation features:

  1. one- and two-step GMM estimation
  2. nonlinear optimizer: support for Optim.jl and LsqFit.jl backends, multiple initial conditions, box constraints, automatic differentiation (AD)
  3. inference: (1) asymptotic i.i.d. and (2) Bayesian (weighted) bootstrap

Convenience features:

  1. integrated with RegressionTables.jl for publication-quality tables
  2. efficiently resume estimation from incomplete results (e.g. when bootstrap run #63 or initial condition #35 fails, or runs out of time or memory, after many hours)
  3. parallel over initial conditions (embarrassingly parallel using Distributed.jl)
  4. parallel bootstrap
  5. suitable for running on computer clusters (e.g. using slurm)
  6. simple option to normalize parameters before they are fed to the optimizer
  7. simple example to estimate a subset of parameters and use fixed values for the others (works with AD)

Example

See a fully worked out example in

examples/example_ols.jl

The following tutorial covers how to install Julia and this package and explains the above example script line by line: docs/src/tutorials/gmm.md. It aims to be accessible to first-time Julia users who are familiar with the econometrics of GMM.

Basic usage

The user must provide a moment function mom_fn(data, theta) that takes a data object (it can be a DataFrame or any other object) and a parameter vector theta. The function must return an $N\times M$ matrix, where $N$ is the number of observations and $M$ is the number of moments.
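
For example, a minimal moment function for OLS with two regressors might look as follows (a sketch; the function name and the columns y, x1, x2 are illustrative, assuming data is a DataFrame):

# OLS moment function: one moment per regressor, moment = residual * regressor
function ols_moments(data, theta)
    resid = data.y .- theta[1] .* data.x1 .- theta[2] .* data.x2
    return hcat(resid .* data.x1, resid .* data.x2) # N x 2 matrix
end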

To estimate a model (by default, one-step GMM with the identity weight matrix; pass mode=:twostep to fit() for two-step optimal GMM, see Options below), compute the asymptotic variance-covariance matrix, and display a table with the results, run

myfit = GMMTools.fit(MYDATA, mom_fn, theta0) # defaults: identity weight matrix, one-step GMM
GMMTools.vcov_simple(MYDATA, mom_fn, myfit)
GMMTools.regtable(myfit)

The parameter theta0 is either a vector of size $P$, or a $K\times P$ matrix with $K$ sets of initial conditions. The convenience function GMMTools.random_initial_conditions(theta0::Vector, K::Int; theta_lower=nothing, theta_upper=nothing) generates K random initial conditions around theta0 (and between theta_lower and theta_upper, if provided).
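
For example, to draw 20 random initial conditions around a single starting guess with box constraints (illustrative values):

theta0 = [0.5, 100.0] # a single starting guess
theta0_mat = GMMTools.random_initial_conditions(theta0, 20; theta_lower=[0.0, 0.0], theta_upper=[1.0, 1000.0])
# theta0_mat is a 20 x 2 matrix; pass it to GMMTools.fit() in place of a single vector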

For inference using the Bayesian (weighted) bootstrap, replace the GMMTools.vcov_simple line above with

# generate bootstrap weights first and separately. In case the estimation is interrupted, running this code again generates exactly the same weights, so we can continue where we left off.
bweights_matrix = GMMTools.boot_weights(MYDATA, myfit, nboot=500, method=:simple, rng_initial_seed=1234) 
GMMTools.vcov_bboot(MYDATA, mom_fn, theta0, myfit, bweights_matrix, opts=myopts)

Options

The fit function accepts the following arguments:

function fit(
    data,               # any object that can be passed to mom_fn as the first argument
    mom_fn::Function,   # mom_fn(data, theta) returns a matrix of moments (N x M)
    theta0;             # initial conditions (vector of size P or K x P matrix for K sets of initial conditions)
    W=I,                # weight matrix (M x M) or uniform scaling identity (I)
    weights=nothing,    # Vector of size N or nothing
    mode=:onestep,      # :onestep or :twostep
    run_parallel=false, # run in parallel (pmap, embarrassingly parallel) or not
    opts=GMMTools.GMMOptions() # other options
)

fit() accepts detailed options that control (1) whether and how results are saved to file, and (2) the optimization backend and options.

# estimation options
myopts = GMMTools.GMMOptions(
                path="C:/temp/",          # path where to save estimation results (path = "" by default, in which case no files are created)
                write_iter=false,         # write results for each individual run (corresponding to initial conditions)
                clean_iter=true,          # delete individual run results after estimation is finished
                overwrite=false,          # if false, read existing results (or individual runs) and do not re-estimate existing results. It is the user's responsibility to ensure that the model and theta0 have not changed since the last run
                optimizer=:lsqfit,        # optimization backend to use. LsqFit.jl uses the Levenberg-Marquardt algorithm (see discussion below)
                optim_autodiff=:forward,  # use automatic differentiation (AD) to compute gradients. Currently, only forward AD using ForwardDiff.jl is supported
                lower_bound=[0.0, -Inf],  # box constraints
                upper_bound=[Inf,  10.0],
                optim_opts=(show_trace=true,), # additional options for curve_fit() from LsqFit.jl in a NamedTuple. (For Optim.jl, this should be an Optim.options() object)
                theta_factors=nothing,    # nothing (default) or a Vector{Float64} of length P with a factor for each parameter. Parameter theta[i] will be replaced by theta[i] * theta_factors[i] before optimization. Rule of thumb: if theta[i] is on the order of 10^M, pick theta_factors[i] = 10^(-M).
                trace=1)                  # 0, 1, or 2

myfit = GMMTools.fit(MYDATA, mom_fn, theta0, mode=:twostep, opts=myopts)

Writing and reading results

GMMTools.fit(...) saves the estimation results in two files: fit.csv contains a table with one row per set of initial conditions, and fit.json contains several estimation parameters and results. GMMTools.write(myfit::GMMFit, opts::GMMOptions; subpath="fit") is the lower-level function to write GMMFit objects to files.

GMMTools.vcov_simple(...) saves a vcov.json file that includes, among other objects, the variance-covariance matrix myfit.vcov.V. GMMTools.vcov_bboot(...) also saves two files: vcov_boot_fits_df.csv (all individual runs for all bootstrap runs) and vcov_boot_weights.csv (rows are bootstrap runs, columns are data observations). The lower-level function is GMMTools.write(myvcov::GMMvcov, opts::GMMOptions; subpath="vcov").
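
For example, asymptotic standard errors can be computed from the stored matrix (a sketch, assuming myfit.vcov.V is the $P\times P$ parameter variance-covariance matrix described above):

using LinearAlgebra # for diag()

GMMTools.vcov_simple(MYDATA, mom_fn, myfit)
se = sqrt.(diag(myfit.vcov.V)) # asymptotic standard errors, one per parameter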

GMMTools.read_fit(full_path; subpath="fit", show_trace=false) reads estimation results and loads them into a GMMFit object. GMMTools.read_vcov(full_path; subpath="vcov", show_trace=false) reads vcov results and loads them into a GMMvcov object. Note that GMMTools.read_fit() also attempts to read the vcov from the same folder. If the vcov is stored elsewhere, read it separately and attach it using

myfit = GMMTools.read_fit(mypath1)
myfit.vcov = GMMTools.read_vcov(mypath2)

Parallel

Embarrassingly parallel using Distributed.jl.

For an example, see examples/example_ols_parallel.jl.

Note that the MYDATA object is copied to all workers. Currently, there is no support for SharedArrays.jl or other less memory-intensive options.
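
A minimal sketch of the setup (illustrative names; see the example file for the full pattern):

using Distributed
addprocs(4)                      # launch 4 local worker processes

@everywhere using GMMTools
# the moment function (and any helpers it calls) must be defined on all workers
@everywhere mom_fn(data, theta) = reshape((data.y .- theta[1] .* data.x) .* data.x, :, 1)

theta0_mat = GMMTools.random_initial_conditions([0.5], 40) # 40 initial conditions
myfit = GMMTools.fit(MYDATA, mom_fn, theta0_mat, run_parallel=true)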

Capturing errors

By default, any error during optimization stops the entire estimation (or inference) command.

Set the GMMOptions field throw_errors=false to capture these errors and write them to file without interrupting the rest of the estimation procedure.
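
For example (a sketch, reusing the options from above):

myopts = GMMTools.GMMOptions(path="C:/temp/", throw_errors=false)
myfit = GMMTools.fit(MYDATA, mom_fn, theta0, opts=myopts) # failed runs are recorded to file; the rest continue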

Misc convenience options and examples

Scaling parameters. The theta_factors option in GMMOptions() requests that the parameters be scaled before feeding them to the optimizer. Parameter theta[i] will be replaced by theta[i] * theta_factors[i] before optimization. Rule of thumb: if theta[i] is on the order of 10^M, pick theta_factors[i] = 10^(-M).

Example. Suppose theta = [alpha, beta] and we expect alpha to be between 0 and 1 and beta to be on the order of 0 to 1000. In general (though not always) it is a good idea to scale beta down to approximately 0 to 1, which ensures that the optimizer "cares" similarly about alpha and beta. (This also helps make sense of the magnitude of the default optimizer tolerance for theta, typically called x_tol.) We achieve this simply by selecting theta_factors=[1.0, 0.001]. All other inputs and outputs keep their original magnitudes: the initial conditions theta0 and the final estimates myfit.theta_hat.
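
In code, this looks as follows (a sketch; the numerical values are illustrative):

# scale beta (order 0-1000) down by 1000; leave alpha (order 0-1) unscaled
myopts = GMMTools.GMMOptions(theta_factors=[1.0, 0.001])

theta0 = [0.5, 500.0]            # initial conditions stay in original units
myfit = GMMTools.fit(MYDATA, mom_fn, theta0, opts=myopts)
myfit.theta_hat                  # estimates are also reported in original units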

Estimating a subset of parameters. See examples/example_ols_fixed.jl
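
A common pattern, sketched below, is to wrap the full moment function in a closure that re-inserts the fixed values, so that the optimizer only sees the free parameters (this may differ in details from the example file):

theta2_fixed = 1.0
theta3_fixed = 9.5

# the optimizer sees only (theta_1, theta_4); the vector literal promotes
# the fixed Float64 values to ForwardDiff's Dual type, so AD still works
mom_fn_free(data, theta_free) =
    mom_fn(data, [theta_free[1], theta2_fixed, theta3_fixed, theta_free[2]])

myfit = GMMTools.fit(MYDATA, mom_fn_free, [0.0, 0.0]) # initial conditions for the free parameters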

Package To-do list

Dev to-do list

  1. compute sensitivity measure (Andrews et al 2017)
  2. classical minimum distance (CMD), CUE
  3. more general estimation of the covariance of the moments, cluster, HAC, Newey-West, etc.
  4. other optimization backends (parallel tempering)
  5. tests
  6. (?) Monte Carlo simulation to compute size and power (using a user-provided function to generate data from the model)
  7. (?) Monte Carlo simulation of the estimator's finite-sample properties, using a user-provided function to generate data from the model (simulate data for random parameter values ⇒ run GMM ⇒ compare estimated parameters with the underlying true parameters)
  8. (?) estimate a subset of parameters with option theta_fixed = [missing, 1.0, 9.5, missing] to fix theta_2=1.0 and theta_3=9.5 and estimate (theta_1, theta_4)

Documentation to-do list

  1. docs
  2. Metrics theory recap
  3. Bootstrap options, including a custom weights function and an example
  4. walkthrough for re-running failed estimation, stability of the random initial condition and bootstrap weights
  5. Optimization discussion: finite differences vs. AD, discontinuities in the objective function (e.g. due to an iterated value function), (anecdotal) advantages of using the Levenberg-Marquardt algorithm over Optim.jl
  6. Non-linear estimation example
  7. Example with AD including cache data and implicit function differentiation


Acknowledgements

Contributor: Peter Deffebach. Thanks to Michael Creel and Jeff Gortmaker for useful suggestions and conversations.