FixedEffects / FixedEffectModels.jl

Fast Estimation of Linear Models with IV and High Dimensional Categorical Variables

Wrong results in a large data set with one set of FE #249

Closed droodman closed 9 months ago

droodman commented 10 months ago

I'm running reg() on a data set with about 9 million rows and nearly 100 non-absorbed regressors, most of which are dummies for birth years. I am absorbing only one set of FE. Occasionally some of the estimates are clearly wrong: the coefficients on the birth year dummies should vary rather smoothly with birth year, but they sometimes make big jumps. Running the same regression with reghdfe or areg in Stata does not have this problem.

I will paste the output of an example. I cannot share the data publicly but will email it to @matthieugomez.

julia> using CSV, DataFrames, Plots, FixedEffectModels

julia> df = CSV.read("c:/users/drood/Downloads/FEbug.csv", DataFrame)
8830997×96 DataFrame
     Row │ part     _Ibirthyr_1907  _Ibirthyr_1908  _Ibirthyr_1909  _Ibirthyr_1910  _Ibirthyr_1911  _Ibirthyr_1912  _I ⋯
         │ Float64  Float64         Float64         Float64         Float64         Float64         Float64         Fl ⋯
─────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────
       1 │     0.0             0.0             0.0             0.0             0.0             0.0             0.0     ⋯
       2 │     0.0             0.0             0.0             0.0             0.0             0.0             0.0
       3 │     0.0             0.0             0.0             0.0             0.0             0.0             0.0
       4 │     0.0             0.0             0.0             0.0             0.0             0.0             0.0
       5 │     0.0             0.0             0.0             0.0             0.0             0.0             0.0     ⋯
       6 │     0.0             0.0             0.0             0.0             0.0             0.0             0.0
       7 │     0.0             0.0             0.0             0.0             0.0             0.0             0.0
       8 │     0.0             0.0             0.0             0.0             0.0             0.0             0.0
       9 │     0.0             0.0             0.0             0.0             0.0             0.0             0.0     ⋯
      10 │     0.0             0.0             0.0             0.0             0.0             0.0             0.0
      11 │     0.0             0.0             0.0             0.0             0.0             0.0             0.0
    ⋮    │    ⋮           ⋮               ⋮               ⋮               ⋮               ⋮               ⋮            ⋱
 8830988 │     0.0             0.0             0.0             0.0             0.0             0.0             0.0
 8830989 │     0.0             0.0             0.0             0.0             0.0             0.0             0.0     ⋯
 8830990 │     0.0             0.0             0.0             0.0             0.0             0.0             0.0
 8830991 │     1.0             0.0             0.0             0.0             0.0             0.0             0.0
 8830992 │     0.0             0.0             0.0             0.0             0.0             0.0             0.0
 8830993 │     0.0             0.0             0.0             0.0             0.0             0.0             0.0     ⋯
 8830994 │     1.0             0.0             0.0             0.0             0.0             0.0             0.0
 8830995 │     1.0             0.0             0.0             0.0             0.0             0.0             0.0
 8830996 │     0.0             0.0             0.0             0.0             0.0             0.0             0.0
 8830997 │     0.0             0.0             0.0             0.0             0.0             0.0             0.0     ⋯
                                                                                     89 columns and 8830976 rows omitted
julia> m = reg(df, @formula(part ~ _Ibirthyr_1907+_Ibirthyr_1908+_Ibirthyr_1909+_Ibirthyr_1910+_Ibirthyr_1911+_Ibirthyr_1912+_Ibirthyr_1913+_Ibirthyr_1914+
_Ibirthyr_1915+_Ibirthyr_1916+_Ibirthyr_1917+_Ibirthyr_1918+_Ibirthyr_1919+_Ibirthyr_1920+_Ibirthyr_1921+_Ibirthyr_1922+_Ibirthyr_1923+_Ibirthyr_1924+
_Ibirthyr_1925+_Ibirthyr_1926+_Ibirthyr_1927+_Ibirthyr_1928+_Ibirthyr_1929+_Ibirthyr_1930+_Ibirthyr_1931+_Ibirthyr_1932+_Ibirthyr_1933+_Ibirthyr_1934+
_Ibirthyr_1935+_Ibirthyr_1936+_Ibirthyr_1937+_Ibirthyr_1938+_Ibirthyr_1939+_Ibirthyr_1940+_Ibirthyr_1941+_Ibirthyr_1942+_Ibirthyr_1943+_Ibirthyr_1944+
_Ibirthyr_1945+_Ibirthyr_1946+_Ibirthyr_1947+_Ibirthyr_1948+_Ibirthyr_1949+_Ibirthyr_1950+_Ibirthyr_1951+_Ibirthyr_1952+_Ibirthyr_1953+_Ibirthyr_1954+
_Ibirthyr_1955+_Ibirthyr_1956+_Ibirthyr_1957+_Ibirthyr_1958+_Ibirthyr_1959+_Ibirthyr_1960+__000001+_Ibirthyr_1962+_Ibirthyr_1963+_Ibirthyr_1964+
_Ibirthyr_1965+_Ibirthyr_1966+_Ibirthyr_1967+_Ibirthyr_1968+_Ibirthyr_1969+_Ibirthyr_1970+_Ibirthyr_1971+_Ibirthyr_1972+_Ibirthyr_1973+_Ibirthyr_1974+
_Ibirthyr_1975+_Ibirthyr_1976+_Ibirthyr_1977+_Ibirthyr_1978+_Ibirthyr_1979+_Ibirthyr_1980+_Ibirthyr_1981+_Ibirthyr_1982+_Ibirthyr_1983+_Ibirthyr_1984+
_Ibirthyr_1985+_Ibirthyr_1986+_Ibirthyr_1987+_Ibirthyr_1988+_Ibirthyr_1989+_Ibirthyr_1990+_Ibirthyr_1991+_Ibirthyr_1992+_Ibirthyr_1993+_Ibirthyr_1994+
_Ibirthyr_1995+_Ibirthyr_1996+_Ibirthyr_1997+_Ibirthyr_1998+_Ibirthyr_1999+_Ibirthyr_2000  + fe(survey))    , nthreads = 4 , tol=1.00000000000e-08)

                               Fixed Effect Model
================================================================================
Number of obs:                  8830997  Degrees of freedom:                  92
R2:                               0.024  R2 Adjusted:                      0.024
F-Stat:                         1675.69  p-value:                          0.000
R2 within:                        0.015  Iterations:                           1
================================================================================
part           |    Estimate  Std.Error  t value Pr(>|t|)   Lower 95%  Upper 95%
--------------------------------------------------------------------------------
_Ibirthyr_1907 |   -0.245888  0.0399852 -6.14948    0.000   -0.324257  -0.167518
_Ibirthyr_1908 |   -0.240937  0.0200867 -11.9948    0.000   -0.280306  -0.201568
_Ibirthyr_1909 |   -0.263179  0.0197842 -13.3025    0.000   -0.301955  -0.224402
_Ibirthyr_1910 |   -0.254113  0.0129464 -19.6281    0.000   -0.279487  -0.228738
_Ibirthyr_1911 |   -0.255636  0.0138299 -18.4844    0.000   -0.282742   -0.22853
_Ibirthyr_1912 |   -0.256784  0.0123504 -20.7916    0.000    -0.28099  -0.232578
_Ibirthyr_1913 |   -0.252984  0.0109203 -23.1664    0.000   -0.274388  -0.231581
_Ibirthyr_1914 |   -0.256519 0.00983208   -26.09    0.000    -0.27579  -0.237249
_Ibirthyr_1915 |   -0.258301 0.00771205 -33.4931    0.000   -0.273416  -0.243185
_Ibirthyr_1916 |   -0.251301   0.007272 -34.5574    0.000   -0.265554  -0.237048
_Ibirthyr_1917 |   -0.251159 0.00719553 -34.9049    0.000   -0.265262  -0.237056
_Ibirthyr_1918 |   -0.252904 0.00571224  -44.274    0.000     -0.2641  -0.241708
_Ibirthyr_1919 |   -0.250968 0.00549119 -45.7037    0.000    -0.26173  -0.240205
_Ibirthyr_1920 |   -0.248248 0.00458863 -54.1007    0.000   -0.257241  -0.239254
_Ibirthyr_1921 |   -0.245237 0.00439951 -55.7418    0.000    -0.25386  -0.236614
_Ibirthyr_1922 |   -0.243647 0.00449236 -54.2358    0.000   -0.252452  -0.234842
_Ibirthyr_1923 |   -0.243476 0.00395942 -61.4929    0.000   -0.251237  -0.235716
_Ibirthyr_1924 |   -0.239739 0.00386321  -62.057    0.000   -0.247311  -0.232168
_Ibirthyr_1925 |   -0.240398 0.00329999 -72.8482    0.000   -0.246866   -0.23393
_Ibirthyr_1926 |    -0.23616 0.00329518 -71.6683    0.000   -0.242619  -0.229702
_Ibirthyr_1927 |   -0.234743 0.00311323 -75.4019    0.000   -0.240845  -0.228642
_Ibirthyr_1928 |   -0.229105 0.00275798 -83.0698    0.000   -0.234511    -0.2237
_Ibirthyr_1929 |   -0.228704 0.00270057 -84.6874    0.000   -0.233997  -0.223411
_Ibirthyr_1930 |   -0.225933 0.00233095 -96.9273    0.000   -0.230501  -0.221364
_Ibirthyr_1931 |   -0.228161 0.00233622 -97.6622    0.000    -0.23274  -0.223582
_Ibirthyr_1932 |   -0.228699 0.00226792 -100.841    0.000   -0.233144  -0.224254
_Ibirthyr_1933 |   -0.228606 0.00208562  -109.61    0.000   -0.232694  -0.224518
_Ibirthyr_1934 |   -0.228292 0.00204416  -111.68    0.000   -0.232299  -0.224286
_Ibirthyr_1935 |   -0.226277 0.00187104 -120.936    0.000   -0.229945   -0.22261
_Ibirthyr_1936 |   -0.222548  0.0018566 -119.868    0.000   -0.226187  -0.218909
_Ibirthyr_1937 |   -0.222628 0.00184952  -120.37    0.000   -0.226253  -0.219003
_Ibirthyr_1938 |   -0.220128 0.00173838 -126.628    0.000   -0.223535  -0.216721
_Ibirthyr_1939 |   -0.219995 0.00170748 -128.842    0.000   -0.223342  -0.216648
_Ibirthyr_1940 |   -0.220104 0.00156163 -140.945    0.000   -0.223165  -0.217043
_Ibirthyr_1941 |     -0.2166 0.00160955 -134.572    0.000   -0.219755  -0.213446
_Ibirthyr_1942 |   -0.215136 0.00156292  -137.65    0.000   -0.218199  -0.212072
_Ibirthyr_1943 |   -0.212369   0.001527 -139.076    0.000   -0.215362  -0.209376
_Ibirthyr_1944 |   -0.210578 0.00156345 -134.688    0.000   -0.213642  -0.207513
_Ibirthyr_1945 |   -0.208761 0.00143075  -145.91    0.000   -0.211565  -0.205957
_Ibirthyr_1946 |   -0.135636 0.00145598 -93.1579    0.000    -0.13849  -0.132783
_Ibirthyr_1947 |   -0.134748 0.00147502 -91.3535    0.000   -0.137639  -0.131857
_Ibirthyr_1948 |   -0.132238 0.00141273 -93.6044    0.000   -0.135007  -0.129469
_Ibirthyr_1949 |   -0.132401 0.00140616 -94.1575    0.000   -0.135157  -0.129645
_Ibirthyr_1950 |   -0.127504 0.00130647 -97.5938    0.000   -0.130064  -0.124943
_Ibirthyr_1951 |   -0.125716 0.00133446 -94.2076    0.000   -0.128332  -0.123101
_Ibirthyr_1952 |    -0.12547 0.00132191 -94.9159    0.000   -0.128061   -0.12288
_Ibirthyr_1953 |    -0.12385 0.00127673 -97.0059    0.000   -0.126353  -0.121348
_Ibirthyr_1954 |   -0.122553 0.00126125 -97.1672    0.000   -0.125025  -0.120081
_Ibirthyr_1955 |   -0.173384 0.00121755 -142.404    0.000   -0.175771  -0.170998
_Ibirthyr_1956 |   -0.167128 0.00121466 -137.592    0.000   -0.169509  -0.164748
_Ibirthyr_1957 |   -0.162486 0.00121038 -134.243    0.000   -0.164858  -0.160113
_Ibirthyr_1958 |   -0.156857 0.00117068 -133.988    0.000   -0.159151  -0.154562
_Ibirthyr_1959 |   -0.146719 0.00116607 -125.824    0.000   -0.149004  -0.144433
_Ibirthyr_1960 |   -0.144422  0.0011126 -129.806    0.000   -0.146603  -0.142242
__000001       |         0.0        NaN      NaN      NaN         NaN        NaN
_Ibirthyr_1962 |   -0.122424 0.00113492 -107.871    0.000   -0.124649    -0.1202
_Ibirthyr_1963 |   -0.118468 0.00110999 -106.729    0.000   -0.120643  -0.116292
_Ibirthyr_1964 |   -0.111601 0.00111507 -100.085    0.000   -0.113787  -0.109416
_Ibirthyr_1965 |   -0.110394 0.00108402 -101.837    0.000   -0.112518  -0.108269
_Ibirthyr_1966 |   -0.108949 0.00110209 -98.8573    0.000   -0.111109  -0.106789
_Ibirthyr_1967 |   -0.108906 0.00111755 -97.4501    0.000   -0.111096  -0.106715
_Ibirthyr_1968 |   -0.108402  0.0010982 -98.7087    0.000   -0.110554  -0.106249
_Ibirthyr_1969 |   -0.108043 0.00108709 -99.3873    0.000   -0.110174  -0.105913
_Ibirthyr_1970 |   -0.109195 0.00106885 -102.161    0.000    -0.11129    -0.1071
_Ibirthyr_1971 |   -0.109449 0.00108716 -100.675    0.000    -0.11158  -0.107319
_Ibirthyr_1972 |   -0.107718 0.00108037 -99.7054    0.000   -0.109836  -0.105601
_Ibirthyr_1973 |   -0.110267 0.00108196 -101.915    0.000   -0.112388  -0.108146
_Ibirthyr_1974 |   -0.102156 0.00109414 -93.3658    0.000     -0.1043  -0.100011
_Ibirthyr_1975 |   -0.101302 0.00108169  -93.652    0.000   -0.103422  -0.099182
_Ibirthyr_1976 |  -0.0952319 0.00109914  -86.642    0.000  -0.0973862 -0.0930777
_Ibirthyr_1977 |  -0.0884749 0.00111556   -79.31    0.000  -0.0906614 -0.0862885
_Ibirthyr_1978 |  -0.0823051 0.00112864 -72.9244    0.000  -0.0845171  -0.080093
_Ibirthyr_1979 |   -0.078953 0.00113875 -69.3331    0.000  -0.0811849 -0.0767211
_Ibirthyr_1980 |  -0.0794783 0.00111344 -71.3805    0.000  -0.0816606  -0.077296
_Ibirthyr_1981 |  -0.0677975 0.00114323 -59.3032    0.000  -0.0700382 -0.0655568
_Ibirthyr_1982 |  -0.0657529 0.00113086  -58.144    0.000  -0.0679694 -0.0635365
_Ibirthyr_1983 |   -0.106154  0.0011442 -92.7764    0.000   -0.108397  -0.103912
_Ibirthyr_1984 |   -0.107567 0.00116716 -92.1618    0.000   -0.109855   -0.10528
_Ibirthyr_1985 |   -0.105736 0.00118811 -88.9951    0.000   -0.108065  -0.103407
_Ibirthyr_1986 |   -0.105918 0.00122616 -86.3819    0.000   -0.108322  -0.103515
_Ibirthyr_1987 |   -0.104192 0.00127568 -81.6756    0.000   -0.106693  -0.101692
_Ibirthyr_1988 |   -0.104842 0.00131977 -79.4397    0.000   -0.107429  -0.102255
_Ibirthyr_1989 |   -0.102505 0.00139315 -73.5781    0.000   -0.105236 -0.0997745
_Ibirthyr_1990 |   -0.103979 0.00146044 -71.1967    0.000   -0.106841  -0.101116
_Ibirthyr_1991 |  -0.0979455 0.00159499 -61.4082    0.000   -0.101072 -0.0948194
_Ibirthyr_1992 | 0.000426894 0.00172852 0.246971    0.805 -0.00296094 0.00381473
_Ibirthyr_1993 |  0.00524459 0.00182628  2.87174    0.004  0.00166515 0.00882403
_Ibirthyr_1994 |   0.0174228 0.00187872  9.27377    0.000   0.0137406   0.021105
_Ibirthyr_1995 |   0.0275258 0.00194972  14.1178    0.000   0.0237044  0.0313472
_Ibirthyr_1996 |   0.0341425 0.00205909  16.5813    0.000   0.0301067  0.0381782
_Ibirthyr_1997 |   0.0344863 0.00229145    15.05    0.000   0.0299952  0.0389775
_Ibirthyr_1998 |   0.0286934 0.00274934  10.4364    0.000   0.0233048   0.034082
_Ibirthyr_1999 |     0.01829 0.00367653  4.97479    0.000   0.0110841  0.0254959
_Ibirthyr_2000 |   0.0258094  0.0085951  3.00281    0.003  0.00896333  0.0426555
================================================================================

julia> plot(coef(m))
[Image: 2023-11-13 (2)]

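The jumps can also be located without a plot. Below is a minimal sketch, not part of FixedEffectModels, that flags steps between consecutive coefficients that are much larger than the typical step. The function name `find_jumps` and the threshold `factor` are hypothetical choices for illustration; coefficients reported as exactly 0.0 (dropped regressors such as `__000001`) are skipped so they do not register as jumps.

```julia
using Statistics

# Return the positions (in `b`) of coefficients that follow an unusually large step.
# `factor` is an arbitrary illustrative threshold relative to the median step size.
function find_jumps(b::AbstractVector{<:Real}; factor::Real = 10)
    keep = findall(!iszero, b)        # skip coefficients reported as exactly zero
    steps = abs.(diff(b[keep]))       # absolute change between neighbouring estimates
    isempty(steps) && return Int[]
    typical = median(steps)           # robust scale of a "normal" step
    flagged = findall(>(factor * typical), steps)
    return keep[flagged .+ 1]         # index of the coefficient after each jump
end

# Estimates copied from the output above for birth years 1943 to 1948
b = [-0.212369, -0.210578, -0.208761, -0.135636, -0.134748, -0.132238]
jumps = find_jumps(b)                 # position 4 corresponds to _Ibirthyr_1946
```

On the fitted model, the same check would be `find_jumps(coef(m))`, with the returned positions indexing into the regressor list in the order shown in the table.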
droodman commented 9 months ago

The example I posted about OpenBLAS had about half the columns dropped. I found that the problem went away with even small changes to the number of rows, so I figured I had hit diminishing returns and stopped there.
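Because only one set of FE is absorbed, the estimates can be cross-checked independently of reg(): absorbing a single categorical variable is equivalent to demeaning the outcome and the regressors within each group and then running OLS. The sketch below does that with dense matrices and the standard library only. It is not how FixedEffectModels computes its estimates, the names `demean_by_group` and `within_ols` are hypothetical, and a dense copy of a data set this size may need a lot of memory.

```julia
using LinearAlgebra, Statistics

# Subtract the group mean from every column of `X`, group by group.
function demean_by_group(X::AbstractMatrix{<:Real}, g::AbstractVector)
    out = float.(X)                               # broadcasting allocates a fresh copy
    for level in unique(g)
        rows = findall(==(level), g)
        out[rows, :] .-= mean(out[rows, :], dims = 1)
    end
    return out
end

# Within estimator for one set of fixed effects: demean, then least squares.
function within_ols(y::AbstractVector{<:Real}, X::AbstractMatrix{<:Real}, g::AbstractVector)
    yd = vec(demean_by_group(reshape(y, :, 1), g))
    Xd = demean_by_group(X, g)
    return Xd \ yd                                # least-squares solve on the demeaned data
end

# Tiny synthetic example: three groups, two dummy regressors, no noise
g = [1, 1, 1, 2, 2, 2, 3, 3, 3]
X = [1.0 0.0; 0.0 1.0; 0.0 0.0; 1.0 0.0; 0.0 1.0; 0.0 0.0; 1.0 0.0; 0.0 1.0; 0.0 0.0]
group_effect = [0.5, -1.0, 2.0]
y = X * [0.3, -0.2] .+ group_effect[g]
beta = within_ols(y, X, g)                        # recovers approximately [0.3, -0.2]
```

Running something like this on the failing sample and comparing against coef(m) would show whether the discrepancy comes from the solver step or from the data preparation.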