PSLmodels / taxdata

The TaxData project prepares microdata for use with the Tax-Calculator microsimulation project.
http://pslmodels.github.io/taxdata/

CPS BUG: n24 and nu18 not always consistent with taxpayer/spouse ages #409

Open issue, opened by martinholmer 2 years ago

martinholmer commented 2 years ago

In the course of commenting on Tax-Calculator issue 2630, I found that in the CPS data there are tax units for which the n24 and nu18 variables are inconsistent with the taxpayer and spouse ages (age_head and age_spouse).

I executed the following Tax-Calculator run:

(taxcalc-dev) ~% tc --version
Tax-Calculator 3.2.1

(taxcalc-dev) ~% tc cps.csv 2022 --sqldb

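and tabulated the dump output as follows. Because nu18 counts everyone under age 18 in the tax unit, including an under-18 taxpayer or spouse, and neither of them can be their own CTC-eligible child, n24 should never exceed nu18-tu18-su18 (where tu18 and su18 flag an under-18 taxpayer and spouse):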

(taxcalc-dev) ~% cat bug2.sql
.mode column

select count(*) as total_num_rows
from dump;

with cte as (
select n24, nu18, MARS,
       cast(age_head<18 as integer) as tu18, -- taxpayer age<18
       cast(MARS=2 and age_spouse<18 as integer) as su18 -- spouse age<18
from dump
)
select MARS, n24, nu18, tu18, su18,
       count(*) as "num_rows_with_n24>(nu18-tu18-su18)"
from cte
where n24>(nu18-tu18-su18)
group by MARS, n24, nu18, tu18, su18
order by MARS, n24, nu18, tu18, su18;

(taxcalc-dev) ~% sqlite3 cps-22-#-#-#.db <bug2.sql
total_num_rows
--------------
280005
MARS  n24  nu18  tu18  su18  num_rows_with_n24>(nu18-tu18-su18)
----  ---  ----  ----  ----  ----------------------------------
2     0    0     0     1     1
2     1    1     0     1     59
2     1    2     1     1     1
2     2    2     0     1     19
2     3    3     0     1     11
2     4    4     0     1     3
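The same consistency rule can also be checked directly on the input file, without a tc run or SQLite. What follows is a minimal Python sketch, assuming cps.csv contains the MARS, age_head, age_spouse, n24 and nu18 columns; the helper names inconsistent_rows and tabulate are hypothetical and are not part of taxdata or Tax-Calculator.

```python
import csv
from collections import Counter


def inconsistent_rows(rows):
    """Yield (MARS, n24, nu18, tu18, su18) for each tax unit whose n24
    exceeds nu18 minus the number of under-18 taxpayers/spouses.

    Hypothetical helper (not part of taxdata): it mirrors the WHERE clause
    of the SQL query, n24 > (nu18 - tu18 - su18).
    """
    for row in rows:
        # int(float(...)) accepts CSV values written as "2" or "2.0"
        mars = int(float(row["MARS"]))
        n24 = int(float(row["n24"]))
        nu18 = int(float(row["nu18"]))
        tu18 = int(float(row["age_head"]) < 18)  # taxpayer age < 18
        # spouse age < 18, counted only for joint filers (MARS == 2)
        su18 = int(mars == 2 and float(row["age_spouse"]) < 18)
        if n24 > nu18 - tu18 - su18:
            yield (mars, n24, nu18, tu18, su18)


def tabulate(path):
    """Count inconsistent rows in a CSV file, grouped and ordered by
    (MARS, n24, nu18, tu18, su18) like the SQL output; print and return."""
    with open(path, newline="") as f:
        counts = Counter(inconsistent_rows(csv.DictReader(f)))
    for key in sorted(counts):
        print(*key, counts[key])
    return counts
```

For example, tabulate("cps.csv") prints one line per MARS/n24/nu18/tu18/su18 combination followed by its row count, in the same layout as the SQL output above.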

@andersonfrailey, I'm hoping you can fix this bug soon. Dealing with the inconsistent CPS data has been an enormous waste of my time (and, I imagine, a waste of other users' time). And, in addition to the wasted time, problems like this tend to erode user confidence in the data used by Tax-Calculator.