Getting the CTC right

donboyd5 commented 1 week ago

In #125 @martinholmer said:

It seems to me the more important Phase 3 issues highlighted by the examination results are:

Why has the CTC tax expenditure estimate from tmd.csv dropped to such a low amount?

Why is the SALT tax expenditure estimate so low?

SALT is now in #126.

Regarding CTC, perhaps we can focus on input values. Here is a table of weighted sums of selected child-related variables (for data_source==1) -- which are not targeted -- with original weights (s006_original, pre-optimization) and final weights (s006, "optimized") -- amounts are millions of weighted records:

As you can see, reweighting drastically reduces the number of children on tax-filer records. Given that they are not targeted (at present), this is not a problem with targeting but rather with reweighting. As we've discussed quite a bit, the reweighting objective function does not currently penalize large changes in weights and we need to do that. @nikhilwoodruff should be ready to show us on Wednesday results that reflect such a penalty. We should have a good sense then whether this is the problem and what the appropriate solution is.
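The comparison described above can be reproduced along these lines. This is a minimal sketch, not the script used for the table: the helper name `weighted_sums` and the demo records are hypothetical, and it assumes each record exposes `data_source`, `s006_original`, `s006`, and the child-related variable of interest (only `n24` is named in this thread). Rows read from tmd.csv with `csv.DictReader` should work the same way, since values are converted with `float()`.

```python
def weighted_sums(records, variables, weight_names=("s006_original", "s006")):
    """Weighted sums, in millions, of each variable under each weight column.

    Only records with data_source == 1 are included, matching the
    filter described in the comment above.
    """
    totals = {v: {w: 0.0 for w in weight_names} for v in variables}
    for rec in records:
        if int(float(rec["data_source"])) != 1:
            continue  # skip records that are not data_source == 1
        for v in variables:
            for w in weight_names:
                totals[v][w] += float(rec[v]) * float(rec[w])
    # Express the totals in millions of weighted records.
    return {v: {w: s / 1e6 for w, s in by_w.items()} for v, by_w in totals.items()}


# Made-up records for illustration only; these are NOT values from tmd.csv.
demo = [
    {"data_source": 1, "n24": 2, "s006_original": 1_000_000, "s006": 500_000},
    {"data_source": 1, "n24": 0, "s006_original": 1_000_000, "s006": 1_500_000},
    {"data_source": 0, "n24": 3, "s006_original": 2_000_000, "s006": 2_000_000},
]
result = weighted_sums(demo, ["n24"])
for var, by_weight in result.items():
    print(var, by_weight)
```

In the made-up data the total weight of the two data_source==1 records is unchanged by reweighting, yet the weighted `n24` count halves because weight moved from the record with children to the record without.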

martinholmer commented 1 week ago

@donboyd5, Thanks, these weighted (original and reweighted) child-count totals are quite interesting.

Do you have a hunch about why this is happening?

Could the n24 variable be added to the reweighting loss function?

donboyd5 commented 1 week ago

I think it's probably the lack of a penalty for weight deviations in the loss function that allows the optimization routine to pick some bad weights that have unintended consequences, because they are not tied to the original weights at all! I'm hoping this will be addressed for our call on Wednesday. We certainly can put it into the loss function but it will be very interesting to see what happens when we have proper penalties first. I believe that should fix a variety of unintended effects.
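To make the hypothesis concrete, here is a minimal sketch of a loss with a weight-deviation penalty. It is not the project's actual loss function: the function name `penalized_loss`, the squared relative-error form, and the `penalty` parameter are all assumptions chosen for illustration.

```python
def penalized_loss(weights, original_weights, values, targets, penalty=1.0):
    """Target-miss loss plus a penalty on deviation from the original weights.

    values[j][i] is the value of targeted variable j on record i;
    targets[j] is the desired weighted total for that variable.
    """
    # Sum of squared relative errors against each target.
    target_loss = 0.0
    for row, target in zip(values, targets):
        estimate = sum(w * x for w, x in zip(weights, row))
        target_loss += (estimate / target - 1.0) ** 2
    # Mean squared relative change in weights (assumed penalty form).
    deviation = sum(
        (w / w0 - 1.0) ** 2 for w, w0 in zip(weights, original_weights)
    ) / len(weights)
    return target_loss + penalty * deviation


# Hypothetical two-record example: one targeted variable (a return count)
# and one untargeted variable (n24, children per record).
original = [1.0, 1.0]
returns = [[1.0, 1.0]]   # each record counts as one return
targets = [2.0]          # target: two weighted returns
n24 = [0.0, 2.0]         # only the second record has children

extreme = [2.0, 0.0]     # hits the target exactly, but zeroes out the children
unpenalized = penalized_loss(extreme, original, returns, targets, penalty=0.0)
penalized = penalized_loss(extreme, original, returns, targets, penalty=1.0)
children_after = sum(w * c for w, c in zip(extreme, n24))
print(unpenalized, penalized, children_after)
```

Without the penalty, the extreme weights are indistinguishable from the original weights because both hit the target exactly, even though the untargeted child count falls from 2 to 0. With the penalty, the original weights are strictly preferred.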

martinholmer commented 3 hours ago

Given the results of merging PR #148, I don't think your analysis in issue #127 is quite right. Seems like the bad CTC results pre-PR #148 were caused by the use of "taxable returns" statistics for all the targets (not the lack of a reweighting penalty).

PSLmodels / tax-microdata-benchmarking

Getting the CTC right #127