PSLmodels / OG-USA

Overlapping-generations macroeconomic model for evaluating fiscal policy in the United States
https://pslmodels.github.io/OG-USA/
Creative Commons Zero v1.0 Universal

Add CPS-based household composition files for UBI #39

Closed MaxGhenis closed 11 months ago

MaxGhenis commented 3 years ago

Starting over from @prrathi's https://github.com/PSLmodels/OG-USA-Calibration/pull/37 for simplicity. This is based on CPS tax units only, since that's what we agreed upon for consistency with other OG-USA modeling (fewer people per unit than PSID families). There's still code that could add similar data from the PSID in #37, but I think it'd be cleaner to have it run off the psid_lifetime_income dataset; I compressed an older version of it, which fits under GitHub's 100MB limit, but I haven't re-run it since adding the num_in_family column.

The main output of this is data/hh_composition/cps.csv, which should just be filtered to smoothed==True for usage in OG-USA. It also exports charts to images/hh_composition/. @jdebacker could you suggest how to add this matrix to ogusa_default_parameters.json? It doesn't look like any of the code in this repo is currently connected to that file (it was added all at once in #35).

@rickecon

codecov-commenter commented 3 years ago

Codecov Report

Merging #39 (45f602f) into master (5bae7c5) will not change coverage. The diff coverage is n/a.


@@           Coverage Diff           @@
##           master      #39   +/-   ##
=======================================
  Coverage   63.13%   63.13%           
=======================================
  Files           8        8           
  Lines        1188     1188           
=======================================
  Hits          750      750           
  Misses        438      438           
Flag        Coverage Δ
unittests   63.13% <ø> (ø)

Flags with carried forward coverage won't be shown.



Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 5bae7c5...45f602f.

MaxGhenis commented 3 years ago

Per the discussion in today's developer meeting, I'll add this to calibrate.py.

rickecon commented 3 years ago

@MaxGhenis @jdebacker . This looks good to me, except for the build and test CI failure (due to some out-of-date codecov stuff). The most recent commit fixes a data issue for the number of adults age 18-64. I like having the 6 .png files and one .csv file in this case. This looks much better.

The build and test failure is coming from the following error message:

{'detail': ErrorDetail(string='Unable to locate build via Github Actions API. Please upload with the Codecov repository upload token to resolve issue.', code='not_found')}
Error: Codecov failed with the following error: The process '/usr/bin/bash' failed with exit code 1

I don't know how to fix this off the top of my head. I will have to look into it. I wonder if we are missing a coverage component in the environment.yml file.

jdebacker commented 3 years ago

@MaxGhenis Thanks for the contributions in this PR!

The main output of this is data/hh_composition/cps.csv, which should just be filtered to smoothed==True for usage in OG-USA.

I've looked at the code a bit and can't figure out what you mean by this. Do you mean changing lines 93 and/or 94 of hh_composition.py?

I think what you should do is create a function in hh_composition.py that returns the results (i.e., a matrix) that can be read into OG-USA. This function should be called in the calibrate.Calibration class, and the result should become an attribute of that class.
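The pattern being suggested might look something like the following sketch. The function name, the attribute name, and the stub matrix are all illustrative assumptions, not the actual OG-USA or OG-USA-Calibration API:

```python
import numpy as np


def get_household_composition():
    """Hypothetical hh_composition.py function returning an S x J matrix.

    In the real code this would be computed from the CPS summary data;
    here it returns stub data just to show the shape of the interface.
    """
    S, J = 80, 7  # illustrative dimensions for age and lifetime-income groups
    return np.ones((S, J))


class Calibration:
    """Stand-in for calibrate.Calibration, showing only the suggested step."""

    def __init__(self):
        # The matrix is computed once and stored as an attribute, so other
        # calibration steps (and OG-USA itself) can read it from the class.
        self.hh_composition = get_household_composition()
```

The point of the design is that OG-USA consumes a `Calibration` object rather than reading loose CSV files, so any new calibration target should surface as an attribute of that class.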

Does the program take much time to arrive at the data/hh_composition/cps.csv file? Is it sufficient to just use the link to the TaxData CPS file rather than store this CSV file here?

It also exports charts to images/hh_composition/.

Can you share and describe some of these images in this thread? I would also recommend adding these images to the Jupyter Book documentation (as part of a new chapter that describes what the output is and how it was arrived at). That could be done in a separate PR -- but for this one, I'd recommend against having png files outside of the docs/ directory.

@jdebacker could you suggest how to add this matrix to ogusa_default_parameters.json? It doesn't look like any of the code in this repo is currently connected to that file (it was added all at once in #35).

Do not worry about updating that json file. I'll update it after the OG-USA PR #725 is complete. That PR will be updated to add the matrix created here to the OG-USA default_parameters.json file.

MaxGhenis commented 3 years ago

@rickecon I've fixed the off-by-one error, so this is now good to use for the UBI paper via pd.read_csv("https://github.com/PSLmodels/OG-USA-Calibration/raw/b0ad2816a0c7fbbb9099060f89b7ef36d85f8f5b/data/hh_composition/cps.csv")
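As noted earlier in the thread, the CSV should be filtered to smoothed==True before use in OG-USA. A minimal sketch of that step with pandas, using a toy stand-in for the real file (the column names here are assumed from the thread's description, not confirmed against the actual cps.csv):

```python
import pandas as pd

# Toy stand-in for data/hh_composition/cps.csv. Assumed columns:
# s = age, j = lifetime-income group, age_group = counted age bucket,
# count = persons per household, smoothed = raw vs. smoothed flag.
df = pd.DataFrame({
    "s": [30, 30, 30, 30],
    "j": [0, 0, 1, 1],
    "age_group": ["<18"] * 4,
    "count": [0.90, 0.95, 1.10, 1.15],
    "smoothed": [False, True, False, True],
})

# Keep only the smoothed series for use in OG-USA, then drop the flag
smoothed = df[df["smoothed"]].drop(columns="smoothed")
```

In practice `df` would come from `pd.read_csv` on the pinned-commit URL above rather than being constructed by hand.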

Previous: [chart image]

New: [chart image]

MaxGhenis commented 3 years ago

I think what you should do is to create a function in hh_composition.py that returns the results (i.e., a matrix) that can be read into OG-USA. This function should be called in the calibrate.Calibration class and the result should become and attribute of that class.

OK, I think @rickecon is doing something like this for the UBI project; I'll also look into it here.

Does the program take much time to arrive at the data/hh_composition/cps.csv file? Is it sufficient to just use the link to the TaxData CPS file rather than store this CSV file here?

This file is not the full taxdata CPS csv; it's a summary by s, j, counted age group (<18, 18-64, 65+), and whether the value is raw or smoothed. I think storing this summarized data as a csv in the repo would make this input more accessible to other researchers than storing it only in the json file as three separate matrices (and presumably without the unsmoothed values). It's currently 152KB.
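Turning that long-format summary into the three separate matrices mentioned above could be a simple pivot, one s-by-j matrix per counted age group. A sketch on toy data (column names assumed from the description of the CSV):

```python
import pandas as pd

# Toy long-format summary mimicking the assumed cps.csv layout
df = pd.DataFrame({
    "s": [30, 30, 31, 31],
    "j": [0, 1, 0, 1],
    "age_group": ["<18"] * 4,
    "count": [0.9, 1.1, 0.8, 1.0],
})

# One s-by-j matrix per counted age group (here only "<18" is present)
matrices = {
    grp: sub.pivot(index="s", columns="j", values="count")
    for grp, sub in df.groupby("age_group")
}
```

With the real file this would yield three matrices, one each for <18, 18-64, and 65+, which is the shape the json parameter file would presumably store.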

Can you share and describe some of these images in this thread? I would also recommend adding these images to the Jupyter Book documentation (as part of a new chapter to describe that what the output is and how it was arrived at). That could be done in a separate PR -- but for this one, I'd recommend against having png files outside of the docs/ directory.

Sounds good, I'll add the images to the docs with descriptions.