CenterOnBudget / cbpp-stata-utils

Stata utility programs created by and for researchers at CBPP.
https://centeronbudget.github.io/cbpp-stata-utils/

Issues loading multiple years of CPS data because of variable format mismatch #18

Closed shingtgen closed 2 years ago

shingtgen commented 2 years ago

Describe the bug

I tried using load_data to load March 2019-2021 CPS data but hit an error in the append process because variable tax_id changes format over this time period. I got the following error message:

variable tax_id is long in master but str14 in using data
You could specify append's force option to ignore this numeric/string mismatch. The using variable would then be treated as if it contained numeric missing value.

To Reproduce

load_data cps, years(2019/2021)

Stata version

MP/17

c-zippel commented 2 years ago

TY for flagging! I've been stymied by this too, so I got Arloc's approval to revamp the CPS datasets library so that files (at least back to 2010 or so) are consistently de-stringed, labeled, and include replicate weights.

In the meantime, I think load_data could quietly destring the files before appending them.
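
For illustration only, here is a rough sketch of the destring-then-append idea. It is not load_data's actual implementation: the file names (cps_2019.dta and so on) are hypothetical placeholders, and it assumes tax_id holds only digits in the years where it is stored as a string.

```stata
* Sketch only: destring tax_id in each year's file before appending.
* File names are hypothetical placeholders, not the library's real paths.
clear
tempfile combined
local first = 1
forvalues y = 2019/2021 {
    use "cps_`y'.dta", clear
    * If tax_id is stored as a string in this year, convert it to numeric.
    * destring leaves the variable unchanged if it finds nonnumeric characters.
    capture confirm string variable tax_id
    if !_rc {
        quietly destring tax_id, replace
    }
    * Stack the years accumulated so far onto the current year
    if !`first' {
        append using `combined'
    }
    save `combined', replace
    local first = 0
}
```

Unlike append's force option, which would treat the string values as numeric missing, this keeps the tax_id values from every year.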

shingtgen commented 2 years ago

That sounds good! Similar issue for ACS and serialno too, although I know you've already accounted for that in load_data. Would it make sense to update that in the dataset library also? Or do you think it makes sense to handle these differently since one is a variable format issue (CPS) and the other is a change in data format (ACS)?

c-zippel commented 2 years ago

The latter, I think. Also, I am hesitant to retroactively modify the existing ACS datasets on the off chance that it breaks someone’s code.

From: Stephanie Hingtgen
Sent: Thursday, March 31, 2022 8:48 AM
To: CenterOnBudget/cbpp-stata-utils
Cc: Claire Zippel; State change
Subject: Re: [CenterOnBudget/cbpp-stata-utils] Issues loading multiple years of CPS data because of variable format mismatch (Issue #18)

That sounds good! Similar issue for ACS and serialno too, although I know you've already accounted for that in load_data. Would it make sense to update that in the dataset library also? Or do you think it makes sense to handle these differently since one is a variable format issue (CPS) and the other is a change in data format (ACS)?
