aaowens / PSID.jl

Quickly assemble data from the Panel Study of Income Dynamics (PSID)
MIT License

Package does not work for new PSID data #45

Closed. KelingZ closed this issue 7 months ago.

KelingZ commented 2 years ago
  1. PSID codebook: The link provided in README.md for the PSID codebook (https://simba.isr.umich.edu/downloads/PSIDCodebook.zip) is broken. I downloaded the PSID codebook from the PSID Data Center instead and got the following error message when running the code:

ERROR: LightXML.XMLParseError{String}("Failure in parsing an XML file.")


Then I downloaded the XML PSID codebook from the Google Drive link (https://drive.google.com/file/d/1nz1UaVGcj0ur2Bp3ev7a8agJbj0A5JTF/view), and with that file I could build the codebook successfully.

Is the code incompatible with the new PSID codebook format?

  2. psid.xlsx: The link provided in README.md for psid.xlsx still works, but the code fails when reading the xlsx file:

ERROR: AssertionError: isempty(XML_GLOBAL_ERROR_STACK)
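
One way to narrow down an error like this, assuming it comes from malformed XML inside the workbook (the thread does not confirm the cause): an .xlsx file is a ZIP archive of XML parts, so each part can be checked for well-formedness on its own. The sketch below uses Python's standard library rather than the package's Julia code, and `find_malformed_xml_parts` is a hypothetical helper name, not part of PSID.jl.

```python
import zipfile
import xml.etree.ElementTree as ET


def find_malformed_xml_parts(xlsx_file):
    """Return (member_name, error_message) pairs for XML parts that fail to parse.

    xlsx_file may be a path or a binary file-like object.
    """
    problems = []
    with zipfile.ZipFile(xlsx_file) as archive:
        for name in archive.namelist():
            # Only the XML parts are checked; media and other binaries are skipped.
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                ET.fromstring(archive.read(name))
            except ET.ParseError as err:
                # The message includes the line and column of the failure.
                problems.append((name, str(err)))
    return problems
```

Calling `find_malformed_xml_parts("psid.xlsx")` returns an empty list when every part parses. A different XML parser may be stricter or more lenient than this one, so an empty result does not rule out a parser-specific problem.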

aaowens commented 2 years ago

I think I know what the problem is for the PSID Codebook XML. There's an invalid character in variable V1350, in QTEXT.

Look here:

1970
      <TYPE_ID>1</TYPE_ID>
      <NAME>V1350</NAME>
      <LABEL>WTR WKD-69 (R)</LABEL>
      <QTEXT>F1.  During the last year (1969), did you (HEAD) do any work for money? (Retired, ...) (1970 question)</QTEXT>
      <ETEXT> </ETEXT>

and remove the invalid character. Then it should parse.
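
As a rough illustration of that manual fix, here is a minimal Python sketch that strips every character outside the XML 1.0 `Char` range before parsing. It is not how PSID.jl handles the codebook. The helper names are hypothetical, and the sketch assumes the offending character is one that XML 1.0 forbids (for example a stray control character), which the thread does not spell out.

```python
import re
import xml.etree.ElementTree as ET

# Matches anything outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Drop characters that XML 1.0 does not allow anywhere in a document."""
    return _INVALID_XML_CHARS.sub("", text)


def parse_cleaned(raw: bytes, encoding: str = "utf-8") -> ET.Element:
    """Decode, strip invalid characters, and parse the cleaned document."""
    # Undecodable bytes become U+FFFD, which is a legal XML character.
    text = raw.decode(encoding, errors="replace")
    return ET.fromstring(strip_invalid_xml_chars(text).encode(encoding))
```

This removes characters silently, so it is worth comparing the cleaned text against the original (here, the QTEXT of V1350) to confirm nothing meaningful was dropped.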

I can't check it right now, not at the right computer.

That's interesting that it can't parse the new XLSX. I'll have to look at it later.

aaowens commented 2 years ago

I sent an email to the PSID people asking why the XML codebook is gone.

I'm currently updating all the package dependencies to the latest versions + Julia 1.8. With those changes, I was able to create a PSID dataset fine using my local copies of the data.

aaowens commented 2 years ago

Hopefully fixed now. I tagged a new version of the package, and there's a new link to the XML codebook in the README.