ernietedeschi closed 6 years ago
i'm not able to reproduce this, but i believe you :) could you make sure you have the latest version of lodown, and then tell me why this is breaking?
library(lodown)

# build the catalog of cps basic monthly files
cpsbasic_cat <-
    get_catalog( "cpsbasic" ,
        output_dir = file.path( path.expand( "~" ) , "CPSBASIC" ) )

# step through the data dictionary parser for the april 2017 file
debug(lodown:::cps_dd_parser)
lodown:::cps_dd_parser( subset( cpsbasic_cat , year == 2017 & month == 4 )$dd )
Looks like it's getting hung up on three variables in the_result: PEDISREM, PEDISOUT, and further down, PECERT3. You can see below that for those three, the condition that start_position[i] = end_position[i-1]+1 doesn't hold.
varname | width | start_position | end_position | divisor
PXCOHAB | 2 | 904 | 905 | 1.00E+00
PEDISREM | 2 | 910 | 911 | 1.00E+00
PEDISOUT | 2 | 916 | 917 | 1.00E+00
PRDISFLG | 2 | 918 | 919 | 1.00E+00
...
...
PTNMEMP2 | 2 | 942 | 943 | 1.00E+00
PECERT3 | 2 | 948 | 949 | 1.00E+00
PXCERT1 | 2 | 950 | 951 | 1.00E+00
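The contiguity condition described above can be checked mechanically. Here is a minimal sketch that uses only the first four rows shown in the table; `find_gaps` is a hypothetical helper written for illustration, not a lodown function, and the data frame is typed in by hand rather than produced by the parser.

```r
# rows copied by hand from the table above (first four rows only)
the_result <- data.frame(
    varname = c( "PXCOHAB" , "PEDISREM" , "PEDISOUT" , "PRDISFLG" ) ,
    start_position = c( 904 , 910 , 916 , 918 ) ,
    end_position = c( 905 , 911 , 917 , 919 ) ,
    stringsAsFactors = FALSE
)

# hypothetical helper: return the variables whose start position does not
# immediately follow the end position of the previous row
find_gaps <- function( dd ) {
    n <- nrow( dd )
    if ( n < 2 ) return( character( 0 ) )
    expected_start <- dd$end_position[ -n ] + 1
    breaks <- dd$start_position[ -1 ] != expected_start
    dd$varname[ -1 ][ breaks ]
}

gap_vars <- find_gaps( the_result )
print( gap_vars )
```

On these four rows the check flags PEDISREM and PEDISOUT, and PRDISFLG passes because 918 follows 917 directly.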
great! i'm sure this has something to do with the three dot special character in
PEDISEAR 2 IS…DEAF OR DOES…HAVE SERIOUS 906 - 907
in the data dictionary. http://ceprdata.org/wp-content/cps/CPS_Basic_Data_Dictionary_2015.txt
could you figure out why the columns in the data dictionary are being wiped out on your machine, and what change we could make so they're maintained?
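One way to see which characters in a dictionary line fall outside ASCII is to look at the code points directly. This is a rough sketch using base R only. It assumes the special character arrives in R as U+0085, which is what the fix later in this thread targets; on another machine or encoding it may be read as a different code point, so the sample line here is constructed by hand rather than read from the file.

```r
# sample line built with explicit \u escapes (an assumption about how the
# character is decoded, not a copy of the actual file contents)
dd_line <- "PEDISEAR 2 IS\u0085DEAF OR DOES\u0085HAVE SERIOUS 906 - 907"

# convert the string to integer code points, one per character
code_points <- utf8ToInt( dd_line )

# character positions holding anything outside the 7-bit ascii range
non_ascii_positions <- which( code_points > 127 )

# label each offending character by its code point
non_ascii_labels <- sprintf( "U+%04X" , code_points[ non_ascii_positions ] )

print( non_ascii_positions )
print( non_ascii_labels )
```

Running the same check on each line of the downloaded dictionary would show which code points actually need handling on a given machine.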
OK. I will dig in. In case it’s relevant, I’m running this in macOS High Sierra.
Looks like I have to eliminate two more special characters.
the_lines <- gsub("\u0085", "X", the_lines)
the_lines <- gsub("\\u0085", "X", the_lines)
the_lines <- gsub("\\\u0085", "X", the_lines)
the_lines <- gsub("\u0092", "X", the_lines)
the_lines <- gsub("\\u0092", "X", the_lines)
the_lines <- gsub("\\\u0092", "X", the_lines)
the_dd <- gsub("\u0085", "X", the_dd)
the_dd <- gsub("\\u0085", "X", the_dd)
the_dd <- gsub("\\\u0085", "X", the_dd)
the_dd <- gsub("\u0092", "X", the_dd)
the_dd <- gsub("\\u0092", "X", the_dd)
the_dd <- gsub("\\\u0092", "X", the_dd)
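For reference, the unescaped form of these replacements can be written as a loop. This sketch covers only the plain "\u0085" and "\u0092" patterns, not the escaped variants above, and it runs on a hand-built sample line rather than the real dictionary. Replacing each special character with a single "X" keeps the line the same width, which presumably matters because the parser relies on column positions; that rationale is an inference from the thread, not something stated in lodown.

```r
# hand-built sample input; the first line mimics the dictionary entry above
the_lines <- c(
    "PEDISEAR 2 IS\u0085DEAF OR DOES\u0085HAVE SERIOUS 906 - 907" ,
    "plain ascii line"
)

width_before <- nchar( the_lines )

# the two special characters targeted in this thread
special_chars <- c( "\u0085" , "\u0092" )

# swap each special character for a single "X" so column widths are unchanged
for ( this_char in special_chars ) {
    the_lines <- gsub( this_char , "X" , the_lines , fixed = TRUE )
}

width_after <- nchar( the_lines )
print( the_lines )
```

Comparing `width_before` with `width_after` confirms the one-for-one substitution leaves the character counts unchanged.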
nice! could you send a pull request?
See PR #140
thanks a lot
Not doing anything fancy here:
Here's the error: