ajdamico / asdfree

analyze survey data for free
http://asdfree.com/
GNU General Public License v3.0

Errors downloading basic monthly CPS #361

Closed by mcyost 10 months ago

mcyost commented 2 years ago

Love the lodown package, have used it to successfully download and parse ACS and CES data. Tried to use it to download the basic monthly CPS, and got an error. Not doing anything fancy with the code:

library(lodown)
lodown( "cpsbasic" , output_dir = file.path( path.expand( "~" ) , "CPSBASIC" ) )
get_catalog("cpsbasic")

Here's the error: Error in rvest::html_table(xml2::read_html(cps_ftp), fill = TRUE)[[2]] : subscript out of bounds

I looked at the installation code for the package and found that the error was generated by this line in cpsbasic.R: cps_table <- rvest::html_table( xml2::read_html( cps_ftp ) , fill = TRUE )[[2]]
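
For context, `[[2]]` on a list with fewer than two elements is what raises "subscript out of bounds", so the error suggests the page no longer returns a second table. A minimal sketch of a guard around that lookup is below. `pick_table` is a hypothetical helper, not part of lodown, and the commented usage line assumes `cps_ftp` still holds the listing URL as in cpsbasic.R.

```r
# Hypothetical helper (not part of lodown): return the i-th element of a list
# of parsed HTML tables, stopping with a descriptive message instead of the
# bare "subscript out of bounds" when the page has fewer tables than expected.
pick_table <- function( tables , i ) {

	if ( length( tables ) < i ) {
		stop(
			"expected at least " , i ,
			" table(s) on the page but found " , length( tables )
		)
	}

	tables[[ i ]]
}

# Possible use in cpsbasic.R (assumes `cps_ftp` is defined as in the package):
# all_tables <- rvest::html_table( xml2::read_html( cps_ftp ) , fill = TRUE )
# cps_table <- pick_table( all_tables , 2 )
```

This only makes the failure easier to read. It does not recover the table if the Census page layout has changed.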

I removed the subscript [[2]] (I don't see that any functions point to cps_table...) and reinstalled the package manually. It now gives me a much longer error message:

> lodown( "cpsbasic" , output_dir = file.path( path.expand( "~" ) , "CPSBASIC" ) )
building catalog for cpsbasic

locally downloading cpsbasic

downloading from URL

'//www2.census.gov/programs-surveys/cps/datasets/2022/basic/jan22pub.zip'

to file

'C:\Users\EU0122~1\AppData\Local\Temp\2\RtmpyM7PoD\file34308331970'

download issue with

'//www2.census.gov/programs-surveys/cps/datasets/2022/basic/jan22pub.zip'

download issue with

'//www2.census.gov/programs-surveys/cps/datasets/2022/basic/jan22pub.zip'

download issue with

'//www2.census.gov/programs-surveys/cps/datasets/2022/basic/jan22pub.zip'

R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lodown_0.1.0         usethis_1.5.0        devtools_2.0.2       RevoUtils_11.0.3     RevoUtilsMath_11.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1        compiler_3.5.3    prettyunits_1.0.2 remotes_2.0.4     tools_3.5.3       testthat_2.3.2    digest_0.6.18     pkgbuild_1.3.1    pkgload_1.0.2     lattice_0.20-38   memoise_1.1.0    
[12] rlang_0.4.5       Matrix_1.2-17     cli_2.0.2         curl_3.3          withr_2.5.0       httr_1.4.2        stringr_1.4.0     xml2_1.2.0        desc_1.4.1        fs_1.4.1          grid_3.5.3       
[23] rprojroot_1.3-2   glue_1.4.0        R6_2.4.0          processx_3.4.2    fansi_0.4.0       survival_2.44-1.1 sessioninfo_1.1.1 callr_3.2.0       selectr_0.4-1     magrittr_1.5      splines_3.5.3    
[34] backports_1.1.4   ps_1.3.2          assertthat_0.2.1  rvest_0.3.3       survey_3.35-1     stringi_1.4.3     crayon_1.3.4     

lodown is now exiting unexpectedly.
websites that host publicly-downloadable microdata change often and sometimes those changes cause this software to break.
if the error call stack below appears to be a hiccup in your internet connection, then please verify your connectivity and retry the download.
otherwise, please open a new issue at `https://github.com/ajdamico/asdfree/issues` with the contents of this error call stack and also the output of your `sessionInfo()`.

[[1]]
lodown("cpsbasic", output_dir = file.path(path.expand("~"), 
    "CPSBASIC"))

[[2]]
withCallingHandlers(catalog <- load_fun(data_name = data_name, 
    catalog, ...), error = function(e) {
    print(sessionInfo())
    if (grepl("cannot allocate vector of size", e)) 
        message(memory_note)
    else if (grepl("parameter must be specified", e)) 
        message(parameter_note)
    else if (grepl("to install", e)) 
        message(installation_note)
    else {
        message(unknown_error_note)
        print(sys.calls())
    }
})

[[3]]
load_fun(data_name = data_name, catalog, ...)

[[4]]
cachaca(catalog[i, "full_url"], tf, mode = "wb")

[[5]]
httr_filesize(this_url, attempts, sleepsec)

[[6]]
stop(paste0("httr::HEAD( '", url, "' )\nfailed after ", 
    initial.attempts, " attempts"))

[[7]]
.handleSimpleError(function (e) 
{
    print(sessionInfo())
    if (grepl("cannot allocate vector of size", e)) 
        message(memory_note)
    else if (grepl("parameter must be specified", e)) 
        message(parameter_note)
    else if (grepl("to install", e)) 
        message(installation_note)
    else {
        message(unknown_error_note)
        print(sys.calls())
    }
}, "httr::HEAD( '//www2.census.gov/programs-surveys/cps/datasets/2022/basic/jan22pub.zip' )\nfailed after 3 attempts", 
    quote(httr_filesize(this_url, attempts, sleepsec)))

[[8]]
h(simpleError(msg, call))

Error in httr_filesize(this_url, attempts, sleepsec) : 
  httr::HEAD( '//www2.census.gov/programs-surveys/cps/datasets/2022/basic/jan22pub.zip' )
failed after 3 attempts
  year month X..www2.census.gov.programs.surveys.cps.datasets.2022.basic.2020_Basic_CPS_Public_Use_Record_Layout_plus_IO_Code_list.txt version
2 2022     1  //www2.census.gov/programs-surveys/cps/datasets/2022/basic/2020_Basic_CPS_Public_Use_Record_Layout_plus_IO_Code_list.txt   basic
1 2022     2  //www2.census.gov/programs-surveys/cps/datasets/2022/basic/2020_Basic_CPS_Public_Use_Record_Layout_plus_IO_Code_list.txt   basic
                                                                 full_url   dd                                                 output_filename case_count
2 //www2.census.gov/programs-surveys/cps/datasets/2022/basic/jan22pub.zip <NA> C:\\Users\\EU01221457\\Documents/CPSBASIC/2022 01 cps basic.rds         NA
1 //www2.census.gov/programs-surveys/cps/datasets/2022/basic/feb22pub.zip <NA> C:\\Users\\EU01221457\\Documents/CPSBASIC/2022 02 cps basic.rds         NA
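
One thing visible in the output above: every `full_url` in the catalog starts with `//` and has no `http:` or `https:` scheme, and the `httr::HEAD()` call is failing on exactly those strings. Whether the missing scheme is the actual cause is an assumption, not something confirmed here. A minimal sketch of normalizing such URLs before they are requested is below. `add_scheme` is a hypothetical helper, not a lodown function.

```r
# Hypothetical helper (not part of lodown): prepend a scheme to
# protocol-relative URLs (those starting with "//") and leave
# all other URLs unchanged.
add_scheme <- function( urls , scheme = "https:" ) {
	ifelse( grepl( "^//" , urls ) , paste0( scheme , urls ) , urls )
}

# Example with the URL from the error message above
fixed_url <- add_scheme( "//www2.census.gov/programs-surveys/cps/datasets/2022/basic/jan22pub.zip" )
print( fixed_url )
```

If that is the problem, applying something like this to the `full_url` column of the catalog before the download step might be enough, but this has not been tested against the Census server.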

Here's my sessionInfo():

> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lodown_0.1.0         usethis_1.5.0        devtools_2.0.2       RevoUtils_11.0.3     RevoUtilsMath_11.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1        compiler_3.5.3    prettyunits_1.0.2 remotes_2.0.4     tools_3.5.3       testthat_2.3.2    digest_0.6.18     pkgbuild_1.3.1    pkgload_1.0.2     lattice_0.20-38   memoise_1.1.0     rlang_0.4.5      
[13] Matrix_1.2-17     cli_2.0.2         curl_3.3          withr_2.5.0       httr_1.4.2        stringr_1.4.0     xml2_1.2.0        desc_1.4.1        fs_1.4.1          grid_3.5.3        rprojroot_1.3-2   glue_1.4.0       
[25] R6_2.4.0          processx_3.4.2    fansi_0.4.0       survival_2.44-1.1 sessioninfo_1.1.1 callr_3.2.0       selectr_0.4-1     magrittr_1.5      splines_3.5.3     backports_1.1.4   ps_1.3.2          assertthat_0.2.1 
[37] rvest_0.3.3       survey_3.35-1     stringi_1.4.3     crayon_1.3.4     
> 

Any chance for a fix or some troubleshooting? Thanks.

ajdamico commented 2 years ago

hi :-) might be some time before i'm able to debug this, a pull request would be excellent if you believe you can fix the issue!

ajdamico commented 10 months ago

hi! apologies for the long delay. i've made a couple of big updates to asdfree.com that hopefully make the website a bit better, but i've decided to stop maintaining the lodown package, so i probably won't fix the bug you've reported. the new asdfree does have acs, ces, and cps-asec data, but only for the most current year, and unfortunately it doesn't include the cps basic. thanks