ajdamico / asdfree

analyze survey data for free
http://asdfree.com/
GNU General Public License v3.0

problem with ACS file download csv_pus.zip #319

Closed by jeffrosenblum 6 years ago

jeffrosenblum commented 6 years ago

UPDATE: apparently this was a problem with the default download.file method. I added the following option, and now it's working. Could it have been a "greater than 2GB" problem?

options( "download.file.method" = "libcurl" )
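
A minimal sketch of that workaround, assuming the R build was compiled with libcurl support. The capabilities() check is an added safeguard for illustration, not part of the original fix:

```r
# assumption: only switch methods when this R build reports libcurl support
if ( capabilities( "libcurl" ) ) {
	options( "download.file.method" = "libcurl" )
}

# show which method download.file() will pick up by default from here on
getOption( "download.file.method" )
```

The option presumably has to be set before the lodown( "acs" , acs_cat ) call from the transcript below is rerun in the same session.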

Hi, me again! I'm still trying to get this to work. Now I'm using a newer computer with more disk space and 32 GB of RAM, since I thought it was a RAM problem when I tried this last month.

Now I'm getting an error on one of the downloads. If I open that download URL directly in a browser, it downloads fine. Any thoughts?

Jeff


R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> #install.packages( "devtools" , repos = "http://cran.rstudio.com/" )
> #library(devtools)
> #install_github( "ajdamico/lodown" , dependencies = TRUE )
> library(lodown)
> acs_cat <-
+   get_catalog( "acs" ,
+                output_dir = "/media/jeff/jeff/ACS2016" )
building catalog for acs

loading acs catalog from https://www2.census.gov/programs-surveys/acs/data/pums/2005

loading acs catalog from https://www2.census.gov/programs-surveys/acs/data/pums/2006

loading acs catalog from https://www2.census.gov/programs-surveys/acs/data/pums/2007

loading acs catalog from https://www2.census.gov/programs-surveys/acs/data/pums/2008

loading acs catalog from https://www2.census.gov/programs-surveys/acs/data/pums/2009

loading acs catalog from https://www2.census.gov/programs-surveys/acs/data/pums/2010

loading acs catalog from https://www2.census.gov/programs-surveys/acs/data/pums/2011

loading acs catalog from https://www2.census.gov/programs-surveys/acs/data/pums/2012

loading acs catalog from https://www2.census.gov/programs-surveys/acs/data/pums/2013

loading acs catalog from https://www2.census.gov/programs-surveys/acs/data/pums/2014

loading acs catalog from https://www2.census.gov/programs-surveys/acs/data/pums/2015

loading acs catalog from https://www2.census.gov/programs-surveys/acs/data/pums/2016

> 
> # 2009 5-year only
> acs_cat <- subset( acs_cat ,  time_period == '5-Year' )
> acs_cat <- subset( acs_cat ,  year == '2009' )
> acs_cat
  year time_period                                                         base_folder db_tablename                         dbfolder
9 2009      5-Year https://www2.census.gov/programs-surveys/acs/data/pums/2009/5-Year/  acs2009_5yr /media/jeff/jeff/ACS2016/MonetDB
                           output_filename include_puerto_rico
9 /media/jeff/jeff/ACS2016/acs2009_5yr.rds                TRUE
> # download the microdata to your local computer
> options( "monetdb.debug.query" = TRUE )
> lodown( "acs" , acs_cat)
locally downloading acs

'https://www2.census.gov/programs-surveys/acs/data/pums/2009/5-Year/unix_hwy.zip'
cached in
'/tmp/2f54d68e8017258184bea45768455029.Rcache'
copying to
'/tmp/Rtmp8DSYpH/filecb03ffaf23'

'https://www2.census.gov/programs-surveys/acs/data/pums/2009/5-Year/csv_hus.zip'
cached in
'/tmp/0235b6f01d550a70c984a40f9ac7c130.Rcache'
copying to
'/tmp/Rtmp8DSYpH/filecb03ffaf23'

'https://www2.census.gov/programs-surveys/acs/data/pums/2009/5-Year/csv_hpr.zip'
cached in
'/tmp/6aeaeec05af8a57cee0b9edad8110c9d.Rcache'
copying to
'/tmp/Rtmp8DSYpH/filecb03ffaf23'

note: column name 'type' unacceptable in monetdb.  changing to 'type_'

QQ: 'CREATE TABLE h (serialno STRING, rt STRING, division DOUBLE PRECISION, puma DOUBLE PRECISION, region DOUBLE PRECISION, st DOUBLE PRECISION, adjhsg DOUBLE PRECISION, adjinc DOUBLE PRECISION, wgtp DOUBLE PRECISION, np DOUBLE PRECISION, type_ DOUBLE PRECISION, acr DOUBLE PRECISION, ags DOUBLE PRECISION, bds DOUBLE PRECISION, bld DOUBLE PRECISION, bus DOUBLE PRECISION, conp DOUBLE PRECISION, elep DOUBLE PRECISION, fs DOUBLE PRECISION, fulp DOUBLE PRECISION, gasp DOUBLE PRECISION, hfl DOUBLE PRECISION, insp DOUBLE PRECISION, mhp DOUBLE PRECISION, mrgi DOUBLE PRECISION, mrgp DOUBLE PRECISION, mrgt DOUBLE PRECISION, mrgx DOUBLE PRECISION, rms DOUBLE PRECISION, rntm DOUBLE PRECISION, rntp DOUBLE PRECISION, smp DOUBLE PRECISION, tel DOUBLE PRECISION, ten DOUBLE PRECISION, vacs DOUBLE PRECISION, val DOUBLE PRECISION, veh DOUBLE PRECISION, watp DOUBLE PRECISION, ybl DOUBLE PRECISION, fes DOUBLE PRECISION, fincp DOUBLE PRECISION, fparc DOUBLE PRECISION, grntp DOUBLE PRECISION, grpip DOUBLE PRECISION, hhl DOUBLE PRECISION, hht DOUBLE PRECISION, hincp DOUBLE PRECISION, hugcl DOUBLE PRECISION, hupac DOUBLE PRECISION, hupaoc DOUBLE PRECISION, huparc DOUBLE PRECISION, kit DOUBLE PRECISION, lngi DOUBLE PRECISION, mv DOUBLE PRECISION, noc DOUBLE PRECISION, npf DOUBLE PRECISION, npp DOUBLE PRECISION, nr DOUBLE PRECISION, nrc DOUBLE PRECISION, ocpip DOUBLE PRECISION, partner DOUBLE PRECISION, plm DOUBLE PRECISION, psf DOUBLE PRECISION, r18 DOUBLE PRECISION, r60 DOUBLE PRECISION, r65 DOUBLE PRECISION, resmode DOUBLE PRECISION, smocp DOUBLE PRECISION, smx DOUBLE PRECISION, srnt DOUBLE PRECISION, sval DOUBLE PRECISION, taxp DOUBLE PRECISION, wif DOUBLE PRECISION, wkexrel DOUBLE PRECISION, workstat DOUBLE PRECISION, facrp DOUBLE PRECISION, fagsp DOUBLE PRECISION, fbdsp DOUBLE PRECISION, fbldp DOUBLE PRECISION, fbusp DOUBLE PRECISION, fconp DOUBLE PRECISION, felep DOUBLE PRECISION, ffsp DOUBLE PRECISION, ffulp DOUBLE PRECISION, fgasp DOUBLE PRECISION, fhflp DOUBLE PRECISION, finsp 
DOUBLE PRECISION, fkitp DOUBLE PRECISION, fmhp DOUBLE PRECISION, fmrgip DOUBLE PRECISION, fmrgp DOUBLE PRECISION, fmrgtp DOUBLE PRECISION, fmrgxp DOUBLE PRECISION, fmvp DOUBLE PRECISION, fplmp DOUBLE PRECISION, frmsp DOUBLE PRECISION, frntmp DOUBLE PRECISION, frntp DOUBLE PRECISION, fsmp DOUBLE PRECISION, fsmxhp DOUBLE PRECISION, fsmxsp DOUBLE PRECISION, ftaxp DOUBLE PRECISION, ftelp DOUBLE PRECISION, ftenp DOUBLE PRECISION, fvacsp DOUBLE PRECISION, fvalp DOUBLE PRECISION, fvehp DOUBLE PRECISION, fwatp DOUBLE PRECISION, fyblp DOUBLE PRECISION, wgtp1 DOUBLE PRECISION, wgtp2 DOUBLE PRECISION, wgtp3 DOUBLE PRECISION, wgtp4 DOUBLE PRECISION, wgtp5 DOUBLE PRECISION, wgtp6 DOUBLE PRECISION, wgtp7 DOUBLE PRECISION, wgtp8 DOUBLE PRECISION, wgtp9 DOUBLE PRECISION, wgtp10 DOUBLE PRECISION, wgtp11 DOUBLE PRECISION, wgtp12 DOUBLE PRECISION, wgtp13 DOUBLE PRECISION, wgtp14 DOUBLE PRECISION, wgtp15 DOUBLE PRECISION, wgtp16 DOUBLE PRECISION, wgtp17 DOUBLE PRECISION, wgtp18 DOUBLE PRECISION, wgtp19 DOUBLE PRECISION, wgtp20 DOUBLE PRECISION, wgtp21 DOUBLE PRECISION, wgtp22 DOUBLE PRECISION, wgtp23 DOUBLE PRECISION, wgtp24 DOUBLE PRECISION, wgtp25 DOUBLE PRECISION, wgtp26 DOUBLE PRECISION, wgtp27 DOUBLE PRECISION, wgtp28 DOUBLE PRECISION, wgtp29 DOUBLE PRECISION, wgtp30 DOUBLE PRECISION, wgtp31 DOUBLE PRECISION, wgtp32 DOUBLE PRECISION, wgtp33 DOUBLE PRECISION, wgtp34 DOUBLE PRECISION, wgtp35 DOUBLE PRECISION, wgtp36 DOUBLE PRECISION, wgtp37 DOUBLE PRECISION, wgtp38 DOUBLE PRECISION, wgtp39 DOUBLE PRECISION, wgtp40 DOUBLE PRECISION, wgtp41 DOUBLE PRECISION, wgtp42 DOUBLE PRECISION, wgtp43 DOUBLE PRECISION, wgtp44 DOUBLE PRECISION, wgtp45 DOUBLE PRECISION, wgtp46 DOUBLE PRECISION, wgtp47 DOUBLE PRECISION, wgtp48 DOUBLE PRECISION, wgtp49 DOUBLE PRECISION, wgtp50 DOUBLE PRECISION, wgtp51 DOUBLE PRECISION, wgtp52 DOUBLE PRECISION, wgtp53 DOUBLE PRECISION, wgtp54 DOUBLE PRECISION, wgtp55 DOUBLE PRECISION, wgtp56 DOUBLE PRECISION, wgtp57 DOUBLE PRECISION, wgtp58 DOUBLE PRECISION, wgtp59 
DOUBLE PRECISION, wgtp60 DOUBLE PRECISION, wgtp61 DOUBLE PRECISION, wgtp62 DOUBLE PRECISION, wgtp63 DOUBLE PRECISION, wgtp64 DOUBLE PRECISION, wgtp65 DOUBLE PRECISION, wgtp66 DOUBLE PRECISION, wgtp67 DOUBLE PRECISION, wgtp68 DOUBLE PRECISION, wgtp69 DOUBLE PRECISION, wgtp70 DOUBLE PRECISION, wgtp71 DOUBLE PRECISION, wgtp72 DOUBLE PRECISION, wgtp73 DOUBLE PRECISION, wgtp74 DOUBLE PRECISION, wgtp75 DOUBLE PRECISION, wgtp76 DOUBLE PRECISION, wgtp77 DOUBLE PRECISION, wgtp78 DOUBLE PRECISION, wgtp79 DOUBLE PRECISION, wgtp80 DOUBLE PRECISION)'
II: Finished in 0.35s
QQ: 'copy offset 2 into h from '/tmp/Rtmp8DSYpH/ss09husa.csv' using delimiters ',','\n','"'  NULL AS '''
II: Finished in 42.84s
QQ: 'copy offset 2 into h from '/tmp/Rtmp8DSYpH/ss09husb.csv' using delimiters ',','\n','"'  NULL AS '''
II: Finished in 57.77s
QQ: 'copy offset 2 into h from '/tmp/Rtmp8DSYpH/ss09husc.csv' using delimiters ',','\n','"'  NULL AS '''
II: Finished in 1.29s
QQ: 'copy offset 2 into h from '/tmp/Rtmp8DSYpH/ss09husd.csv' using delimiters ',','\n','"'  NULL AS '''
II: Finished in 1.33s
QQ: 'select schemas.name as sn, tables.name as tn from sys.tables join sys.schemas on tables.schema_id=schemas.id'
II: Finished in 24.34s
QQ: 'select columns.name as name from sys.columns join sys.tables on 
    columns.table_id=tables.id where tables.name='h''
II: Finished in 0s
QQ: 'copy offset 2 into h from '/tmp/Rtmp8DSYpH/ss09hpr.csv' using delimiters ',','\n','"'  NULL AS '''
II: Finished in 1.72s
'https://www2.census.gov/programs-surveys/acs/data/pums/2009/5-Year/unix_pwy.zip'
cached in
'/tmp/7de54712694b2df6ebabfd61bf64793a.Rcache'
copying to
'/tmp/Rtmp8DSYpH/filecb03ffaf23'

Downloading from URL
'https://www2.census.gov/programs-surveys/acs/data/pums/2009/5-Year/csv_pus.zip'
to file
'/tmp/Rtmp8DSYpH/filecb03ffaf23'

--2017-12-23 13:11:55--  https://www2.census.gov/programs-surveys/acs/data/pums/2009/5-Year/csv_pus.zip
Resolving www2.census.gov (www2.census.gov)... 104.97.43.163, 2001:559:19:897::208c, 2001:559:19:88a::208c
Connecting to www2.census.gov (www2.census.gov)|104.97.43.163|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2017-12-23 13:11:56 ERROR 403: Forbidden.

downloaded file size on disk (0) does not match server's content length (1924235684)
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lodown_0.1.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14      bindr_0.1         knitr_1.17        xml2_1.1.1        magrittr_1.5      hms_0.4.0         rvest_0.3.2      
 [8] archive_1.0.0     R6_2.2.2          rlang_0.1.6       stringr_1.2.0     httr_1.3.1        dplyr_0.7.4       tools_3.2.3      
[15] DBI_0.7           selectr_0.3-1     digest_0.6.13     assertthat_0.2.0  tibble_1.3.4      bindrcpp_0.2      readr_1.1.1      
[22] codetools_0.2-14  curl_3.1          glue_1.2.0.9000   MonetDBLite_0.5.0 haven_1.1.0       stringi_1.1.6     forcats_0.2.0    
[29] XML_3.98-1.9      pkgconfig_2.0.1  

lodown is now exiting unexpectedly.
websites that host publicly-downloadable microdata change often and sometimes those changes cause this software to break.
if the error call stack below appears to be a hiccup in your internet connection, then please verify your connectivity and retry the download.
otherwise, please open a new issue at `https://github.com/ajdamico/asdfree/issues` with the contents of this error call stack and also the output of your `sessionInfo()`.

[[1]]
lodown("acs", acs_cat)

[[2]]
withCallingHandlers(catalog <- load_fun(data_name = data_name, 
    catalog, ...), error = function(e) {
    print(sessionInfo())
    if (grepl("cannot allocate vector of size", e)) 
        message(memory_note)
    else if (grepl("parameter must be specified", e)) 
        message(parameter_note)
    else if (grepl("to install", e)) 
        message(installation_note)
    else {
        message(unknown_error_note)
        print(sys.calls())
    }
})

[[3]]
load_fun(data_name = data_name, catalog, ...)

[[4]]
cachaca(this_download, tf, mode = "wb", filesize_fun = "httr")

[[5]]
stop(paste("download failed after", initial.attempts, "attempts"))

[[6]]
.handleSimpleError(function (e) 
{
    print(sessionInfo())
    if (grepl("cannot allocate vector of size", e)) 
        message(memory_note)
    else if (grepl("parameter must be specified", e)) 
        message(parameter_note)
    else if (grepl("to install", e)) 
        message(installation_note)
    else {
        message(unknown_error_note)
        print(sys.calls())
    }
}, "download failed after 3 attempts", quote(cachaca(this_download, 
    tf, mode = "wb", filesize_fun = "httr")))

[[7]]
h(simpleError(msg, call))

Error in cachaca(this_download, tf, mode = "wb", filesize_fun = "httr") : 
  download failed after 3 attempts
In addition: Warning message:
In (function (url, destfile, method, quiet = FALSE, mode = "w",  :
  download had nonzero exit status
  year time_period                                                         base_folder db_tablename                         dbfolder
9 2009      5-Year https://www2.census.gov/programs-surveys/acs/data/pums/2009/5-Year/  acs2009_5yr /media/jeff/jeff/ACS2016/MonetDB
                           output_filename include_puerto_rico case_count
9 /media/jeff/jeff/ACS2016/acs2009_5yr.rds                TRUE         NA
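
One way to sanity-check the "greater than 2GB" guess against the log above: the server's content length is reported as 1924235684 bytes, which can be compared with the signed 32-bit limit. This is only arithmetic on the number in the log, and the variable names are made up for illustration.

```r
# content length in bytes, copied from the "does not match" message in the log
content_length <- 1924235684

# largest value a signed 32-bit integer can hold (hypothetical variable name)
signed_32bit_max <- 2^31 - 1

# TRUE means the file is smaller than the classic 2 GiB limit
fits_in_signed_32bit <- content_length <= signed_32bit_max

# size in GiB, for reference
size_gib <- content_length / 2^30

print( fits_in_signed_32bit )
print( round( size_gib , 2 ) )
```

Since the file comes in under that limit, file size alone may not explain the failure. The 403 Forbidden in the wget-style output suggests the server rejected that particular request, which would be consistent with a different download method succeeding.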