ajdamico / lodown

locally download and prepare publicly-available microdata
GNU General Public License v3.0

[POF] lodown download issue #156

Closed gpompeo closed 9 months ago

gpompeo commented 4 years ago

I am getting a download error when I try to get the POF catalog. Here are the messages I get:

pof_cat <- lodown( "pof" , pof_cat )
locally downloading pof

downloading from URL 'ftp://ftp.ibge.gov.br/Orcamentos_Familiares/Pesquisa_de_Orcamentos_Familiares_2017_2018/Microdados/Dados.zip' to file 'C:\Users\GUILHE~1\AppData\Local\Temp\RtmpoJHztT\file659463f974d5'

download issue with 'ftp://ftp.ibge.gov.br/Orcamentos_Familiares/Pesquisa_de_Orcamentos_Familiares_2017_2018/Microdados/Dados.zip'

download issue with 'ftp://ftp.ibge.gov.br/Orcamentos_Familiares/Pesquisa_de_Orcamentos_Familiares_2017_2018/Microdados/Dados.zip'

download issue with 'ftp://ftp.ibge.gov.br/Orcamentos_Familiares/Pesquisa_de_Orcamentos_Familiares_2017_2018/Microdados/Dados.zip'

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] lodown_0.1.0

loaded via a namespace (and not attached):
[1] httr_1.4.1 compiler_3.6.1 R6_2.4.0 tools_3.6.1 curl_4.2 Rcpp_1.0.3 cellranger_1.1.0 readxl_1.3.1 digest_0.6.22

lodown is now exiting unexpectedly. websites that host publicly-downloadable microdata change often and sometimes those changes cause this software to break. if the error call stack below appears to be a hiccup in your internet connection, then please verify your connectivity and retry the download. otherwise, please open a new issue at https://github.com/ajdamico/asdfree/issues with the contents of this error call stack and also the output of your sessionInfo().

[[1]]
lodown("pof", pof_cat)

[[2]]
withCallingHandlers(catalog <- load_fun(data_name = data_name, catalog, ...), error = function(e) {
    print(sessionInfo())
    if (grepl("cannot allocate vector of size", e))
        message(memory_note)
    else if (grepl("parameter must be specified", e))
        message(parameter_note)
    else if (grepl("to install", e))
        message(installation_note)
    else {
        message(unknown_error_note)
        print(sys.calls())
    }
})

[[3]]
load_fun(data_name = data_name, catalog, ...)

[[4]]
cachaca(catalog[i, "full_urls"], tf, mode = "wb")

[[5]]
httr_filesize(this_url, attempts, sleepsec)

[[6]]
stop(paste0("httr::HEAD( '", url, "' )\nfailed after ", initial.attempts, " attempts"))

[[7]]
.handleSimpleError(function (e) {
    print(sessionInfo())
    if (grepl("cannot allocate vector of size", e))
        message(memory_note)
    else if (grepl("parameter must be specified", e))
        message(parameter_note)
    else if (grepl("to install", e))
        message(installation_note)
    else {
        message(unknown_error_note)
        print(sys.calls())
    }
}, "httr::HEAD( 'ftp://ftp.ibge.gov.br/Orcamentos_Familiares/Pesquisa_de_Orcamentos_Familiares_2017_2018/Microdados/Dados.zip' )\nfailed after 3 attempts", base::quote(httr_filesize(this_url, attempts, sleepsec)))

[[8]]
h(simpleError(msg, call))

Error in httr_filesize(this_url, attempts, sleepsec) :
  httr::HEAD( 'ftp://ftp.ibge.gov.br/Orcamentos_Familiares/Pesquisa_de_Orcamentos_Familiares_2017_2018/Microdados/Dados.zip' )
failed after 3 attempts

  full_urls                                                                                                      period
1 ftp://ftp.ibge.gov.br/Orcamentos_Familiares/Pesquisa_de_Orcamentos_Familiares_2017_2018/Microdados/Dados.zip 2017_2018
2 ftp://ftp.ibge.gov.br/Orcamentos_Familiares/Pesquisa_de_Orcamentos_Familiares_2008_2009/Microdados/Dados.zip 2008_2009
3 ftp://ftp.ibge.gov.br/Orcamentos_Familiares/Pesquisa_de_Orcamentos_Familiares_2002_2003/Microdados/Dados.zip 2002_2003
  documentation
1 ftp://ftp.ibge.gov.br/Orcamentos_Familiares/Pesquisa_de_Orcamentos_Familiares_2017_2018/Microdados/documentacao.zip
2 ftp://ftp.ibge.gov.br/Orcamentos_Familiares/Pesquisa_de_Orcamentos_Familiares_2008_2009/Microdados/documentacao.zip
3 ftp://ftp.ibge.gov.br/Orcamentos_Familiares/Pesquisa_de_Orcamentos_Familiares_2002_2003/Microdados/Documentacao.zip
  aliment_file                                                                                                       output_folder                       case_count
1 ftp://ftp.ibge.gov.br/Orcamentos_Familiares/Pesquisa_de_Orcamentos_Familiares_2017_2018/Microdados/tradutores.zip D:/OneDrive/Documents/POF/2017_2018 NA
2 ftp://ftp.ibge.gov.br/Orcamentos_Familiares/Pesquisa_de_Orcamentos_Familiares_2008_2009/Microdados/tradutores.zip D:/OneDrive/Documents/POF/2008_2009 NA
3                                                                                                                    D:/OneDrive/Documents/POF/2002_2003 NA

I have 7-Zip installed.

gpompeo commented 4 years ago

UPDATE: I tried changing filesize_fun to fix the FTP download, since httr is not supposed to work with FTP (r-lib/httr#537). That got the download going, but it then got stuck in a loop, downloading (and unzipping) the file over and over until it ran out of attempts.
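For anyone who wants to sidestep the httr::HEAD size check entirely, here is a minimal sketch of a bounded retry loop built on base R only. It is not lodown's code: `download_with_retry`, its `fetch` and `is_valid` arguments, and the defaults are all hypothetical names chosen for illustration. The idea is to download, validate the local file, and stop with an error after a fixed number of attempts rather than looping.

```r
# Hypothetical helper, NOT part of lodown: retry a download a fixed number
# of times and validate the result locally instead of asking the server
# for the file size with httr::HEAD (which does not handle ftp:// URLs).
download_with_retry <- function(url, destfile, attempts = 3, sleepsec = 60,
                                fetch = utils::download.file,
                                is_valid = function(path) {
                                  file.exists(path) && file.size(path) > 0
                                }) {
  for (i in seq_len(attempts)) {
    # Any error raised by the download or the validity check counts
    # as a failed attempt.
    ok <- tryCatch({
      fetch(url, destfile, mode = "wb", quiet = TRUE)
      is_valid(destfile)
    }, error = function(e) FALSE)

    if (isTRUE(ok)) return(invisible(destfile))

    # Wait before the next attempt, but not after the last one.
    if (i < attempts) Sys.sleep(sleepsec)
  }
  stop("download of '", url, "' failed after ", attempts, " attempts")
}

# Example call (commented out because it needs network access):
# tf <- tempfile(fileext = ".zip")
# download_with_retry(
#   "ftp://ftp.ibge.gov.br/Orcamentos_Familiares/Pesquisa_de_Orcamentos_Familiares_2017_2018/Microdados/Dados.zip",
#   tf,
#   # Stricter check: the file must be listable as a zip archive.
#   is_valid = function(path) nrow(utils::unzip(path, list = TRUE)) > 0
# )
```

Passing the validity check as an argument keeps the loop itself independent of the file type, so a zip listing, a size threshold, or a checksum can be swapped in without touching the retry logic.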

gocdata commented 4 years ago

Hey @gpompeo, I also managed to fix the httr::HEAD problem by changing the filesize_fun parameter, and I updated the broken 2017-2018 link. I made a pull request for that. Even so, after downloading and unzipping the files, my script stops working at the step where it unpacks the 7z files. It actually unpacks one file, but something goes wrong after that.

Have you found a solution to make the package work properly?

gpompeo commented 4 years ago

@gocdata I haven't managed to get it to work properly. I've noticed that there were several changes in the POF (especially in the first releases), so I tried a different path and managed to pre-process the files manually.

ajdamico commented 9 months ago

hi! apologies for the long delay. i've made a couple of big updates to asdfree.com that hopefully make the website a bit better, but i've decided to stop maintaining the lodown package so probably won't fix the bug you've reported. the new asdfree does have pof data, but only for the most current year. thanks