ceopinio / CEOdata

CEOdata R package

CEOdata error #6

Closed fred-udina closed 2 years ago

fred-udina commented 2 years ago

Happy to know that the CEOdata package exists! But on my first attempt... Am I doing anything wrong? Frederic

> d <- CEOdata(reo="1031")
A problem downloading the metadata has occurred. The server may be temporarily down, or the file name has changed. Please try again later or open an issue at https://github.com/ceopinio/CEOdata indicating 'Problem with metadata file'
A problem downloading the metadata has occurred. The server may be temporarily down, or the file name has changed. Please try again later or open an issue at https://github.com/ceopinio/CEOdata indicating 'Problem with metadata file'
Error in if (!is.na(url.reo)) { : argument is of length zero
> CEOmeta(reo = "746")
A problem downloading the metadata has occurred. The server may be temporarily down, or the file name has changed. Please try again later or open an issue at https://github.com/ceopinio/CEOdata indicating 'Problem with metadata file'
Error in UseMethod("filter") : 
  no applicable method for 'filter' applied to an object of class "NULL"

xfim commented 2 years ago

It is certainly quite strange, because with the latest version this works perfectly for me.

Can you please check that your Internet connection allows you to retrieve this URL for reo 1031?

Also, what happens when you run

CEOmeta()

Are you able to get the metadata of all the CEO studies?

fred-udina commented 2 years ago

Hi, thanks for your quick answer. I could indeed get the zip file from the link you sent. I have the latest version of CEOdata (installed today) and recent versions of R and RStudio. I work from my UPF office, so I am on the Catalan universities network. And:

> CEOmeta()
A problem downloading the metadata has occurred. The server may be temporarily down, or the file name has changed. Please try again later or open an issue at https://github.com/ceopinio/CEOdata indicating 'Problem with metadata file'
NULL

xfim commented 2 years ago

Good afternoon, @fred-udina,

This is definitely some sort of network-related problem, and not one on your side. I am assuming you are on Windows.

What do you get with CEOdata() without arguments?

My suspicion right now is an obscure problem with how R on Windows deals with secure servers, specifically the gencat servers, which proved troublesome during development. I don't have easy access to a Windows machine, but let me look into it.

And thank you very much for reporting it. So far we have had other users (also working from the same physical location and machines) and nothing has popped up. So please let me look into it.

fred-udina commented 2 years ago

I work on macOS 10.15.7. I hope CEOdata works fine on Windows, because that is what my students will mainly use.

CEOdata() with no args works for me:

> CEOdata()
Downloading the barometer.
trying URL 'https://ceo.gencat.cat/web/.content/20_barometre/Matrius_BOP/Microdades_barometre.zip'
Content type 'application/zip' length 10111044 bytes (9.6 MB)
==================================================
downloaded 9.6 MB

Converting the original data into R. This may take a while.
Post-processing the data. This may take a while.
# A tibble: 37,838 × 962
   PONDERA ORDRECINE ORDRE_R…¹   REO METOD…² BOP_NUM   ANY   MES   DIA HORA_…³ HORA_…⁴ DATA_INI DATA_FIN DURADA FASE  ENQUESTAD…⁵
     <dbl>     <dbl>     <dbl> <dbl> <fct>   <fct>   <dbl> <dbl> <dbl> <time>  <time>  <date>   <date>    <dbl> <fct>       <dbl>

but I still get

> d <- CEOdata(reo="1031")
A problem downloading the metadata has occurred. The server may be temporarily down, or the file name has changed. Please try again later or open an issue at https://github.com/ceopinio/CEOdata indicating 'Problem with metadata file'
A problem downloading the metadata has occurred. The server may be temporarily down, or the file name has changed. Please try again later or open an issue at https://github.com/ceopinio/CEOdata indicating 'Problem with metadata file'
Error in if (!is.na(url.reo)) { : argument is of length zero

It doesn't look like a network problem:

~$ wget https://ceo.gencat.cat/web/.content/30_estudis/repositorimatrius/2022/Microdades_anonimitzades_1031.zip
--2022-09-14 15:15:17--  https://ceo.gencat.cat/web/.content/30_estudis/repositorimatrius/2022/Microdades_anonimitzades_1031.zip
Resolving ceo.gencat.cat (ceo.gencat.cat)... 23.39.109.188
Connecting to ceo.gencat.cat (ceo.gencat.cat)|23.39.109.188|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 458386 (448K) [application/zip]
Saving to: ‘Microdades_anonimitzades_1031.zip’

Microdades_anonimit 100%[===================>] 447,64K  --.-KB/s    in 0,06s   

2022-09-14 15:15:18 (7,35 MB/s) - ‘Microdades_anonimitzades_1031.zip’ saved [458386/458386]

fred-udina commented 2 years ago

I just tried it on my home Mac (macOS 12.5.1, R 4.2.1, CEOdata 1.2.0.1). The problem is the same with CEOdata(reo = "1031").

xfim commented 2 years ago

Yes, I have managed to try it on a Windows machine and the same happens there. My GNU/Linux machine, though, works well. I'm on it.

fred-udina commented 2 years ago

Just to experiment, I tried R in a GNU/Linux virtual machine running on my Mac. The same problem appears when asking for reo = "1031", but not when calling CEOdata() without arguments.

xfim commented 2 years ago

Thank you @fred-udina, for helping me out.

I think I have found it.

Can you please also install "curl" (install.packages('curl')) and then repeat it and report back? Thank you.

I have found that, for some reason I still have to understand, 'curl' is no longer being loaded, so the call to jsonlite that retrieves the metadata fails.

The main merged barometer is not affected because it does not load its data from the metadata.
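
As an aside on the second error in the report ("argument is of length zero"), here is a minimal sketch of how it can arise. It assumes that the failed metadata download leaves the study URL as NULL; the package internals are not shown in this thread, so the variable value and the has_url() helper are hypothetical.

```r
# Assumption: the failed metadata lookup leaves the study URL as NULL.
url.reo <- NULL

# is.na(NULL) returns a zero-length logical vector, not TRUE or FALSE.
na_check_length <- length(is.na(url.reo))
print(na_check_length)

# Passing a zero-length condition to if() is an error in R, which is
# caught here so that the message can be inspected.
msg <- tryCatch(
  {
    if (!is.na(url.reo)) "download"
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical defensive check (not the package's actual code): test the
# length first, so a missing URL gives FALSE instead of an error.
has_url <- function(x) length(x) == 1 && !is.na(x)
print(has_url(url.reo))
print(has_url("https://example.org/file.zip"))
```

A guard of this kind would let the function stop with a clearer message when the metadata is unavailable.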

A temporary workaround would be something like:

library(dplyr)  # provides filter()
CEOdata() |>
  filter(REO == "1031")

to get the same result as with CEOdata(reo = "1031").

But it should work properly anyway.

fred-udina commented 2 years ago

Yes, that's it. I had some problems with URLs some time ago that were fixed by the curl package!

> install.packages("curl")
trying URL 'https://cran.rstudio.com/bin/macosx/contrib/4.2/curl_4.3.2.tgz'
Content type 'application/x-gzip' length 861741 bytes (841 KB)
==================================================
downloaded 841 KB

The downloaded binary packages are in
    /var/folders/n3/dyjkdb8d66vbrchsszv6vmzm0000gp/T//RtmpD22Vy7/downloaded_packages
> library(curl)
Using libcurl 7.79.1 with LibreSSL/3.3.6
> CEOdata(reo = "1031") -> d
trying URL 'https://ceo.gencat.cat/web/.content/30_estudis/repositorimatrius/2022/Microdades_anonimitzades_1031.zip'
Content type 'application/zip' length 458386 bytes (447 KB)
==================================================
downloaded 447 KB

Converting the original data into R. This may take a while.
> 

xfim commented 2 years ago

OK, thank you for confirming, @fred-udina. I will leave this issue open until we decide what to do about 'curl', a decision that depends on jsonlite (see the issue mentioned above).

fred-udina commented 2 years ago

This is quite weird. On my main Mac, CEOdata(reo="1031") wasn't working. Then I installed curl and did NOT attach it, yet now it works.

> d <- CEOdata(reo="1031")
A problem downloading the metadata has occurred. The server may be temporarily down, or the file name has changed. Please try again later or open an issue at https://github.com/ceopinio/CEOdata indicating 'Problem with metadata file'
A problem downloading the metadata has occurred. The server may be temporarily down, or the file name has changed. Please try again later or open an issue at https://github.com/ceopinio/CEOdata indicating 'Problem with metadata file'
Error in if (!is.na(url.reo)) { : argument is of length zero
> install.packages("curl")
trying URL 'https://cran.rstudio.com/bin/macosx/contrib/4.2/curl_4.3.2.tgz'
Content type 'application/x-gzip' length 861741 bytes (841 KB)
==================================================
downloaded 841 KB

The downloaded binary packages are in
    /var/folders/n3/dyjkdb8d66vbrchsszv6vmzm0000gq/T//RtmptYlrfo/downloaded_packages
> d <- CEOdata(reo="1031")
trying URL 'https://ceo.gencat.cat/web/.content/30_estudis/repositorimatrius/2022/Microdades_anonimitzades_1031.zip'
Content type 'application/zip' length 458386 bytes (447 KB)
==================================================
downloaded 447 KB

Converting the original data into R. This may take a while.
> 

xfim commented 2 years ago

Without investigating further, I would say that this is reasonable, as the "curl" package also affects other functions, which then gain "curl goodies" such as encryption. Also, jsonlite itself, which is called in CEOdata(), loads curl silently if it is available on the system. So it is expected behaviour.
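
One way to check this on a given machine, as a sketch only: requireNamespace() is base R and reports whether a package can be loaded without attaching it, which matches the "installed but not attached" situation described above. The message texts are illustrative.

```r
# requireNamespace() loads the namespace if the package is installed and
# returns TRUE or FALSE; it does not attach the package to the search path.
curl_available <- requireNamespace("curl", quietly = TRUE)

if (curl_available) {
  message("curl is installed, so functions that use it when available can find it.")
} else {
  message("curl is not installed; try install.packages(\"curl\") and retry.")
}
```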

fred-udina commented 2 years ago

Well, some say that R is not a real programming language...

fred-udina commented 2 years ago

Just a question: are you planning to declare the CEOdata package as depending on curl? Otherwise I should instruct my students to load it before using CEOdata. For them, any small problem is a demotivating disaster.

xfim commented 2 years ago

So far there is a message (pending approval in the CEO's main repository) about the temporary need to ensure that curl is installed (you can see it in my fork).

The proper way to proceed would be to wait for a response from jsonlite, because that is where the dependency issue lies. If that is not successful, we could add a dependency, but it is not my preferred option: CEOdata depends on jsonlite, which is the package involved, and it is there that the dependency on curl is not resolved.

For students, you can instruct them to do something like this at the very beginning (I do it myself in my classes). It is very convenient because from the first day of the course all the packages are properly loaded. Of course, you can adapt it to your needs:

install.packages(c("CEOdata", "curl", "ggplot2", "dplyr", "tidyr", "ggmcmc"), dependencies = TRUE)
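
If reinstalling everything on each run is too slow for a classroom, a possible variant is to install only what is missing. This is a sketch: missing_packages() is a hypothetical helper, not part of CEOdata, and the install call is left commented out so that running the block never triggers a download by itself.

```r
pkgs <- c("CEOdata", "curl", "ggplot2", "dplyr", "tidyr", "ggmcmc")

# Hypothetical helper: return the entries of `wanted` that are not in
# `installed` (by default, the packages installed in the current library).
missing_packages <- function(wanted,
                             installed = rownames(installed.packages())) {
  setdiff(wanted, installed)
}

to_install <- missing_packages(pkgs)
print(to_install)

# Uncomment to install whatever is missing (needs network access):
# if (length(to_install) > 0) {
#   install.packages(to_install, dependencies = TRUE)
# }
```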

fred-udina commented 2 years ago

Yes, I agree with your approach. Thanks.

xfim commented 2 years ago

So let's wait for the reply and keep this issue open for some more time.

xfim commented 2 years ago

This has been solved in 'jsonlite' (see jsonlite's issue), and 'curl' is no longer a dependency. Still, we need to keep the notice on the main site so that users know 'curl' is required, as the CRAN version does not yet include the new code that works without 'curl'.

fred-udina commented 2 years ago

Perfect, thank you again.