expersso / OECD

Reproducible and programmatic access to OECD data
more help on accessing data #3

ptoche closed this issue 7 years ago

ptoche commented 7 years ago

I just discovered your package. Very interesting. I've been able to replicate the examples in the vignette, but I'm having trouble understanding how to get other data. For instance, I would like to access the Adult Unemployment Rate and the Youth Unemployment Rate for France and the United States. I tried the following, but it's not working. I think my problem is working out how to set up a filter. Any help appreciated: more examples in the vignette would be useful to me and probably others. Thanks!

library("OECD")
dataset <- "AEO2012_CH6_FIG4"
dstruc <- get_data_structure(dataset)
dstruc$MEASURE
##     id                                       label
## 1  AUR                 Adult unemployment rate (%)
## 2  YUR                 Youth unemployment rate (%)
## 3 YUAU Youth unemployment / Adult unemployment (%)
filter <- list(c("USA", "FRA"), c("AUR", "YUR"))
df <- get_dataset(dataset = dataset, filter = filter)
Error in rsdmx::readSDMX(url) : 
  HTTP request failed with status: 400 Bad Request

expersso commented 7 years ago

Thanks for using the package. Indeed, it could use some more documentation. Personally, I always use the OECD.stat website to find the data and breakdowns I need, and then just use the package to download it in a reproducible fashion.

That said, there appear to be a number of problems with your request. Firstly, it appears that OECD.stat is down at the moment; I keep getting this for all datasets:

This dataset preview is momentarily unavailable.

Please try again or select another dataset.

Secondly, the dataset you're looking at (AEO2012_CH6_FIG4) is from the African Economic Outlook (hence the "AEO" in the code). If you look at dstruc$INDICATOR you'll see that the data is only available for African countries, so filtering on USA and FRA is what gets you the 400 Bad Request.
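One way to catch this kind of mismatch before sending a request is to compare the requested codes with the ids listed in the data structure. The sketch below assumes that get_data_structure() returns a list of data frames with an id column, as the dstruc$MEASURE output above suggests. invalid_codes is a hypothetical helper, not part of the OECD package, and mock_dstruc is a made-up stand-in whose INDICATOR ids are illustrative only.

```r
# Hypothetical helper: return the requested codes that a dimension does not list.
invalid_codes <- function(dstruc, dimension, codes) {
  setdiff(codes, dstruc[[dimension]]$id)
}

# Stand-in for the output of get_data_structure(); the INDICATOR ids are
# illustrative and not taken from the real dataset.
mock_dstruc <- list(
  INDICATOR = data.frame(id = c("DZA", "ZAF"),
                         label = c("Algeria", "South Africa"),
                         stringsAsFactors = FALSE),
  MEASURE   = data.frame(id = c("AUR", "YUR", "YUAU"),
                         label = c("Adult unemployment rate (%)",
                                   "Youth unemployment rate (%)",
                                   "Youth unemployment / Adult unemployment (%)"),
                         stringsAsFactors = FALSE)
)

# Codes the dataset does not know about; a non-empty result means the
# request is likely to be rejected.
bad_countries <- invalid_codes(mock_dstruc, "INDICATOR", c("USA", "FRA"))
bad_measures  <- invalid_codes(mock_dstruc, "MEASURE", c("AUR", "YUR"))
```

With the real structure, the same call would be invalid_codes(dstruc, "INDICATOR", c("USA", "FRA")).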

That said, I can't download that dataset at all, and it appears that the problem is somewhere in the rsdmx package, which does all the heavy lifting in terms of downloading and parsing the data. I've never had that problem before, but my hunch is that it's related to the specific data set.

To get unemployment data for USA and FRA, I would suggest waiting for OECD.stat to get back online, finding the unemployment section, and then following the procedure laid out in the "Alternative data-acquisition strategy" section of the vignette.
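A minimal sketch of that procedure, assuming the alternative strategy amounts to copying the filter portion of the SDMX query URL that OECD.stat generates and passing it to get_dataset() with pre_formatted = TRUE. extract_key is a hypothetical helper, not part of the OECD package, and the URL is a shortened, illustrative one in the stats.oecd.org/restsdmx format.

```r
# Hypothetical helper: pull the filter key out of an SDMX query URL.
extract_key <- function(url, dataset) {
  path <- sub("\\?.*$", "", url)                               # drop "?startTime=..."
  key  <- sub(paste0("^.*/GetData/", dataset, "/"), "", path)  # drop everything up to the dataset id
  sub("/all$", "", key)                                        # drop the trailing "/all"
}

# Shortened, illustrative URL; a real one would be copied from OECD.stat.
url <- paste0("http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/STLABOUR/",
              "FRA+USA.LRUNTTTT.STSA.A/all?startTime=2015&endTime=2017")

key <- extract_key(url, "STLABOUR")
key
# "FRA+USA.LRUNTTTT.STSA.A"

# The key can then be passed on as a pre-formatted filter (needs network access):
# df <- get_dataset("STLABOUR", filter = key, pre_formatted = TRUE)
```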

If that still doesn't work, let me know, and I'll see if I can help further.

ptoche commented 7 years ago

Thanks for the feedback!

I have managed to get some data from some of the datasets using the alternative method, e.g. with "FTPTC_D", but failed with this one, "STLABOUR". I wonder if I'm asking for too much data by filtering solely on "FRA+USA"? Are there limits on the amount of data that may be retrieved?

    ### Example that failed: console hangs, does not return error message
    library("OECD")
    dataset <- "STLABOUR"
    #http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/STLABOUR/AUS+AUT+BEL+CAN+CHL+CZE+DNK+EST+FIN+FRA+DEU+GRC+HUN+ISL+IRL+ISR+ITA+JPN+KOR+LVA+LUX+MEX+NLD+NZL+NOR+POL+PRT+SVK+SVN+ESP+SWE+CHE+TUR+GBR+USA+EA19+EU28+G-7+OECD.LRUN24FE+LRUN24MA+LRUN24TT+LRUN25FE+LRUN25MA+LRUN25TT+LRUN55FE+LRUN55MA+LRUN55TT+LRUN64FE+LRUN64MA+LRUN64TT+LRUN74FE+LRUN74MA+LRUN74TT+LRUNTTFE+LRUNTTMA+LRUNTTTT.STSA.A+Q/all?startTime=2015&endTime=2017
    ## Select a subset of the data
    df <- get_dataset(dataset, 
        filter = "FRA+USA",
        pre_formatted = TRUE)

    ### Example that worked
    library("OECD")
    dataset <- "FTPTC_D"
    #http://stats.oecd.org/restsdmx/sdmx.ashx/GetData/STLABOUR/AUS+AUT+BEL+CAN+CHL+CZE+DNK+EST+FIN+FRA+DEU+GRC+HUN+ISL+IRL+ISR+ITA+JPN+KOR+LVA+LUX+MEX+NLD+NZL+NOR+POL+PRT+SVK+SVN+ESP+SWE+CHE+TUR+GBR+USA+EA19+EU28+G-7+OECD.LRUN24FE+LRUN24MA+LRUN24TT+LRUN25FE+LRUN25MA+LRUN25TT+LRUN55FE+LRUN55MA+LRUN55TT+LRUN64FE+LRUN64MA+LRUN64TT+LRUN74FE+LRUN74MA+LRUN74TT+LRUNTTFE+LRUNTTMA+LRUNTTTT.STSA.A+Q/all?startTime=2015&endTime=2017
    df <- get_dataset(dataset, 
        filter = "FRA+USA",
        pre_formatted = TRUE)

expersso commented 7 years ago

Yeah, when the console appears to hang, that usually means that you're trying to fetch a very large dataset. For example, selecting a smaller subset of your first query returns a result in about 13 seconds.

dataset <- "STLABOUR"

filt <- list(
  "AUS+AUT+BEL+CAN+CHL+CZE+DNK+EST+FIN+FRA+DEU+GRC+HUN+ISL+IRL+ISR+ITA",
  "LRUN24FE+LRUN24MA+LRUN24TT+LRUN25FE+LRUN25MA+LRUN25TT+LRUN55FE+LRUN55MA", 
  "STSA", 
  "A"
)

system.time(df <- get_dataset(dataset, filt))
##    user  system elapsed 
##   12.09    0.00   12.67 

My approach is usually to figure out what I want using OECD.stat, then try to fetch that data for one country, check that it is what I need, and finally expand to the full set of countries. For example, in your first query you probably don't need both annual (A) and quarterly (Q) data. Since quarterly series carry four observations per year against one for annual, querying just the annual data cuts the size of the query by about 80%.
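To get a rough feel for how a query grows as it is expanded, one can count the code combinations in a filter before sending it. This is only a sketch: split_codes, n_combinations and filter_key are hypothetical helpers, not part of the OECD package, and the dot-and-plus key format is inferred from the query URLs quoted earlier in this thread. The count is an upper bound on the number of series, not on the number of observations.

```r
# Hypothetical helpers, not part of the OECD package.

# Split each "A+B+C" element of a filter into its individual codes.
split_codes <- function(filt) {
  lapply(filt, function(x) unlist(strsplit(x, "+", fixed = TRUE)))
}

# Upper bound on the number of series a filter can match:
# the product of the number of codes in each dimension.
n_combinations <- function(filt) prod(lengths(split_codes(filt)))

# Collapse a filter list into the dot-separated key seen in the query URLs.
filter_key <- function(filt) {
  paste(vapply(filt, paste, character(1), collapse = "+"), collapse = ".")
}

# Start with one country, one subject, one frequency ...
one_country <- list("FRA", "LRUNTTTT", "STSA", "A")
# ... then widen one dimension at a time.
two_countries <- list(c("FRA", "USA"), "LRUNTTTT", "STSA", "A")
both_freqs    <- list(c("FRA", "USA"), "LRUNTTTT", "STSA", "A+Q")

n_combinations(one_country)    # 1
n_combinations(two_countries)  # 2
n_combinations(both_freqs)     # 4
filter_key(two_countries)      # "FRA+USA.LRUNTTTT.STSA.A"
```

Running n_combinations on a candidate filter before calling get_dataset() gives a quick sense of whether the request is likely to be large.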

To my knowledge, the API documentation doesn't say anything about limits on the size of a query. But since the API is generally a bit slow and gives no feedback on the progress of the download, it's hard to know whether a query is just taking very long or whether R has crashed. Following the strategy described above, though, I rarely have problems of that kind.
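One possible way to tell a slow query from a hung one is to put a time limit on the call with base R's setTimeLimit(). This is a sketch under assumptions: with_time_limit is a hypothetical wrapper, not part of the OECD package, and R only checks time limits where a user interrupt could occur, so whether a download running in compiled code is actually interrupted is not guaranteed. Sys.sleep() stands in for a slow query here.

```r
# Hypothetical wrapper: evaluate expr, but give up after `seconds` of elapsed time.
with_time_limit <- function(expr, seconds) {
  setTimeLimit(elapsed = seconds, transient = TRUE)
  # Always lift the limit again when the function exits.
  on.exit(setTimeLimit(elapsed = Inf, transient = FALSE))
  tryCatch(
    expr,
    error = function(e) {
      # Matches the English message R raises when the limit is hit.
      if (grepl("reached elapsed time limit", conditionMessage(e), fixed = TRUE)) {
        message("Gave up after ", seconds, " seconds; try a narrower filter.")
        return(NULL)
      }
      stop(e)  # any other error is passed on unchanged
    }
  )
}

# Stand-in for a slow query: sleeps for 3 seconds, limit is 0.5 seconds.
slow <- with_time_limit({ Sys.sleep(3); "done" }, seconds = 0.5)  # NULL, with a message
fast <- with_time_limit(1 + 1, seconds = 5)                       # 2

# With the package (needs network access):
# df <- with_time_limit(get_dataset("STLABOUR", filt), seconds = 60)
```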

ptoche commented 7 years ago

That was a very helpful explanation, thanks a lot! Having more examples like the above is also great: it helped me see how to construct my own queries. Thumbs up!

pfescriva commented 3 years ago

How do you know the name of a dataset in R? Thanks!