MattCowgill / readabs

Download and tidy time series data from the Australian Bureau of Statistics in R
https://mattcowgill.github.io/readabs/

read_abs trips over for occasionally quarterly data #167

Open mcooganj opened 3 years ago

mcooganj commented 3 years ago

There are a number of publications that include quarterly data from time to time. For example, the retail sales publication has real tables four times per year.

There may be a way to set the vintage of the release, but I couldn't find it. I had thought that perhaps looking up by series_id would work. I would imagine that there's a look-up table at some point that turns the series_id into a release-table pair.

Perhaps in the case that it's an occasionally-quarterly release, it could re-direct to the most recent quarterly publication?

R> read_abs(series_id="A3349269F")
Finding URLs for tables corresponding to ABS series ID
Attempting to download files from series ID , Retail Trade, Australia
Downloading https://www.abs.gov.au/statistics/industry/retail-and-wholesale-trade/retail-trade-australia/latest-release/850107.xls
trying URL 'https://www.abs.gov.au/statistics/industry/retail-and-wholesale-trade/retail-trade-australia/latest-release/850107.xls'
Error in utils::download.file(url = url, destfile = destfile, mode = "wb", :
  cannot open URL 'https://www.abs.gov.au/statistics/industry/retail-and-wholesale-trade/retail-trade-australia/latest-release/850107.xls'
In addition: Warning message:
In utils::download.file(url = url, destfile = destfile, mode = "wb", :
  cannot open URL 'https://www.abs.gov.au/statistics/industry/retail-and-wholesale-trade/retail-trade-australia/latest-release/850107.xls': HTTP status was '404 Not Found'

MattCowgill commented 3 years ago

Thanks @mcooganj, I'll look into this.

MattCowgill commented 3 years ago

PS There's no lookup table within the package that maps series IDs to release tables; if you request a series ID, the package queries an ABS API (the Time Series Directory) to find the corresponding release table.
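
For anyone who wants to check the Time Series Directory by hand, here is a minimal sketch that builds a query URL for one series ID. The servlet address is the one quoted elsewhere in this thread; `tsd_query_url()` is a hypothetical helper written for illustration, not a readabs function, and it may not match how the package builds its own requests.

```r
# Base URL taken from the servlet link quoted elsewhere in this thread.
tsd_base <- "https://abs.gov.au/servlet/TSSearchServlet"

# Hypothetical helper (not part of readabs): build a Time Series Directory
# query URL for a single series ID.
tsd_query_url <- function(series_id) {
  stopifnot(is.character(series_id), length(series_id) == 1, nzchar(series_id))
  # Percent-encode the ID in case it ever contains reserved characters.
  paste0(tsd_base, "?sid=", utils::URLencode(series_id, reserved = TRUE))
}

tsd_query_url("A84090257V")
```

Passing the result to `utils::browseURL()` opens the directory entry in a browser, which is a quick way to see whether a series is listed at all before blaming the package.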

Henry-DJPR commented 2 years ago

I'm having a similar problem with the detailed labour force quarterly ANZSIC tables. They appear to be valid time series sheets, but they aren't in the Time Series Directory. Here's an example of a missing table. It doesn't work when I look for it specifically, download everything, or look for one of its component series:

read_abs("6291.0.55.001", 4)
read_abs("6291.0.55.001") %>% count(table_title)
read_abs(series_id = "A84090257V")

The problem appears to be that they straight up aren't in the directory:

shell.exec("https://abs.gov.au/servlet/TSSearchServlet?sid=A84090257V")

I wanted to check that I haven't missed something simple before I contact the ABS.

MattCowgill commented 2 years ago

Thanks @Henry-DJPR. You're right: this is a problem on the ABS side. The series have disappeared from the Time Series Directory. I will contact them now; I'm in regular contact with the people who maintain the TSD.

I realise this isn't ideal, but a workaround is to do:

download_abs_data_cube("labour-force-australia-detailed",
                       "6291004") %>% 
  read_abs_local(filenames = .)

MattCowgill commented 2 years ago

@Henry-DJPR, the problem with the ABS Time Series Directory appears to have been resolved. This now works:

readabs::read_abs("6291.0.55.001", "4")
#> Finding URLs for tables corresponding to ABS catalogue 6291.0.55.001
#> Attempting to download files from catalogue 6291.0.55.001, Labour Force, Australia, Detailed
#> Downloading https://www.abs.gov.au/statistics/labour/employment-and-unemployment/labour-force-australia-detailed/latest-release/6291004.xlsx
#> Extracting data from downloaded spreadsheets
#> Tidying data from imported ABS spreadsheets
#> # A tibble: 9,000 × 12
#>    table_no sheet_no table_title  date       series  value series_type data_type
#>    <chr>    <chr>    <chr>        <date>     <chr>   <dbl> <chr>       <chr>    
#>  1 6291004  Data1    Table 04. E… 1984-11-01 Agric…   NA   Trend       STOCK    
#>  2 6291004  Data1    Table 04. E… 1984-11-01 Agric…  403.  Seasonally… STOCK    
#>  3 6291004  Data1    Table 04. E… 1984-11-01 Agric…  411.  Original    STOCK    
#>  4 6291004  Data1    Table 04. E… 1984-11-01 Minin…   NA   Trend       STOCK    
#>  5 6291004  Data1    Table 04. E… 1984-11-01 Minin…   94.8 Seasonally… STOCK    
#>  6 6291004  Data1    Table 04. E… 1984-11-01 Minin…   94.1 Original    STOCK    
#>  7 6291004  Data1    Table 04. E… 1984-11-01 Manuf…   NA   Trend       STOCK    
#>  8 6291004  Data1    Table 04. E… 1984-11-01 Manuf… 1096.  Seasonally… STOCK    
#>  9 6291004  Data1    Table 04. E… 1984-11-01 Manuf… 1099.  Original    STOCK    
#> 10 6291004  Data1    Table 04. E… 1984-11-01 Elect…   NA   Trend       STOCK    
#> # … with 8,990 more rows, and 4 more variables: collection_month <chr>,
#> #   frequency <chr>, series_id <chr>, unit <chr>

Created on 2022-05-02 by the reprex package (v2.0.1)

Henry-DJPR commented 2 years ago

Brilliant! Thanks!