expersso / BIS

Programmatic access to BIS data

new data structure at BIS? #6

Open Stefan2015-5 opened 2 years ago

Stefan2015-5 commented 2 years ago

There seems to be a mess-up with the new data structure at BIS. They note that

24 February 2022 CSV files have been renamed according to the new Dataset Identifiers, find the new list here. For previously stored URLs, updating the path will be necessary. (ie for EER, old path https://www.bis.org/statistics/full_webstats_eer_d_dataflow_csv_col.zip has been changed to new path https://www.bis.org/statistics/full_eer_d_csv_col.zip).

see their homepage
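
For previously stored URLs, the one example BIS gives amounts to dropping the `webstats_` prefix and the `_dataflow` suffix from the file name. Below is a minimal base R sketch, assuming that pattern holds for the other datasets as well. BIS only spells out the EER case, so any rewritten URL should be checked against their new list. `update_bis_url()` is a hypothetical helper, not part of the BIS package.

```r
# Hypothetical helper: rewrite a pre-2022 BIS bulk-download URL to the new naming.
# Assumption: every dataset follows the EER example quoted above,
# i.e. "full_webstats_<id>_dataflow_csv_col.zip" becomes "full_<id>_csv_col.zip".
update_bis_url <- function(url) {
  url <- sub("full_webstats_", "full_", url, fixed = TRUE)
  sub("_dataflow_csv", "_csv", url, fixed = TRUE)
}

old <- "https://www.bis.org/statistics/full_webstats_eer_d_dataflow_csv_col.zip"
new <- update_bis_url(old)
print(new)
```

URLs that already use the new naming pass through unchanged, so the helper can be applied to a mixed list of stored paths.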

With package version 0.2.1, the code now returns an empty data frame:

> require(BIS)
> datasets <- get_datasets()
> cons_data <- get_bis(datasets$url[datasets$name == "Consolidated banking statistics"] , quiet = TRUE)

Warning messages:
1: `as_data_frame()` was deprecated in tibble 2.0.0.
Please use `as_tibble()` instead.
The signature and semantics have changed, see `?as_tibble`.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated. 
2: The `x` argument of `as_tibble.matrix()` must have unique column names if `.name_repair` is omitted as of tibble 2.0.0.
Using compatibility `.name_repair`.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated. 

> head(cons_data)
# A tibble: 0 x 14
# ... with 14 variables: frequency <chr>, measure <chr>, reporting_country <chr>, cbs_bank_type <chr>, cbs_reporting_basis <chr>, balance_sheet_position <chr>,
#   type_of_instruments <chr>, remaining_maturity <chr>, currency_type_of_booking_location <chr>, counterparty_sector <chr>, counterparty_country <chr>,
#   collection_indicator <lgl>, date <chr>, obs_value <dbl>

vvoutilainen commented 2 years ago

There seem to be changes in the returned csv files as well. A few changes I've noticed: a) columns have been concatenated to include both the shorthand code value and the name value, b) some column names have changed (e.g. "Unit measure" -> "Unit of measure"), c) dataset "Consumer prices" is a weird mix of long and wide formats: column "Frequency" distinguishes between monthly/yearly values, but instead of having the values in long format as well, there are both monthly and yearly value columns back-to-back! I can't even begin to understand why one would do this...
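
Point a) can be undone by splitting each value back into its code and its name. The sketch below is only an illustration in base R. It assumes the two parts are joined by a colon (e.g. "Q:Quarterly"), which should be checked against the actual files. `split_code_label()` is a hypothetical helper, not a function of the BIS package.

```r
# Hypothetical helper: split concatenated "code:name" values into two columns.
# Assumption: the separator is the first colon in each value.
split_code_label <- function(x) {
  x <- as.character(x)
  has_sep <- grepl(":", x, fixed = TRUE)
  # Everything before the first colon is the code; values without a colon stay as-is.
  code <- ifelse(has_sep, trimws(sub(":.*$", "", x)), x)
  # Everything after the first colon is the name; missing when there is no colon.
  label <- ifelse(has_sep, trimws(sub("^[^:]*:", "", x)), NA_character_)
  data.frame(code = code, label = label, stringsAsFactors = FALSE)
}

res <- split_code_label(c("Q:Quarterly", "US:United States", "A"))
print(res)
```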

The data frames come back empty because certain datasets now contain (apparently new) columns, such as "Breaks" and "Coverage", that hold only missing values. Since read_bis() returns only rows with complete observations, every row gets dropped. I solved the problem in my fork by simply passing na.drop = FALSE into read_bis_wide() and read_bis_long(), see here.
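
The effect is easy to reproduce on toy data. The base R sketch below is not the package's code and not the fix from the fork. It shows why a complete-rows filter empties the data frame, and one alternative: removing all-NA columns before dropping incomplete rows. The column names and values are made up for illustration.

```r
# Toy data standing in for a BIS extract; "breaks" is entirely missing,
# like the "Breaks"/"Coverage" columns described above.
df <- data.frame(
  frequency = c("Q", "Q", "Q"),
  breaks    = c(NA_character_, NA_character_, NA_character_),
  obs_value = c(1.5, NA, 2.5),
  stringsAsFactors = FALSE
)

# Keeping complete rows only drops every row, because "breaks" is always NA.
n_naive <- nrow(na.omit(df))
print(n_naive)  # 0

# Alternative: remove all-NA columns first, then drop rows with missing values.
all_na  <- vapply(df, function(col) all(is.na(col)), logical(1))
cleaned <- na.omit(df[, !all_na, drop = FALSE])
print(nrow(cleaned))  # 2
```

Unlike the fix in the fork, this variant still discards rows whose observation value is missing, at the cost of silently losing the empty columns.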