dfe-analytical-services / dfeR

Common R tasks in the Department for Education (DfE)
https://dfe-analytical-services.github.io/dfeR/
GNU General Public License v3.0
8 stars 2 forks source link

Add in functions for working out DBI bugs #65

Open cjrace opened 5 months ago

cjrace commented 5 months ago

Is your feature request related to a problem? Please describe.

Rich has made a couple of functions that I think we can generalise and add into here that will be helpful for querying SQL databases in R.

Describe the solution you'd like

Functions that can be generalised to most purposes / queries that analysts can use to work around issues more easily.

Describe alternatives you've considered

None.

Additional context

Here's the raw functions:

# DBI Bugfix
# Error message:
# Error: nanodbc/nanodbc.cpp:3166: 07009: [Microsoft][SQL Server Native Client 11.0]Invalid Descriptor Index 
# Warning message:
#  In dbClearResult(rs) : Result already cleared
# ========================================================
# Cause:
# nvarchar that has a CHARACTER_MAXIMUM_LENGTH in the schema table of -1 for some entries
tbl_fix_nvarchar <- function(con, schema, table){
  column.types <- dbGetQuery(
    con,
    paste0("SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH FROM INFORMATION_SCHEMA.COLUMNS ",
          "WHERE TABLE_NAME='",table,"'")
  )

  ct <- column.types %>%
    mutate(cml = case_when(
      is.na(CHARACTER_MAXIMUM_LENGTH) ~ 10,
      CHARACTER_MAXIMUM_LENGTH == -1 ~ 100000,
      TRUE ~ as.double(CHARACTER_MAXIMUM_LENGTH)
    )
    ) %>%
    arrange(cml) %>%
    pull(COLUMN_NAME)

  tbl(con, Id(schema=schema, table=table)) %>% select(all_of(ct))

}

#' DBI Bugfix
#' Error message:
#' Error: nanodbc/nanodbc.cpp:3166: 07009: [Microsoft][SQL Server Native Client 11.0]Invalid Descriptor Index 
#' Warning message:
#'  In dbClearResult(rs) : Result already cleared
#' ========================================================
#' Cause:
#' nvarchar that has a CHARACTER_MAXIMUM_LENGTH in the schema table of -1 for some entries
#' Usage:
#' This applies the fix as a variation on the select function, which works at the end of pipe-chain queries.
#' e.g. 
#' tbl(con, Id(schema='dbo', table='mytable')) %>% 
#'    group_by(x,y) %>% summarise(s=sum(z)) %>% 
#'    select_fix_nvarchar('mytable', c('x','y','s'))
select_fix_nvarchar <- function( table, table_name, selection){
  column.types <- dbGetQuery(
    con,
    paste0("SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH FROM INFORMATION_SCHEMA.COLUMNS ",
           "WHERE TABLE_NAME='",table_name,"'")
  )
  ct <- column.types %>%
    mutate(cml = case_when(
      is.na(CHARACTER_MAXIMUM_LENGTH) ~ 10,
      CHARACTER_MAXIMUM_LENGTH == -1 ~ 100000,
      TRUE ~ as.double(CHARACTER_MAXIMUM_LENGTH)
    )
    ) %>%
    arrange(cml) %>%
    pull(COLUMN_NAME)
  # I want to add in any new calculated columns in table_name and add them in at the start of the selection 
  ct_all <- c(setdiff(selection,ct), ct) %>% unique()
  ct_intersect <- intersect(ct_all,selection)
  table %>% select(all_of(ct_intersect))
}