Open javier-gracia-tabuenca-tuni opened 1 year ago
Problems seems to come i need to add the "project" to the schema
NOT WORKING person_table <- dplyr::tbl(connection, DatabaseConnector::inDatabaseSchema("atlas-development-270609.finngen_omop_r11","person"))
WORKING person_table <- dplyr::tbl(connection, DatabaseConnector::inDatabaseSchema("finngen_omop_r11","person"))
the first make a CATALOG the second a SCHEMA
this should be easy to fix
but not sure what you think will be the elegant way
may be adding a parameter dbms to inDatabaseSchema , so that inDatabaseSchema behaves differently for bq and potentially others ??
i solved it locally as this
tmp_inDatabaseSchema <- function(databaseSchema, table, dbms="") {
if(dbms=="bigquery"){
return(dbplyr::in_schema(databaseSchema, table))
}
databaseSchema <- strsplit(databaseSchema, "\\.")[[1]]
if (length(databaseSchema) == 1) {
return(dbplyr::in_schema(databaseSchema[1], table))
} else {
return(dbplyr::in_catalog(databaseSchema[1], databaseSchema[2], table))
}
}
Im not making a PR bcs Im not sure this is the way you would like to solve it
this is important bcs in bigquery the path to a table is in the form \<project>.\<dataset>.\<table>
Some times (data project != billing project ) the
Yeah, I don't like adding the dbms
parameter, because then any code written using in_database_schema()
would need to provide that.
It seems to me the problem isn't so much that the project get's mistaken for a catalog, but rather that BigQuery doesn't allow the project name to appear in a join ON
statement? So in your first example
SELECT *
from (
select
atlas-development-270609.finngen_omop_r11.person.*,
observation_period_start_date,
observation_period_end_date
from atlas-development-270609.finngen_omop_r11.person
left join atlas-development-270609.finngen_omop_r11.observation_period
on (atlas-development-270609.finngen_omop_r11.person.person_id = atlas-development-270609.finngen_omop_r11.observation_period.person_id)
) q01 LIMIT 11;
the SQL seems fine, but BigQuery doesn't approve of having the full reference to the tables in the ON
statement. Maybe taht is what we should address?
Sorry, this is hard to make a reproducible example bcs is in bigquery
with options("DEBUG_DATABASECONNECTOR_DBPLYR" = TRUE)