OHDSI / DatabaseConnector

An R package for connecting to databases using JDBC.
http://ohdsi.github.io/DatabaseConnector/

insertTable for Sqlite adds decimal precision to data inserted into character field #280

Closed: anthonysena closed this issue 3 days ago

anthonysena commented 1 month ago

I'm not sure if this is an RMM (ResultModelManager) issue or not, but I'm starting by putting it here. This issue is based on the work happening in https://github.com/OHDSI/Strategus/pull/147#issuecomment-2253288447.

Here is some code that can be used to reproduce the difference between creating the table and uploading data in SQLite vs. PostgreSQL. The file referenced in the code below is attached at the bottom of the issue. The problem is that when database_id is inserted into SQLite it is formatted as 85642205.0, while when it is inserted into PostgreSQL it is 85642205.

# Create empty SQLite database -------------------------
library(RSQLite)
if (file.exists("D:/TEMP/temp.sqlite")) {
  unlink("D:/TEMP/temp.sqlite")
}
mydb <- dbConnect(RSQLite::SQLite(), "D:/TEMP/temp.sqlite")
dbDisconnect(mydb)

resultsConnectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "sqlite",
  server = "D:/TEMP/temp.sqlite"
)

# Create the table
connection <- DatabaseConnector::connect(resultsConnectionDetails)

# Create the SQL from the resultsDataModelSpecification.csv
sql <- ResultModelManager::generateSqlSchema(
  csvFilepath = "D:/temp/StrategusR6Testing/results_folder/DatabaseMetaData/resultsDataModelSpecification.csv"
)

sql <- SqlRender::render(
  sql = sql,
  database_schema = "main"
)
DatabaseConnector::executeSql(connection = connection, sql = sql)

# NOTE: database_id is character
specification <- CohortGenerator::readCsv(file = "D:/temp/StrategusR6Testing/results_folder/DatabaseMetaData/resultsDataModelSpecification.csv")
ResultModelManager::uploadResults(
  connection = connection,
  schema = "main",
  resultsFolder = "D:/temp/StrategusR6Testing/results_folder/DatabaseMetaData",
  purgeSiteDataBeforeUploading = TRUE,
  databaseIdentifierFile = "D:/temp/StrategusR6Testing/results_folder/DatabaseMetaData/database_meta_data.csv",
  specifications = specification
)

# NOTE: database_id = 85642205.0
DatabaseConnector::disconnect(connection)

# Do the same but on PG
resultsConnectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "postgresql",
  connectionString = keyring::key_get("resultsConnectionString", keyring = "ohda"),
  user = keyring::key_get("resultsAdmin", keyring = "ohda"),
  password = keyring::key_get("resultsAdminPassword", keyring = "ohda")
)

# Create the table
connection <- DatabaseConnector::connect(resultsConnectionDetails)

# Create the SQL from the resultsDataModelSpecification.csv
sql <- ResultModelManager::generateSqlSchema(
  csvFilepath = "D:/temp/StrategusR6Testing/results_folder/DatabaseMetaData/resultsDataModelSpecification.csv"
)
sql <- SqlRender::render(
  sql = sql,
  database_schema = "sena_test"
)
DatabaseConnector::executeSql(connection = connection, sql = sql)

# NOTE: database_id is character
specification <- CohortGenerator::readCsv(file = "D:/temp/StrategusR6Testing/results_folder/DatabaseMetaData/resultsDataModelSpecification.csv")
ResultModelManager::uploadResults(
  connection = connection,
  schema = "sena_test",
  resultsFolder = "D:/temp/StrategusR6Testing/results_folder/DatabaseMetaData",
  purgeSiteDataBeforeUploading = TRUE,
  databaseIdentifierFile = "D:/temp/StrategusR6Testing/results_folder/DatabaseMetaData/database_meta_data.csv",
  specifications = specification
)

# NOTE: database_id = 85642205
DatabaseConnector::disconnect(connection)

[database_meta_data.zip](https://github.com/user-attachments/files/16395694/database_meta_data.zip)

azimov commented 1 month ago

By default, runCheckAndFixCommands is not executed for uploads. Setting it to TRUE will coerce the data to the expected types prior to the insert (so the result will be consistent across platforms).
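
For reference, a sketch of that workaround applied to the upload call from the example above (assuming runCheckAndFixCommands is the uploadResults argument; check your ResultModelManager version's documentation):

ResultModelManager::uploadResults(
  connection = connection,
  schema = "main",
  resultsFolder = "D:/temp/StrategusR6Testing/results_folder/DatabaseMetaData",
  purgeSiteDataBeforeUploading = TRUE,
  databaseIdentifierFile = "D:/temp/StrategusR6Testing/results_folder/DatabaseMetaData/database_meta_data.csv",
  runCheckAndFixCommands = TRUE, # coerce each column to the type in the specification
  specifications = specification
)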

Beyond this, the insert is completed with DatabaseConnector::insertTable, and the particular dbms has no impact on how the data is read into memory, so I suspect the bug lies in the use of the DBI vs. JDBC drivers.

azimov commented 1 month ago

Tracing through live code in the example you sent and adding debug(DatabaseConnector:::insertTable.DatabaseConnectorDbiConnection), I can confirm the issue is with DBI::dbWriteTable:

https://github.com/OHDSI/DatabaseConnector/blob/bee58a314996123b59e08bd489bf712e14391c4f/R/InsertTable.R#L463

This adds the decimal precision internally, likely inside RSQLite (see https://github.com/r-dbi/DBI/blob/main/R/dbWriteTable_DBIConnection_Id_ANY.R).
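
For illustration, here is a minimal DBI-only sketch (my reconstruction, not part of the original trace) that reproduces the same coercion without DatabaseConnector in the loop:

library(DBI)
library(RSQLite)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE test_table (database_id VARCHAR)")

# The R value is numeric, the target column is VARCHAR
dbWriteTable(con, "test_table", data.frame(database_id = 12345689878), append = TRUE)
print(dbGetQuery(con, "SELECT database_id FROM test_table"))
# Returns "12345689878.0": the numeric is bound as a REAL, and SQLite's TEXT
# column affinity renders it with decimal precision
dbDisconnect(con)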

Reproducing this example outside the full pipeline is trickier than I'd like. The only way to resolve it within this package would be to force the type conversions after reading the csv, but that feels like a hack.
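
To make the idea concrete, a sketch of what that conversion could look like (forceDataTypes is a hypothetical helper; columnName and dataType are the specification columns as read by CohortGenerator::readCsv):

# Hypothetical sketch: coerce each column to the type declared in the
# results data model specification before the data reaches insertTable
forceDataTypes <- function(data, specification) {
  for (i in seq_len(nrow(specification))) {
    column <- specification$columnName[i]
    if (!column %in% names(data)) next
    dataType <- tolower(specification$dataType[i])
    if (grepl("varchar", dataType)) {
      data[[column]] <- as.character(data[[column]])
    } else if (dataType == "bigint") {
      data[[column]] <- bit64::as.integer64(data[[column]])
    }
  }
  data
}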

azimov commented 1 month ago

The following is a reproducible example that uses only DatabaseConnector so I will move the issue there:

library(DatabaseConnector)

unlink("test.sqlite")

sqliteConnectionDetails <- createConnectionDetails(
  dbms = "sqlite",
  server = "test.sqlite"
)

sqliteConnection <- connect(sqliteConnectionDetails)

sql <- "
CREATE TABLE @schema.test_table (
  database_id VARCHAR primary key
);
"

renderTranslateExecuteSql(sqliteConnection, sql, schema = "main")

data <- data.frame(
  database_id = 12345689878
)

insertTable(connection = sqliteConnection,
            data = data,
            createTable = FALSE,
            dropTableIfExists = FALSE,
            tableName = "test_table",
            databaseSchema = "main")

print(renderTranslateQuerySql(sqliteConnection, "SELECT database_id FROM test_table"))

# Prints
#  DATABASE_ID
# 12345689878.0

disconnect(sqliteConnection)

schuemie commented 1 month ago

Would it be possible to convert the numeric value to a character string in R, before sending it to insertTable()? That way we're not dependent on how each database driver handles this.
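
Applied to the reproducible example above, that would be a one-line cast before the insert (a sketch; format() rather than as.character() avoids scientific notation for very large values):

# Cast to character in R so no driver-level conversion is involved
data$database_id <- format(data$database_id, scientific = FALSE)
insertTable(connection = sqliteConnection,
            data = data,
            createTable = FALSE,
            dropTableIfExists = FALSE,
            tableName = "test_table",
            databaseSchema = "main")
# SELECT database_id FROM test_table should now return 12345689878 without
# the trailing .0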

azimov commented 4 weeks ago

> Would it be possible to convert the numeric value to a character string in R, before sending it to insertTable()? That way we're not dependent on how each database driver handles this.

This will have a workaround in the next ResultModelManager release, which fixes the types. However, the inconsistency between the JDBC and DBI/SQLite drivers could create issues in the future, given that it will not only affect result sets.

Perhaps the data types of the table could be read prior to the insert, and the data cast in R accordingly?
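
On SQLite that lookup could be a PRAGMA call (a sketch under that assumption; other platforms would need their own metadata query, e.g. against information_schema.columns):

# Sketch: read the declared column types of the target table, then cast the
# character-typed columns in R before calling insertTable
columnInfo <- DatabaseConnector::querySql(
  connection = sqliteConnection,
  sql = "PRAGMA table_info(test_table);"
)
characterColumns <- columnInfo$NAME[grepl("CHAR|TEXT", toupper(columnInfo$TYPE))]
for (column in intersect(characterColumns, names(data))) {
  data[[column]] <- format(data[[column]], scientific = FALSE)
}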

azimov commented 2 weeks ago

This appears to be creating problems downstream when SQLite is used; see https://github.com/OHDSI/Eunomia/issues/65.

schuemie commented 1 week ago

I really think consistently casting numerics to strings is out of scope for DatabaseConnector. This should be handled before the data is handed over to DatabaseConnector.

If not, then we'd need to parse every SQL statement to see if it contains an INSERT, then try to derive where it is inserting to, get the schema of that, and perform the correct casting.

azimov commented 6 days ago

I would argue that the conversion of the data to decimal precision occurs after the user passes the data to insertTable. With every other driver, the insert does not add the decimal value. Perhaps a more consistent approach would be to use the JDBC driver here as well?

schuemie commented 3 days ago

Sorry, but this is incredibly easy to fix (and not unreasonable to ask) before the data is handed to DatabaseConnector, and really hard to fix inside DatabaseConnector. I'm going to declare this a 'won't fix'.