epiverse-trace / readepi

An R package for importing epi data into R.
https://epiverse-trace.github.io/readepi/
Other
6 stars 5 forks source link

# Should {readepi} be {readapi} ?? #70

Open bahadzie opened 7 months ago

bahadzie commented 7 months ago

readepi provides functions for importing epidemiological data into R from common health information systems. README.Rmd - Thibaut Jombart commit 17 months ago.

tldr

More...

Ideas from DBI

In addition the following are nice to have in R

The Interface

A simple interface for getting the data. The general consensus is around the following pseudo-code

datasource <- readepi(
  connect, # required to establish connection
  filter, # rows
  select # columns
)
# Up to this point, no API request has been made. The previous steps of the pipeline serve to create the request we will make now
data <- fetch_data(datasource)

# OR 

data <- connect_api(credentials) |>
  select() |>
  filter() |>
  # Up to this point, no API request has been made. The previous steps of the pipeline serve to create the request we will make now
  fetch_data()

Discussion

A case was made for removing functionality related to importing data from files because readepi was only providing a thin wrapper around {rio}.

The same case can be made for removing functionality related to importing data from relational databases because readepi is a wrapper around DBI.

After a quick review of DBI, my initial proposal was to implement readepi as a backend to DBI. Upon investigation only 7 of the 45 functions defined in the DBI spec are relevant to the concept of readepi. DBI is a spec tailored mainly to accessing data in SQL based relational databases. Its interface abstraction leaks leaks some of this detail.

The semantics of web APIs are not the same as SQL semantics. Ideally any interface should be implemented in the context of the R ecosystem and made to feel and behave as R objects are expected to behave. This is where I believe readepi/readpapi should be positioned.

Appendix

A. readepi interface over time

NAMESPACE read_from_file, read_from_ms_sql_server, read_from_redcap, read_stuff, readepi

NAMESPACE readCredentials, read_from_file, read_from_ms_sql_server, read_from_redcap, read_stuff, readepi, show_example_file

NAMESPACE readCredentials, read_from_file, read_from_ms_sql_server, read_from_redcap, read_stuff, readepi, showTables, show_example_file, subsetFields, subsetRecords

NAMESPACE clade_assignment, genome_assembly, readCredentials, read_from_dhis2, read_from_file, read_from_ms_sql_server, read_from_redcap, read_stuff, readepi, showTables, show_example_file, subsetFields, subsetRecords

NAMESPACE clade_assignment, genome_assembly, readCredentials, read_from_file, read_from_ms_sql_server, read_from_redcap, read_stuff, readepi, showTables, show_example_file, subsetFields, subsetRecords

NAMESPACE clade_assignment, genome_assembly, readCredentials, read_from_dhis2, read_from_file, read_from_ms_sql_server, read_from_redcap, read_stuff, readepi, showTables, show_example_file, subsetFields, subsetRecords

NAMESPACE getExtension, read_credentials, read_from_dhis2, read_from_file, read_from_ms_sql_server, read_from_redcap, readepi, show_example_file, show_tables, subset_fields, subset_records

NAMESPACE getExtension, install_odbc_driver, read_credentials, read_from_dhis2, read_from_file, read_from_ms_sql_server, read_from_redcap, readepi, show_example_file, show_tables, subset_fields, subset_records

NAMESPACE check_dhis2_attributes, getExtension, install_odbc_driver, read_credentials, read_from_dhis2, read_from_file, read_from_ms_sql_server, read_from_redcap, readepi, show_example_file, show_tables, subset_fields, subset_records

NAMESPACE check_dhis2_attributes, getExtension, get_data_element_groups, get_data_elements, get_data_sets, get_organisation_units, install_odbc_driver, read_credentials, read_from_dhis2, read_from_file, read_from_ms_sql_server, read_from_redcap, readepi, show_example_file, show_tables, subset_fields, subset_records

NAMESPACE check_dhis2_attributes, getExtension, get_data_elements, get_data_sets, get_fingertips_metadata, get_indicatorID_from_domainID, get_indicatorID_from_domainName, get_indicatorID_from_indicatorName, get_indicatorID_from_profile, get_organisation_units, install_odbc_driver, read_credentials, read_from_dhis2, read_from_file, read_from_fingertips, read_from_ms_sql_server, read_from_redcap, readepi, show_example_file, show_tables, subset_fields, subset_records

NAMESPACE check_dhis2_attributes, getExtension, get_data_elements, get_data_sets, get_fingertips_metadata, get_indicatorID_from_domainID, get_indicatorID_from_domainName, get_indicatorID_from_indicatorName, get_indicatorID_from_profile, get_organisation_units, read_credentials, read_from_dhis2, read_from_file, read_from_fingertips, read_from_ms_sql_server, read_from_redcap, readepi, show_example_file, show_tables, subset_fields, subset_records

NAMESPACE check_dhis2_attributes, getExtension, get_data_elements, get_data_sets, get_fingertips_metadata, get_indicatorID_from_domainID, get_indicatorID_from_domainName, get_indicatorID_from_indicatorName, get_indicatorID_from_profile, get_organisation_units, read_credentials, read_from_dhis2, read_from_file, read_from_fingertips, read_from_ms_sql_server, read_from_redcap, read_multiple_files, readepi, show_example_file, show_tables, subset_fields, subset_records

NAMESPACE check_dhis2_attributes, get_data_elements, get_data_sets, get_extension, get_fingertips_metadata, get_ind_id_from_domain_id, get_ind_id_from_domain_name, get_ind_id_from_ind_name, get_ind_id_from_profile, get_organisation_units, read_credentials, read_from_dhis2, read_from_file, read_from_fingertips, read_from_ms_sql_server, read_from_redcap, read_multiple_files, readepi, show_example_file, show_tables, subset_fields, subset_records

NAMESPACE check_dhis2_attributes, connect_to_server, detect_separator, fetch_data_from_query, get_data_elements, get_data_sets, get_extension, get_fingertips_metadata, get_ind_id_from_domain_id, get_ind_id_from_domain_name, get_ind_id_from_ind_name, get_ind_id_from_profile, get_organisation_units, identify_table_name, read_credentials, read_from_dhis2, read_from_file, read_from_fingertips, read_from_redcap, read_multiple_files, readepi, show_example_file, show_tables, sql_select_data, sql_select_entire_dataset, sql_select_fields_only, sql_select_records_and_fields, sql_select_records_only, sql_server_read_data, visualise_table

NAMESPACE check_dhis2_attributes, connect_to_server, detect_separator, dhis2_subset_fields, dhis2_subset_records, fetch_data_from_query, get_base_name, get_data_elements, get_data_sets, get_extension, get_fingertips_metadata, get_ind_id_from_domain_id, get_ind_id_from_domain_name, get_ind_id_from_ind_name, get_ind_id_from_profile, get_organisation_units, get_read_file_args, get_read_fingertips_args, get_relevant_data_elt_group, get_relevant_dataset, get_relevant_organisation_unit, identify_table_name, import_redcap_data, read_credentials, read_from_dhis2, read_from_file, read_from_fingertips, read_from_redcap, read_multiple_files, readepi, redcap_get_results, redcap_read, redcap_read_fields, redcap_read_records, redcap_read_rows_columns, show_example_file, show_tables, sql_select_data, sql_select_entire_dataset, sql_select_fields_only, sql_select_records_and_fields, sql_select_records_only, sql_server_read_data, visualise_table

NAMESPACE check_dhis2_attributes, connect_to_server, detect_separator, dhis2_subset_fields, dhis2_subset_records, fetch_data_from_query, fingertips_subset_columns, fingertips_subset_rows, get_base_name, get_data_elements, get_data_sets, get_extension, get_fingertips_metadata, get_ind_id_from_domain_id, get_ind_id_from_domain_name, get_ind_id_from_ind_name, get_ind_id_from_profile, get_organisation_units, get_profile_name, get_read_file_args, get_read_fingertips_args, get_relevant_data_elt_group, get_relevant_dataset, get_relevant_organisation_unit, identify_table_name, import_redcap_data, read_credentials, read_from_dhis2, read_from_file, read_from_fingertips, read_from_redcap, read_multiple_files, readepi, redcap_get_results, redcap_read, redcap_read_fields, redcap_read_records, redcap_read_rows_columns, show_example_file, show_tables, sql_select_data, sql_select_entire_dataset, sql_select_fields_only, sql_select_records_and_fields, sql_select_records_only, sql_server_read_data, visualise_table

NAMESPACE get_dhis2_attributes, readepi, show_example_file, show_tables,

B. DBI Reference Functions

*functions that are relevant to {readepi}

Connecting and disconnecting

*dbConnect(drv, ...) - Create a connection to a DBMS

*dbGetInfo(dbObj, ...) - Get DBMS metadata

dbDisconnect(conn, ...) - Disconnect (close) a connection

dbCanConnect(drv, ...) - Check if a connection to a DBMS can be established

dbIsValid(dbObj, ...) - Is this DBMS object still valid?

dbIsReadOnly(dbObj, ...) - Is this DBMS object read only?

dbGetConnectArgs(drv, eval = TRUE, ...) - Get connection arguments

Tables

*dbReadTable(conn, name, ...) - Read database tables as data frames

*dbListTables(conn, ...) - List remote tables

*dbListFields(conn, name, ...) - List field names of a remote table

*dbExistsTable(conn, name, ...) - Does a table exist?

*dbListObjects(conn, prefix = NULL, ...) - List remote objects

dbWriteTable(conn, name, value, ...) - Copy data frames to database tables

dbCreateTable(conn, name, fields, ..., row.names = NULL, temporary = FALSE) - Create a table in the database

dbAppendTable(conn, name, value, ..., row.names = NULL) - Insert rows into a table

dbRemoveTable(conn, name, ...) Remove a table from the database

sqlRownamesToColumn(df, row.names = NA) / sqlColumnToRownames(df, row.names = NA) - Convert row names back and forth between columns

Queries and statements

dbGetQuery(conn, statement, ...) - Retrieve results from a query

dbExecute(conn, statement, ...) - Change database state

Results

dbSendQuery(conn, statement, ...) - Execute a query on a given database connection

dbSendStatement(conn, statement, ...) - Execute a data manipulation statement on a given database connection

dbBind(res, params, ...) / dbBindArrow(res, params, ...) - Bind values to a parameterized/prepared statement

dbFetch(res, n = -1, ...) / fetch(res, n = -1, ...) - Fetch records from a previously executed query

dbGetRowCount(res, ...) - The number of rows fetched so far

dbGetRowsAffected(res, ...) - The number of rows affected

dbGetStatement(res, ...) - Get the statement associated with a result set

dbHasCompleted(res, ...) - Completion status

dbColumnInfo(res, ...) - Information about result types

dbClearResult(res, ...) -Clear a result set

Transactions

dbBegin(conn, ...) / dbCommit(conn, ...) / dbRollback(conn, ...)- Begin/commit/rollback SQL transactions

dbWithTransaction(conn, code, ...) / dbBreak() - Self-contained SQL transactions

SQL

SQL(x, ..., names = NULL) - SQL quoting

dbDataType(dbObj, obj, ...) - Determine the SQL data type of an object

dbQuoteIdentifier(conn, x, ...) - Quote identifiers

dbQuoteLiteral(conn, x, ...) - Quote literal values

dbQuoteString(conn, x, ...) - Quote literal strings

sqlCreateTable(con, table, fields, row.names = NA, temporary = FALSE, ...) - Compose query to create a simple table

sqlAppendTable(con, table, values, row.names = NA, ...) - Compose query to insert rows into a table

sqlData(con, value, row.names = NA, ...) - Convert a data frame into form suitable for upload to an SQL database

dbUnquoteIdentifier(conn, x, ...) - Unquote identifiers

sqlInterpolate(conn, sql, ..., .dots = list()) - Safely interpolate values into an SQL string

Classes

DBIObject-class

DBIDriver-class

DBIConnection-class

DBIResult-class

DBIConnector-class

chartgerink commented 7 months ago

Hey @bahadzie - thanks for this extensive thought process! I am still getting familiar with everything, so please do tell me if I get things wrong. As a newbie here, I thought readepi's goal was to harmonize reading in data from various sources. That way, I would only need to know how to use readepi regardless of whether I'm reading in data from an API or a relational database.

Did you get feedback from people that they were confused by readepi's goal or preferred a different definition? Or is this from your (+your team's) assessment of doing the work creating the package?

On the topic of scope reduction: My experience with APIs is that it's very hard to know what you'll be getting back, if there is no known standard being used. There might be only a handful of sources or standards in epidemiology (I'm unaware of the landscape), which you are better in assessing. So my question for a redefinition to focus on APIs would be: Are the APIs we would want to read known or unknown in advance?

A generic readapi would of course still be interesting, but does that then still fit the intended purpose with which readepi was started?