cynkra / dm

Working with relational data models in R
https://dm.cynkra.com
Other
500 stars 50 forks source link

dm_from_con() with MariaDB #1128

Closed maelle closed 2 years ago

maelle commented 2 years ago

I get different table names locally for https://cynkra.github.io/dm/dev/articles/dm.html#creating-a-dm-object

(from PR #1123)

#> ── Table source ───────────────────────────────────────────────────────────
#> src:  mysql  [guest@relational.fit.cvut.cz:NA/Financial_ijs]
#> ── Metadata ───────────────────────────────────────────────────────────────
#> Tables: `20news`, `2news`, `abbrev`, `acc`, `ACCOUNT_TRANSACT_TYPES`, … (2231 total)
#> Columns: 19490
#> Primary keys: 395
#> Foreign keys: 407
maelle commented 2 years ago

@krlmlr I wonder whether this is a new bug introduced by the code changes?

CI https://github.com/cynkra/dm/runs/6963247567?check_suite_focus=true

It seems the example uses either https://relational.fit.cvut.cz/dataset/Financial or a fallback, and until yesterday it was maybe using the fallback?

krlmlr commented 2 years ago

We're just seeing many more tables now.

krlmlr commented 2 years ago

Due to code changes.

krlmlr commented 2 years ago

We can now use the schema argument to limit that, and we should.

krlmlr commented 2 years ago

(Maybe?)

krlmlr commented 2 years ago

Oh: relational.fit was down over the weekend, there is a fallback by @pachadotdev that only has this one table. Could that be the reason?

krlmlr commented 2 years ago

Either way, schema = seems worth a shot.

maelle commented 2 years ago

Yes I think that over the weekend you were using the fallback.

maelle commented 2 years ago

Who should add the use of the schema argument in the vignette?

krlmlr commented 2 years ago

Can you please rephrase?

maelle commented 2 years ago

If the fix is this vignette tweak (using the schema argument), should you or I tweak the vignette? :-)

Related question: should that argument be different based on whether the example is the main one or the fallback one?

krlmlr commented 2 years ago

Please go ahead, I'm not touching this today or tomorrow.

Can we make the examples work the same for both main and fallback?

maelle commented 2 years ago

Noting that https://relational.fit.cvut.cz/dataset/Financial does not mention 2231 tables so something is wrong (investigating)

maelle commented 2 years ago

it's as if all table variables became one table

maelle commented 2 years ago

or actually am I getting all tables from all databases of the provider?

Example with the fallback database

library(RMariaDB)
library(dm)
#> 
#> Attachement du package : 'dm'
#> L'objet suivant est masqué depuis 'package:stats':
#> 
#>     filter

fin_db <- dm:::dbedu_con()

fin_dm <- dm_from_con(fin_db)
#> Keys queried successfully, use `learn_keys = TRUE` to mute this message.

names(fin_dm)
#>   [1] "accounts"                             
#>   [2] "ALL_PLUGINS"                          
#>   [3] "APPLICABLE_ROLES"                     
#>   [4] "cards"                                
#>   [5] "CHARACTER_SETS"                       
#>   [6] "CHECK_CONSTRAINTS"                    
#>   [7] "CLIENT_STATISTICS"                    
#>   [8] "clients"                              
#>   [9] "COLLATION_CHARACTER_SET_APPLICABILITY"
#>  [10] "COLLATIONS"                           
#>  [11] "COLUMN_PRIVILEGES"                    
#>  [12] "COLUMNS"                              
#>  [13] "disps"                                
#>  [14] "districts"                            
#>  [15] "ENABLED_ROLES"                        
#>  [16] "ENGINES"                              
#>  [17] "EVENTS"                               
#>  [18] "FILES"                                
#>  [19] "GEOMETRY_COLUMNS"                     
#>  [20] "GLOBAL_STATUS"                        
#>  [21] "GLOBAL_VARIABLES"                     
#>  [22] "INDEX_STATISTICS"                     
#>  [23] "INNODB_BUFFER_PAGE"                   
#>  [24] "INNODB_BUFFER_PAGE_LRU"               
#>  [25] "INNODB_BUFFER_POOL_STATS"             
#>  [26] "INNODB_CMP"                           
#>  [27] "INNODB_CMP_PER_INDEX"                 
#>  [28] "INNODB_CMP_PER_INDEX_RESET"           
#>  [29] "INNODB_CMP_RESET"                     
#>  [30] "INNODB_CMPMEM"                        
#>  [31] "INNODB_CMPMEM_RESET"                  
#>  [32] "INNODB_FT_BEING_DELETED"              
#>  [33] "INNODB_FT_CONFIG"                     
#>  [34] "INNODB_FT_DEFAULT_STOPWORD"           
#>  [35] "INNODB_FT_DELETED"                    
#>  [36] "INNODB_FT_INDEX_CACHE"                
#>  [37] "INNODB_FT_INDEX_TABLE"                
#>  [38] "INNODB_LOCK_WAITS"                    
#>  [39] "INNODB_LOCKS"                         
#>  [40] "INNODB_METRICS"                       
#>  [41] "INNODB_MUTEXES"                       
#>  [42] "INNODB_SYS_COLUMNS"                   
#>  [43] "INNODB_SYS_DATAFILES"                 
#>  [44] "INNODB_SYS_FIELDS"                    
#>  [45] "INNODB_SYS_FOREIGN"                   
#>  [46] "INNODB_SYS_FOREIGN_COLS"              
#>  [47] "INNODB_SYS_INDEXES"                   
#>  [48] "INNODB_SYS_SEMAPHORE_WAITS"           
#>  [49] "INNODB_SYS_TABLES"                    
#>  [50] "INNODB_SYS_TABLESPACES"               
#>  [51] "INNODB_SYS_TABLESTATS"                
#>  [52] "INNODB_SYS_VIRTUAL"                   
#>  [53] "INNODB_TABLESPACES_ENCRYPTION"        
#>  [54] "INNODB_TABLESPACES_SCRUBBING"         
#>  [55] "INNODB_TRX"                           
#>  [56] "KEY_CACHES"                           
#>  [57] "KEY_COLUMN_USAGE"                     
#>  [58] "KEYWORDS"                             
#>  [59] "loans"                                
#>  [60] "orders"                               
#>  [61] "PARAMETERS"                           
#>  [62] "PARTITIONS"                           
#>  [63] "PLUGINS"                              
#>  [64] "PROCESSLIST"                          
#>  [65] "PROFILING"                            
#>  [66] "REFERENTIAL_CONSTRAINTS"              
#>  [67] "ROUTINES"                             
#>  [68] "SCHEMA_PRIVILEGES"                    
#>  [69] "SCHEMATA"                             
#>  [70] "SESSION_STATUS"                       
#>  [71] "SESSION_VARIABLES"                    
#>  [72] "sj_all_revenue_large"                 
#>  [73] "sj_all_revenue_large_f"               
#>  [74] "sj_all_revenue_medium"                
#>  [75] "sj_all_revenue_medium_f"              
#>  [76] "sj_all_revenue_small"                 
#>  [77] "sj_all_revenue_small_f"               
#>  [78] "sj_all_revenue_xlarge"                
#>  [79] "sj_all_revenue_xlarge_f"              
#>  [80] "sj_all_sessions_large"                
#>  [81] "sj_all_sessions_large_f"              
#>  [82] "sj_all_sessions_medium"               
#>  [83] "sj_all_sessions_medium_f"             
#>  [84] "sj_all_sessions_small"                
#>  [85] "sj_all_sessions_small_f"              
#>  [86] "sj_all_sessions_xlarge"               
#>  [87] "sj_all_sessions_xlarge_f"             
#>  [88] "sj_user_summary_large"                
#>  [89] "sj_user_summary_large_f"              
#>  [90] "sj_user_summary_medium"               
#>  [91] "sj_user_summary_medium_f"             
#>  [92] "sj_user_summary_small"                
#>  [93] "sj_user_summary_small_f"              
#>  [94] "sj_user_summary_xlarge"               
#>  [95] "sj_user_summary_xlarge_f"             
#>  [96] "sj_users_daily_large"                 
#>  [97] "sj_users_daily_large_f"               
#>  [98] "sj_users_daily_medium"                
#>  [99] "sj_users_daily_medium_f"              
#> [100] "sj_users_daily_small"                 
#> [101] "sj_users_daily_small_f"               
#> [102] "sj_users_daily_xlarge"                
#> [103] "sj_users_daily_xlarge_f"              
#> [104] "SPATIAL_REF_SYS"                      
#> [105] "SQL_FUNCTIONS"                        
#> [106] "STATISTICS"                           
#> [107] "SYSTEM_VARIABLES"                     
#> [108] "TABLE_CONSTRAINTS"                    
#> [109] "TABLE_PRIVILEGES"                     
#> [110] "TABLE_STATISTICS"                     
#> [111] "TABLES"                               
#> [112] "TABLESPACES"                          
#> [113] "tkeys"                                
#> [114] "trans"                                
#> [115] "TRIGGERS"                             
#> [116] "USER_PRIVILEGES"                      
#> [117] "USER_STATISTICS"                      
#> [118] "user_variables"                       
#> [119] "VIEWS"

Created on 2022-06-20 by the reprex package (v2.0.1)

maelle commented 2 years ago

Noting this does not happen with the dm version that's on main. I'm going to look into the PR more closely.

maelle commented 2 years ago

@krlmlr do you agree there's a bug here? :thinking:

maelle commented 2 years ago

Currently looking for a Postgres example to see if that's affected too.

pachadotdev commented 2 years ago

Dear @maelle

I have an example of the exact same DB (Financial) also on PostgreSQL and SQL Server. Please have a look at https://databases.pacha.dev/#financial-aka-loan-application.

Best, Mauricio Vargas S.

krlmlr commented 2 years ago

Thanks! I was referring to the schema argument of dm_from_con() . At some point we'll expose dm_meta() . It's worth making it consistent with ADBC (see https://github.com/apache/arrow-adbc/pull/18), guessing the schema name isn't something that we can or want to cover in this documentation until we expose dm_meta(). That's a bit of a catch-22, unfortunately.

library(RMariaDB)
library(dm)

fin_db <- dm:::dbedu_con()

meta <-
  fin_db %>%
  dm:::dm_meta()

meta$schemata
#> # Source:   SQL [3 x 2]
#> # Database: mysql  [student@databases.pacha.dev:NA/financial]
#>   catalog_name schema_name       
#>   <chr>        <chr>             
#> 1 def          information_schema
#> 2 def          intendo           
#> 3 def          financial

fin_dm <- dm_from_con(fin_db, learn_keys = TRUE, schema = "financial")

names(fin_dm)
#> [1] "accounts"  "cards"     "clients"   "disps"     "districts" "loans"    
#> [7] "orders"    "tkeys"     "trans"

fin_db_relational <- dm:::relational_con()

fin_dm_relational <- dm_from_con(fin_db_relational, learn_keys = TRUE, schema = "financial")

names(fin_dm)
#> [1] "accounts"  "cards"     "clients"   "disps"     "districts" "loans"    
#> [7] "orders"    "tkeys"     "trans"

Created on 2022-06-21 by the reprex package (v2.0.1)

It is true that dm_from_con() now requires the schema argument for RMariaDB. There, "schema" and "database" are synonymous, and filtering by database in the connection could restrict the metadata returned. I'll take a look. Related: https://github.com/cynkra/dm/pull/1130.

krlmlr commented 2 years ago

I fixed #1123 so that if dbname is passed to dbConnect() we're only getting tables from that schema/database. Closing, let's discuss in new issues if needed.

github-actions[bot] commented 1 year ago

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.