cynkra / dm

Working with relational data models in R
https://dm.cynkra.com
Other
494 stars 50 forks source link

Macro: Rekey #519

Open krlmlr opened 3 years ago

krlmlr commented 3 years ago

Logic: define a new primary key column that trickles down to the foreign tables, optionally remove existing (primary and foreign) key columns.

Name: dm_update_pk()?

Building block of #502.

krlmlr commented 3 years ago

We also need dm_update_fk() when we allow foreign keys to non-primary keys.

moodymudskipper commented 2 years ago

We might have something like this to start with :

dm <-
  dm(players = 
     data.frame(player = c("John", "Tom", "Max")),
   matches =
     data.frame(
       player1 = c("John", "Tom"),
       player2 = c("Tom", "Max")
     ),
   events =
     data.frame(
       event = c("Brussels", "Brussels", "Zurich"),
       event_player = c("John", "Tom", "John")
     )) %>% 
  dm_add_pk(players, player) %>% 
  dm_add_fk(matches, player1, players) %>% 
  dm_add_fk(matches, player2, players) %>% 
  dm_add_fk(events, event_player, players) 

dm_draw(dm, view_type = "all")

Created on 2021-09-22 by the reprex package (v2.0.1)

And say we wish to end up with the following :

dm_out <-
  dm(players = 
       data.frame(
         id = 1:3,
         player = c("John", "Tom", "Max")),
     matches =
       data.frame(
         player1 = c("John", "Tom"),
         player2 = c("Tom", "Max"),
         player1_id = 1:2,
         player2_id = 2:3
       ),
     events =
       data.frame(
         event = c("Brussels", "Brussels", "Zurich"),
         event_player = c("John", "Tom", "John"),
         player_id = c(1,2,3)
       )) %>% 
  dm_add_pk(players, id) %>% 
  dm_add_fk(matches, player1_id, players) %>% 
  dm_add_fk(matches, player2_id, players) %>% 
  dm_add_fk(events, player_id, players) 

dm_draw(dm_out, view_type = "all")

Created on 2021-09-22 by the reprex package (v2.0.1)

Manual renaming might work like this :

# dm_update_pk(dm, tbl, ..., col_name = "id", rm_old_cols = c("none", "all", "fk_only")) 
dm_update_pk(
  dm, players, 
  players = c(player1_id = "player1", player2_id = "player2"), 
  events = c(event_player = "player_id")

It might be semi manual, so if there is only one fk in the table the new fk in foreign tables will be named based on the name of the primary table.

# dm_update_pk(dm, tbl, ..., col_name = "id", default_pk_name = NULL, rm_old_cols = c("none", "all", "fk_only"))
# by default  `default_pk_name` might be be `PrimaryTableName_id`
dm_update_pk(
  dm, players, 
  players = c(player1_id = "player1", player2_id = "player2"))

It might be simpler to provide a suffix to be applied on existing table names though, we'd check that it works everywhere, fail if it doesn't. The user would be free to rename whatever doesn't suit (either before applying dm_update_pk, so it doesn't fail, or after applying it, to tweak the given names).

It might work like this :

# dm_update_pk(dm, tbl, col_name = "id", suffix = "_id", rm_old_cols = c("none", "all", "fk_only"))
dm_update_pk(dm, players)

In this case, events.event_player will be renamed to event_player_id, so we might want to rename it manually to player_id but all the rest worked automatically.

krlmlr commented 2 years ago

For the column names, can we use a glue pattern similar to e.g. the .name_spec argument to vctrs::vec_c() ? We'd fail if the pattern leads to duplicate column names. We could also provide placeholders that expand to the child column name only if there's more than one foreign key in a table.

How do we deal with compound keys?

The macro should not be responsible for creating the new id column in the parent table, it must already exist.