cynkra / dm

Working with relational data models in R
https://dm.cynkra.com
Other
501 stars 49 forks source link

`dm_wrap()` doesn't behave as documented when constraints are violated #761

Open moodymudskipper opened 2 years ago

moodymudskipper commented 2 years ago
library(dm, warn.conflicts = FALSE)
dm <- dm_nycflights13()
dm %>%
  dm_wrap(flights) %>%
  dm_unwrap(ptype = dm) %>%
  pull_tbl(planes)
#> # A tibble: 1,112 × 9
#>    tailnum  year type         manufacturer  model    engines seats speed engine 
#>    <chr>   <int> <chr>        <chr>         <chr>      <int> <int> <int> <chr>  
#>  1 N571JB   2003 Fixed wing … AIRBUS        A320-232       2   200    NA Turbo-…
#>  2 N564JB   2003 Fixed wing … AIRBUS        A320-232       2   200    NA Turbo-…
#>  3 N171US   2001 Fixed wing … AIRBUS INDUS… A321-211       2   199    NA Turbo-…
#>  4 N35204   2000 Fixed wing … BOEING        737-824        2   149    NA Turbo-…
#>  5 N815UA   1998 Fixed wing … AIRBUS INDUS… A319-131       2   179    NA Turbo-…
#>  6 N5EAAA     NA <NA>         <NA>          <NA>          NA    NA    NA <NA>   
#>  7 N784JB   2011 Fixed wing … AIRBUS        A320-232       2   200    NA Turbo-…
#>  8 N337JB   2011 Fixed wing … EMBRAER       ERJ 190…       2    20    NA Turbo-…
#>  9 N19554   2002 Fixed wing … EMBRAER       EMB-145…       2    55    NA Turbo-…
#> 10 N740UW   2000 Fixed wing … AIRBUS INDUS… A319-112       2   179    NA Turbo-…
#> # … with 1,102 more rows

We see rows of NAs are created for primary keys that didn't exist in the original planes table

The doc says

Even if all referential constraints are satisfied, unwrapping after wrapping loses rows in parent tables that don't have a corresponding row in the child table.

Should we mention that we can also gain rows if all referential constraints are not satisfied ?

Should we inform the user when wrapping that such thing might happen when unwrapping ?

We could also fail early as some constraints are violated and this might not be the kind of operations we want to do on a dm if it's not tight ?

moodymudskipper commented 2 years ago

A related issue, is that among the pks that we add to the planes table above there is NA, because flights$tailnum can be NA.

If we decide to accept the above (since it somehow "repairs" the dm), we might want to make an exception for NA and remove those rows ?

krlmlr commented 2 years ago

I think we should define precise roundtrip conditions in dm_nest_tbl() and dm_pack_tbl(), and refer to those definitions from dm_wrap() . It is okay to be terse in the help pages for the functions, we also want a longer-form article that shows how these functions are applied (based on flights, financial, and perhaps starwars or a real API response).

moodymudskipper commented 2 years ago

What is the scope of this issue ?