Hi -- first up, thank you so much for this package! It's a great idea with lots of tricky details that I think you've dealt with really well.
I noticed that json_types() silently overwrites any existing column named type, unlike gather_object(), which appends a numeric suffix to the default column name (name) and warns the user:
library(tidyjson)
#>
#> Attaching package: 'tidyjson'
#> The following object is masked from 'package:stats':
#>
#> filter
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
## create sample json array
wb_1 <- worldbank[1] |>
json_structure() |>
filter(level == 1 & type == "array")
## type column: array
wb_1 |>
gather_array()
#> # A tbl_json: 4 x 11 tibble with a "JSON" attribute
#> ..JSON document.id parent.id level index child.id seq name type length
#> <chr> <int> <chr> <int> <int> <chr> <list> <chr> <fct> <int>
#> 1 "{\"Name… 1 1 1 5 1.5 <list> majo… array 4
#> 2 "{\"Name… 1 1 1 5 1.5 <list> majo… array 4
#> 3 "{\"Name… 1 1 1 5 1.5 <list> majo… array 4
#> 4 "{\"Name… 1 1 1 5 1.5 <list> majo… array 4
#> # ℹ 1 more variable: array.index <int>
## existing `type` column is silently overwritten
## to reflect types of elements in array (i.e. each row)
wb_1 |>
gather_array() |>
json_types()
#> # A tbl_json: 4 x 11 tibble with a "JSON" attribute
#> ..JSON document.id parent.id level index child.id seq name type length
#> <chr> <int> <chr> <int> <int> <chr> <list> <chr> <fct> <int>
#> 1 "{\"Name… 1 1 1 5 1.5 <list> majo… obje… 4
#> 2 "{\"Name… 1 1 1 5 1.5 <list> majo… obje… 4
#> 3 "{\"Name… 1 1 1 5 1.5 <list> majo… obje… 4
#> 4 "{\"Name… 1 1 1 5 1.5 <list> majo… obje… 4
#> # ℹ 1 more variable: array.index <int>
## but this is inconsistent with
## gather_object() which adds a new `name` column
## and warns the user:
# Warning message:
#In gather_object(json_types(gather_array(wb_1))) :
# name column name already exists, changing to name.2
wb_1 |>
gather_array() |>
json_types() |>
gather_object()
#> Warning in gather_object(json_types(gather_array(wb_1))): name column name
#> already exists, changing to name.2
#> # A tbl_json: 8 x 12 tibble with a "JSON" attribute
#> ..JSON document.id parent.id level index child.id seq name type length
#> <chr> <int> <chr> <int> <int> <chr> <list> <chr> <fct> <int>
#> 1 "\"Educa… 1 1 1 5 1.5 <list> majo… obje… 4
#> 2 "46" 1 1 1 5 1.5 <list> majo… obje… 4
#> 3 "\"Educa… 1 1 1 5 1.5 <list> majo… obje… 4
#> 4 "26" 1 1 1 5 1.5 <list> majo… obje… 4
#> 5 "\"Publi… 1 1 1 5 1.5 <list> majo… obje… 4
#> 6 "16" 1 1 1 5 1.5 <list> majo… obje… 4
#> 7 "\"Educa… 1 1 1 5 1.5 <list> majo… obje… 4
#> 8 "12" 1 1 1 5 1.5 <list> majo… obje… 4
#> # ℹ 2 more variables: array.index <int>, name.2 <chr>
Would it be possible to modify json_types() to behave consistently with gather_object() -- i.e. to NOT overwrite an existing type column, but instead append a new column type.2 (with a warning) when a type column is already present, for example from json_structure() or an earlier json_types() call?
Created on 2024-10-08 with reprex v2.0.2
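To make the request concrete, here is a rough sketch of the naming rule I have in mind. The helper next_free_name() is hypothetical (it is not part of tidyjson, and I have not checked how gather_object() implements this internally); it only mirrors the behaviour visible in the warning above. The commented usage at the end assumes json_types() accepts a column.name argument.

```r
# Hypothetical helper (not part of tidyjson): return `default` if it is unused,
# otherwise warn and return the first free name default.2, default.3, ...
next_free_name <- function(existing, default = "type") {
  if (!default %in% existing) {
    return(default)
  }
  i <- 2
  candidate <- paste0(default, ".", i)
  while (candidate %in% existing) {
    i <- i + 1
    candidate <- paste0(default, ".", i)
  }
  warning(default, " column name already exists, changing to ", candidate,
          call. = FALSE)
  candidate
}

# Column names copied from the printed gather_array() output above
cols <- c("document.id", "parent.id", "level", "index", "child.id",
          "seq", "name", "type", "length", "array.index")

next_free_name(cols, "type")
#> Warning: type column name already exists, changing to type.2
#> [1] "type.2"

# Possible workaround in the meantime (assumes json_types() has `column.name`):
# x <- gather_array(wb_1)
# json_types(x, column.name = next_free_name(names(x), "type"))
```

With something like this applied inside json_types(), the original type column from json_structure() would be kept and the element types would land in type.2.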