Closed jeremystan closed 5 years ago
Also added a test to support the object case:
data_frame(id = 1, json = '{"a":1}') %>% as.tbl_json(json.column = "json") %>% json_structure
Resolved by imputing document.id in json_structure_init() with row_number() when document.id was not present. I thought this was a better solution than disregarding document.id, since it is advertised in the docs as a return column. It also allows output that identifies which record came from which row. However, document.id is left alone if the column exists on input, which can make for some non-intuitive results:
## these give different output in document.id
'[{"a":1},{"a":2}]' %>% gather_array() %>% json_structure()
'[{"a":1},{"a":2}]' %>% gather_array() %>% select(-document.id) %>% json_structure()
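To make the difference concrete, here is a minimal sketch of the imputation idea using plain dplyr on an ordinary data frame. The helper name impute_document_id() is hypothetical and is not the actual json_structure_init() code; it only assumes the behaviour described above (fill document.id from row_number() when the column is missing, otherwise leave it alone).

```r
library(dplyr)

# Hypothetical helper (not the tidyjson source): add document.id only when absent.
impute_document_id <- function(df) {
  if (!"document.id" %in% names(df)) {
    df <- mutate(df, document.id = row_number())
  }
  df
}

# Stand-in for gather_array() output: both rows come from the same document.
with_id <- data.frame(document.id = c(1L, 1L), array.index = 1:2)

# Stand-in for the same rows after select(-document.id).
without_id <- data.frame(array.index = 1:2)

id_kept    <- impute_document_id(with_id)$document.id     # existing ids are left alone
id_imputed <- impute_document_id(without_id)$document.id  # ids come from row position
```

With the column present, both rows keep the shared id; with it removed, each row gets its own id, which is the kind of mismatch the two calls above would produce.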
I'm wondering whether the implementation of json_structure() could be improved at all, whether its scope should be more narrowly defined (i.e. does it need to leave the tbl_json as-is, or return an object focused on structure?), or whether I am just struggling to understand its use case.
One note: I'm not sure if it is intentional that any tbl_json structure already present is included in the output of json_structure(), because json_structure_init() does not use transmute(). These values are only included on the parent object, though (i.e. see the id field in the above examples, or array.index below).
## array.index field is preserved
'[{"a":1},{"a":2}]' %>% gather_array() %>% json_structure()
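For reference, the mutate() versus transmute() distinction mentioned above can be sketched with plain dplyr. This is only an illustration on a made-up one-row data frame, not the json_structure_init() code, and the level column is used here purely as an example of a newly added column.

```r
library(dplyr)

# Made-up parent row carrying columns from an earlier gather_array() step.
parent <- data.frame(document.id = 1L, array.index = 1L)

# mutate() keeps every existing column and appends the new one.
kept <- names(mutate(parent, level = 0L))

# transmute() keeps only the columns that are named explicitly.
dropped <- names(transmute(parent, document.id, level = 0L))
```

Here kept still contains array.index while dropped does not, so a mutate()-style init step would carry pre-existing columns through to the output.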
This works
But this does not