Open jeremystan opened 8 years ago
A similar-ish case that may be worth considering here is arrays that have been improperly serialized as a single value (rather than a one-element array) when there is only one element, i.e. JSON like:
x <- '[{"id": 1, "list":[1,2,3]}, {"id": 2, "list": 4}]'
x %>% gather_array() %>%
spread_values(id=jnumber('id')) %>%
enter_object('list') %>%
json_types()
While technically not valid, it may still be nice to have a way to work with it. The workaround here is the same: filtering on type == 'array'.
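For reference, the filtering workaround could look roughly like the sketch below. It assumes a tidyjson version in which dplyr verbs such as filter() keep the JSON attribute on a tbl_json; the column names list.index and value are made up for illustration.

```r
library(tidyjson)
library(dplyr)

x <- '[{"id": 1, "list":[1,2,3]}, {"id": 2, "list": 4}]'

arrays_only <- x %>%
  gather_array() %>%
  spread_values(id = jnumber('id')) %>%
  enter_object('list') %>%
  json_types() %>%
  # Keep only the records where "list" really is an array,
  # so the next gather_array() does not fail its type check
  filter(type == 'array') %>%
  gather_array('list.index') %>%
  append_values_number('value')

arrays_only
```

The record with id 2 is dropped entirely, which is the main drawback of this approach: the scalar 4 never makes it into the result.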
I also posted the workaround in an actual question someone had here
Honestly, it seems all that is really needed here is a way to bypass the type-checking. The function itself already handles these cases fairly nicely when the type-check is removed. Not sure whether the better behavior is a parameter in the function or an environment variable like tidyjson.typesafety or something like that.
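To make the two options concrete, here is a minimal base R sketch of how a type check could honor both a function parameter and a global setting. Everything in it is hypothetical: check_array_types is not a tidyjson function, and reading tidyjson.typesafety through getOption() is just one assumed way to implement the "environment variable" idea.

```r
# Hypothetical helper, not part of tidyjson.
# `types` is a character vector of JSON types, one per record.
# `strict` defaults to a global option, but can be overridden per call.
check_array_types <- function(types,
                              strict = getOption("tidyjson.typesafety", TRUE)) {
  bad <- types != "array"
  if (strict && any(bad)) {
    stop(sum(bad), " record(s) are not arrays", call. = FALSE)
  }
  # TRUE for records that are safe to gather as arrays
  !bad
}

# Per-call override: no error, returns a logical mask instead
check_array_types(c("array", "number"), strict = FALSE)

# Global override: every later call skips the error
old <- options(tidyjson.typesafety = FALSE)
check_array_types(c("array", "number"))
options(old)
```

Because the default argument is evaluated at call time, changing the option takes effect immediately without redefining the function.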
With the type-checking lines in gather_factory commented out:
x <- "[{\"id\": 1, \"list\":[1,2,3]}, {\"id\": 2, \"list\": 4}]"
x %>% gather_array() %>% enter_object("list") %>% json_types() %>%
gather_array("array.index2") %>%
json_types("type2")
#> # A tbl_json: 4 x 5 tibble with a "JSON" attribute
#> `attr(., "JSON")` document.id array.index type array.index2 type2
#> <chr> <int> <int> <fctr> <int> <fctr>
#> 1 1 1 1 array 1 number
#> 2 2 1 1 array 2 number
#> 3 3 1 1 array 3 number
#> 4 4 1 2 number 1 number
x <- "[[1, 2], 1]" %>% gather_array %>% json_types
x %>% gather_array("array.index2") %>% json_types("type2")
#> # A tbl_json: 3 x 5 tibble with a "JSON" attribute
#> `attr(., "JSON")` document.id array.index type array.index2 type2
#> <chr> <int> <int> <fctr> <int> <fctr>
#> 1 1 1 1 array 1 number
#> 2 2 1 1 array 2 number
#> 3 1 1 2 number 1 number
Although perhaps it would be preferable for the array.index2 to be NA and thereby illustrate that it was not an array? Not sure which behavior is more consistent and desirable.
The change above is very problematic for objects, for which keys are silently thrown away, so a better proposal is required... maybe a way to not touch bad_types and preserve them as NA?
'{"a":"one","b":"two","c":"three"}' %>%
gather_array() %>%
append_values_string()
## A tbl_json: 3 x 3 tibble with a "JSON" attribute
# `attr(., "JSON")` document.id array.index string
# <chr> <int> <int> <chr>
#1 "\"one\"" 1 1 one
#2 "\"two\"" 1 2 two
#3 "\"three\"" 1 3 three
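The NA-preserving idea could behave roughly as in the base R sketch below. It works on already-parsed JSON represented as nested R lists (unnamed list for an array, named list for an object, atomic value for a scalar); that representation and the names is_json_array and gather_or_na are assumptions for illustration, not tidyjson internals.

```r
# Hypothetical sketch, not tidyjson code.
# An unnamed list stands in for a JSON array.
is_json_array <- function(x) is.list(x) && is.null(names(x))

gather_or_na <- function(docs) {
  rows <- lapply(seq_along(docs), function(i) {
    d <- docs[[i]]
    if (is_json_array(d)) {
      # One row per array element
      data.frame(array.index = rep(i, length(d)),
                 array.index2 = seq_along(d))
    } else {
      # Non-arrays (scalars, objects) are left untouched:
      # a single row with an NA index instead of an error
      data.frame(array.index = i, array.index2 = NA_integer_)
    }
  })
  do.call(rbind, rows)
}

# Roughly '[[1, 2], 1, {"a": "one"}]' after parsing
docs <- list(list(1, 2), 1, list(a = "one"))
gathered <- gather_or_na(docs)
gathered
```

Here the object keeps a single row with an NA index rather than being split into values with its keys discarded, so a later step can still see that the record was never an array.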
Nested arrays are difficult to work with. For example,
At this point, there is no way to gather the next array unless we filter on type == 'array'. append_values_number works, but returns NA for the array, and recursive = TRUE doesn't work through the second-level array. Further, it could be that the types are mixed.