sbilge closed this issue 5 years ago.
Thanks for the report, @sbilge! I think this is a good case to keep in mind. Do you have a simple reprex that we can use for exploring?
Note that this will be the case with many multi-row operations: because a single JSON row becomes many rows, the operation effectively discards the notion of a `tbl_json`. At present, we throw an error in this case, although there are other behaviors that we might explore.
The simplest solution would be to change your pipeline:

```r
# instead of
object %>% unnest()
# try this
object %>% as_tibble() %>% unnest()
```
`as_tibble()` should drop the `tbl_json` class and let you work with the object like a normal tibble (i.e., forcibly discard the JSON attribute that you will no longer need).
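A minimal sketch of that workaround on a small, made-up JSON string (the `name`/`ids` fields and the `"|"`-delimited format are hypothetical), assuming dplyr, tidyr, and tidyjson are installed. The object keeps its `tbl_json` class through `spread_values()`; calling `as_tibble()` before the row-multiplying `unnest()` turns it into a plain tibble first:

```r
library(dplyr)
library(tidyr)
library(tidyjson)

# Hypothetical input: two records whose "ids" field is a "|"-delimited string
json <- '[{"name": "a", "ids": "1|2"}, {"name": "b", "ids": "3"}]'

records <- json %>%
  as.tbl_json() %>%
  gather_array() %>%                        # one row per array element
  spread_values(name = jstring("name"),
                ids = jstring("ids"))

inherits(records, "tbl_json")               # still a tbl_json at this point

result <- records %>%
  as_tibble() %>%                           # drop the tbl_json class and JSON attribute
  mutate(ids = strsplit(ids, "|", fixed = TRUE)) %>%  # list-column of id vectors
  unnest(ids)                               # "1|2" becomes two rows
```

The record `"a"` ends up on two rows and `"b"` on one, which is exactly the one-JSON-row-to-many situation that a `tbl_json` cannot represent.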
If you are trying to unnest an object that is in your JSON, you might look at the various spread and gather verbs (e.g. `spread_values()`, `gather_array()`), which could also be an alternative to your approach. (Again, a simple reprex may help us make a better recommendation here.)
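As a sketch of the gather-based alternative: if the values to split were stored as a real JSON array (the `pmids` field below is hypothetical; in the reprex further down they are a `"|"`-delimited string, which still needs `unnest()`), `enter_object()` plus `gather_array()` produces one row per element while staying inside tidyjson. Naming each `gather_array()` index column avoids two columns both called `array.index`:

```r
library(dplyr)
library(tidyjson)

# Hypothetical input: "pmids" is a JSON array rather than a delimited string
json <- '[{"name": "a", "pmids": [11, 12]}, {"name": "b", "pmids": [13]}]'

result <- json %>%
  as.tbl_json() %>%
  gather_array("item.index") %>%            # one row per record
  spread_values(name = jstring("name")) %>%
  enter_object("pmids") %>%                 # move into each record's "pmids" array
  gather_array("pmid.index") %>%            # one row per PubMed id, still a tbl_json
  append_values_number("pmid") %>%          # store each scalar in a "pmid" column
  as_tibble()                               # plain tibble for downstream work
```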
@colearendt Thank you very much for your answer. `object %>% as_tibble() %>% unnest()` worked.
I attached a simplified input file. It seems like it crashes when the object "drugs" has a null "drug_pmid" value.
Here is the reprex:
```r
list.of.packages <- c("dplyr", "dtplyr", "tidyr", "stringr", "tidyjson")
lapply(list.of.packages, library, character.only = TRUE)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#>
#> Attaching package: 'tidyjson'
#> The following object is masked from 'package:dplyr':
#>
#> bind_rows
#> The following object is masked from 'package:stats':
#>
#> filter
#> [[1]]
#> [1] "dplyr" "stats" "graphics" "grDevices" "utils" "datasets"
#> [7] "methods" "base"
#>
#> [[2]]
#> [1] "dtplyr" "dplyr" "stats" "graphics" "grDevices" "utils"
#> [7] "datasets" "methods" "base"
#>
#> [[3]]
#> [1] "tidyr" "dtplyr" "dplyr" "stats" "graphics"
#> [6] "grDevices" "utils" "datasets" "methods" "base"
#>
#> [[4]]
#> [1] "stringr" "tidyr" "dtplyr" "dplyr" "stats"
#> [6] "graphics" "grDevices" "utils" "datasets" "methods"
#> [11] "base"
#>
#> [[5]]
#> [1] "tidyjson" "stringr" "tidyr" "dtplyr" "dplyr"
#> [6] "stats" "graphics" "grDevices" "utils" "datasets"
#> [11] "methods" "base"
biograph_json <- as.tbl_json("/PATH/TO/biograph_json.json")
biograph_drugs <- biograph_json %>%
  enter_object("_items") %>% gather_array() %>%
  spread_values(
    gene_symbol = jstring("gene_symbol"),
    hgnc_id = jstring("hgnc_id")
  ) %>%
  dplyr::select(-array.index) %>%
  enter_object("drugs") %>% gather_array() %>%
  spread_values(
    ATC_code = jstring("ATC_code"),
    drug_name = jstring("drug_name"),
    drug_source_name = jstring("source_name"),
    drugbank_id = jstring("drugbank_id"),
    target_action = jstring("target_action"),
    drug_pmid = jstring("pmid"),
    interaction_type = jstring("interaction_type"),
    is_cancer_drug = jlogical("is_cancer_drug")
  ) %>%
  mutate(hgnc_id = as.integer(hgnc_id)) %>%
  mutate(drug_pmid = ifelse(drug_pmid == "null", NA, drug_pmid)) %>%
  # make a row for every pubmed id
  mutate(drug_pmid = str_split(drug_pmid, "\\|")) %>%
  unnest(drug_pmid) %>%
  dplyr::select(-document.id, -array.index)
#> Error: nrow(df) not equal to length(json.list)
```
Created on 2019-03-14 by the reprex package (v0.2.1)
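For reference, here is a sketch of the same pipeline with `as_tibble()` moved ahead of the `str_split()`/`unnest()` step, run on a hypothetical miniature input (the JSON below is made up; only the key names mirror the reprex, and it is not the attached file). It includes one drug with a null `pmid`, the case reported above. It assumes, as in current tidyjson, that `jstring()` returns `NA` for a JSON null, which is why the `ifelse(... == "null", ...)` step is left out; `str_split()` then keeps that `NA` as a single row, so the drug is not dropped:

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tidyjson)

# Hypothetical miniature input: one gene with two drugs, one of them with a null "pmid"
json <- '{"_items": [
  {"gene_symbol": "ABC1", "hgnc_id": "1",
   "drugs": [{"drug_name": "d1", "pmid": "11|12"},
             {"drug_name": "d2", "pmid": null}]}
]}'

drugs <- json %>%
  as.tbl_json() %>%
  enter_object("_items") %>% gather_array("item.index") %>%
  spread_values(gene_symbol = jstring("gene_symbol"),
                hgnc_id = jstring("hgnc_id")) %>%
  enter_object("drugs") %>% gather_array("drug.index") %>%
  spread_values(drug_name = jstring("drug_name"),
                drug_pmid = jstring("pmid")) %>%     # null becomes NA
  as_tibble() %>%                                    # leave tbl_json before multiplying rows
  mutate(hgnc_id = as.integer(hgnc_id),
         drug_pmid = str_split(drug_pmid, "\\|")) %>%  # NA stays a single NA
  unnest(drug_pmid) %>%                              # one row per PubMed id
  dplyr::select(-document.id, -item.index, -drug.index)
```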
Confirmed that this should be working in the latest dev version, `0.2.3.9000`. Thanks for the report!
When tidyjson_0.2.3 is used with the `unnest()` function, it crashes with the error `Error: nrow(df) not equal to length(json.list)`. With version 0.2.1, it worked fine.
Versions: R 3.4.4, tidyjson_0.2.3.9000, dplyr_0.8.0.1, tidyr_0.8.3