colearendt / tidyjson

Tidy your JSON data in R with tidyjson
Other
182 stars 14 forks source link

New Version of dplyr Breaks tbl_json dplyr verbs #97

Closed colearendt closed 5 years ago

colearendt commented 7 years ago

The standard evaluation versions of dplyr verbs will be deprecated in a coming release of dplyr, it seems. See doc for reference. Lines 474 to 496.

If so, the method of implementing the dplyr verbs should be thought through - probably ahead of or in step with the dplyr release.

Thoughts are to use the regular forms (filter, mutate, etc.) since the SE forms will no longer be called, and then deprecate the SE forms consistently with dplyr. I have not tested yet, though.

A simple example to show the JSON is not filtered. The problems will pervade through many of the other functions, though, too. I.e. spread_all fails, likely due to different behavior.

devtools::install_github('tidyverse/dplyr')
library('dplyr')

devtools::load_all()

x <- commits %>% gather_array()

length(x$array.index) ## 30
length(attr(x,'JSON')) ## 30

## The JSON is not filtered as it should be
xf <- x %>% filter(array.index==1)

length(xf$array.index) ## 1
length(attr(xf,'JSON')) ## 30

## Fails 
## x %>% enter_object('commit') %>% spread_all
colearendt commented 7 years ago

Started working on this, but a change of attributes in the dev version of filter is causing trouble for passing tests. See tidyverse/dplyr#2772 for discussion / resolution of this issue.

colearendt commented 7 years ago

Since the SE forms are no longer called behind the scenes, I simply added the NSE forms to wrap_dplyr_verb.

Have not tested this with the tidyeval framework, yet.

colearendt commented 7 years ago

Note that tbl_df is soft deprecated in the new release of dplyr per the NEWS.md, so this should be changed to as_tibble. I am not a fan of the extra typing or the inconsistency with the attr(.,'class')... but it seems the decision has already been made.

colearendt commented 7 years ago

@MarkEdmondson1234 can you test your issue with devtools::install_github('jeremystan/tidyjson',ref='f6f13f4'). That is the development version where I believe I have a fix, and it would be helpful to know if it fixes before going through the CRAN release.

clesiemo3 commented 7 years ago

@colearendt for what it's worth I ran into a similar issue caused by https://github.com/jeremystan/tidyjson/blob/master/R/spread_all.R#L77 where it turned it into a data frame instead of json_tbl and later had trouble down the line. Installing the ref above fixed my issue at work. We have been using the master branch for spread_all() support because some internal R packages utilize it. +1 from me!

gaiusjaugustus commented 6 years ago

Continuing discussion from https://github.com/sailthru/tidyjson/issues/58

I have a json I downloaded from GDC, and am trying to flatten it to a dataframe. Was hoping to "simply" use tidyjson's verbs (gather_array, enter_object, and spread_values) to manually go through all the levels of the json and flatten them to a dataframe. Got the dplyr error...and it persisted even when I tried downgrading to dplyr 0.5.0 (which I can't stay at because other pipelines require dplyr 0.7.1).

colearendt commented 6 years ago

Hey @gaiusjaugustus. Unfortunately, I will be unable to help without a direct example (best practices for sharing minimal reproducible examples here ) that I can test. My only recommendation would be to try out my PR with the ref above or with the latest devtools::install_github("jeremystan/tidyjson","#103"). Unfortunately, until I am able to get in touch with the package maintainer, I cannot do a whole lot besides updating that PR. I would love to know if it is not working! The verbs you are working with should "simply" work (in an ideal world).

If you are struggling to get a reprex working (and if the data need not be secured), you could also try your hand at creating a RStudio cloud project that I can fork. (Note that RStudio Cloud is currently in alpha)

gaiusjaugustus commented 6 years ago

Just as an update, because the data is pretty complex (several layers deep) and I'm unsure where the problem is, and because the data is not allowed to be shared, creating a reproducible example was going to take some time. In the meantime, I was able to find a way to get the data I need in a flat file (from the provider of the data). If I run across this problem again in the future, I'll definitely update with more info.

Thanks for your time.

colearendt commented 6 years ago

Sure thing! Thanks for the update

colearendt commented 5 years ago

Should be resolved in #108 and #109