colearendt / tidyjson

Tidy your JSON data in R with tidyjson
Other
182 stars 15 forks source link

Get bind_rows to work with tbl_json objects #58

Closed jeremystan closed 5 years ago

jeremystan commented 8 years ago

You can then get rid of rbind_tbl_json in utils.R

jeremystan commented 7 years ago

Not sure how to do this, given bind_rows doesn't work like other dplyr functions, wrap_dplyr_verb won't work to naturally extend it.

colearendt commented 7 years ago

bind_rows is not an S3 method, so it is not dispatched based on class. This makes sense in one regard, because it would require "aggregating" classes across many objects.

Likely some discussion needed on what implementation is suggested for tidyjson. There is relevant discussion on this currently open issue in dplyr. The discussion is promising of S3 behavior in the future.

Not sure whether it is better to create another verb with explicit tidyjson behavior (i.e. jbind_rows or something), or if masking the bind_rows function from dplyr makes more sense. The latter has the benefit of making things nice for those who load tidyjson after dplyr. Of course, for those who do not, tidyjson::bind_rows will be required.

A proposed implementation, based on the latter and drawing heavily from dplyr. Not sure if a warning is warranted or not:

bind_rows <- function(...) {
  r <- dplyr::bind_rows(...)

  d <- list_or_dots(...)
  if (all(unlist(lapply(d,is.tbl_json)))) {
    j <- unlist(lapply(d, attr, 'JSON'), recursive=FALSE)
    return(tbl_json(r,j))
  } else {
    warning('Some non-tbl_json objects.  Reverting to dplyr::bind_rows')
    return(r)
  }
}

See this commit

colearendt commented 7 years ago

Potential danger of implementing this before S3 support in dplyr is whether or not tidyjson will need to reexport the bind_rows once an S3 method is provided, for backwards compatibility.

colearendt commented 5 years ago

Closed in #108 and #109 , with the caveats above.

When this lands, we should revisit: https://github.com/tidyverse/dplyr/issues/2457