Open robertzk opened 9 years ago
Dear Robert,
Sorry for my late response; I was on leave. I'd be happy to cooperate or integrate. Do you have suggestions on what to integrate?
Best,
Edwin
2015-04-18 22:43 GMT+02:00 Robert Krzyzanowski notifications@github.com:
objectdiff https://github.com/robertzk/objectdiff
Wonder if there's anything we can collaborate on?
Thanks for the response, Edwin!
objectdiff provides a function, also called objectdiff, that computes a closure containing the "diff" between two arbitrary R objects. For example, if we have:
library(objectdiff)
iris2 <- iris
iris2$new_column <- 1
patch <- objectdiff(iris, iris2)
Then patch will only store new_column, rather than duplicating the full data set. This is particularly useful for wide data sets with hundreds or thousands of columns.
# Proof that the patch is smaller
> object.size(patch)
1896 bytes
> object.size(iris)
7088 bytes
> object.size(iris2)
8384 bytes
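To make the idea of "a closure that stores only what changed" concrete, here is a rough base-R sketch. It is not how objectdiff is implemented; make_column_patch is a hypothetical helper written only for illustration, and it handles just the simplest case of added columns.

```r
# Hypothetical helper (not part of objectdiff): build a patch that keeps
# only the columns present in `new` but missing from `old`.
make_column_patch <- function(old, new) {
  added <- new[setdiff(names(new), names(old))]
  patch <- function(df) {
    df[names(added)] <- added
    df
  }
  # Re-home the closure in a small environment holding only `added`,
  # so it does not keep references to the full `old` and `new` data frames.
  environment(patch) <- list2env(list(added = added), parent = baseenv())
  patch
}

iris2 <- iris
iris2$new_column <- 1

patch <- make_column_patch(iris, iris2)
restored <- patch(iris)      # apply the patch to the original data set
all.equal(restored, iris2)   # compare against the modified copy
```

The point of the design is that the closure carries the new column and nothing else, so the patch stays small no matter how wide the original data set is.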
If you apply several modifications to a data.frame, you only need to keep a copy of the initial data set and the succession of patches to work your way to the final data.frame. This has two advantages: (1) you know what changed at each step, and (2) it occupies much less memory than storing every intermediate copy.
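Replaying a history of patches could look roughly like the sketch below. The patches here are plain function(df) stand-ins that I made up for illustration, not real objectdiff patches, and the column names (ratio and so on) are arbitrary.

```r
# A hypothetical history: the initial data set plus an ordered list of patches.
# Each patch is a plain function(df) -> df standing in for an objectdiff patch.
initial <- iris
patches <- list(
  add_ratio  = function(df) { df$ratio <- df$Sepal.Length / df$Sepal.Width; df },
  drop_width = function(df) { df$Petal.Width <- NULL; df },
  keep_large = function(df) df[df$Sepal.Length > 5, , drop = FALSE]
)

# Replay every patch in order to reach the final state.
final <- Reduce(function(df, patch) patch(df), patches, initial)

# accumulate = TRUE keeps each intermediate state (initial state first),
# which makes it possible to inspect what every step changed.
states <- Reduce(function(df, patch) patch(df), patches, initial,
                 accumulate = TRUE)
vapply(states, ncol, integer(1))
```

Only initial and patches would need to be stored; final and states can be recomputed on demand.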
Going further, objectdiff provides a tracked_environment that stores any changes to an R environment object using patches obtained from objectdiff. My question then is whether we can generate a plot of changes to, say, a data frame, by mapping patches obtained from objectdiff to plottable diffs obtained from daff.
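One naive way to get there, sketched below, is to apply a patch, keep the before and after states, and hand both to daff. This assumes the daff package is installed and that its diff_data and render_diff functions behave as I understand them; the patch is again a plain function standing in for an objectdiff patch.

```r
# Hypothetical stand-in for an objectdiff patch: a plain function(df) -> df.
before <- iris
patch  <- function(df) { df$new_column <- 1; df }
after  <- patch(before)

# Only run the daff part when the package is available.
if (requireNamespace("daff", quietly = TRUE)) {
  # Tabular diff between the state before and after the patch.
  step_diff <- daff::diff_data(before, after)
  # Render the diff as an HTML page (written to a temporary file by default).
  daff::render_diff(step_diff)
}
```

This recomputes the diff from two full copies, so it loses the memory advantage. A real integration would presumably translate the patch into a daff diff directly, without materialising both states.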
Do you think this would be an interesting project? I could probably dedicate a weekend to it.