edwindj / daff

Diff, patch and merge for data.frames, see http://paulfitz.github.io/daff/
https://edwindj.github.io/daff/

Consider integrating with objectdiff #8

Open robertzk opened 9 years ago

robertzk commented 9 years ago

objectdiff: https://github.com/robertzk/objectdiff

Wonder if there's anything we can collaborate on?

edwindj commented 9 years ago

Dear Robert,

Sorry for my late response: I was on leave. I'd be happy to cooperate/integrate. Do you have suggestions on what to integrate?

Best,

Edwin

2015-04-18 22:43 GMT+02:00 Robert Krzyzanowski notifications@github.com:

objectdiff https://github.com/robertzk/objectdiff

Wonder if there's anything we can collaborate on?

robertzk commented 9 years ago

Thanks for the response, Edwin!

objectdiff provides a function, also called objectdiff, that computes a closure containing the "diff" between two arbitrary R objects. For example, if we have:

iris2 <- iris
iris2$new_column <- 1
patch <- objectdiff(iris, iris2)

Then patch will only store new_column, rather than duplicating the full data set. This is particularly useful for wide data sets with hundreds or thousands of columns.

# Proof that the patch is smaller
> object.size(patch)
1896 bytes
> object.size(iris)
7088 bytes
> object.size(iris2)
8384 bytes
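
To make the idea of a patch closure concrete, here is a minimal base-R sketch. It is not how objectdiff is implemented: column_patch is a hypothetical helper that only handles data.frames with the same number of rows, and it assumes a patch is applied by calling it on the original object.

```r
# Hypothetical helper, NOT part of objectdiff: keep only the columns that
# were added or changed, and return a closure that replays them.
column_patch <- function(old, new) {
  stopifnot(is.data.frame(old), is.data.frame(new), nrow(old) == nrow(new))

  # Names of columns that are new or whose contents differ.
  changed <- Filter(function(nm) {
    !(nm %in% names(old)) || !identical(old[[nm]], new[[nm]])
  }, names(new))

  delta <- new[changed]        # only the differing columns are kept
  final_names <- names(new)    # records column order and dropped columns
  rm(old, new)                 # keep the full inputs out of the closure

  function(df) {
    for (nm in names(delta)) df[[nm]] <- delta[[nm]]
    df[final_names]
  }
}

iris2 <- iris
iris2$new_column <- 1
patch <- column_patch(iris, iris2)

# Applying the patch to the original reproduces the modified data.frame.
stopifnot(isTRUE(all.equal(patch(iris), iris2)))
```

The closure holds one column here instead of six, which is the same trade-off described above.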

If you apply several modifications to a data.frame, you only need a copy of the initial data set and the succession of patches to reconstruct the final data.frame. This has two advantages: (1) you know what changed in each step, and (2) it occupies much less memory than keeping a full copy per step.
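
Replaying a succession of patches can be sketched in base R with Reduce. The three functions below are hand-written stand-ins for patches (an assumption for illustration; real patches would come from objectdiff), and replay is a hypothetical helper name.

```r
# Stand-ins for patches: each one takes the previous state and returns the next.
patches <- list(
  function(df) { df$ratio <- df$Sepal.Length / df$Sepal.Width; df },  # add a column
  function(df) { df$Species <- NULL; df },                            # drop a column
  function(df) df[df$ratio > 2, , drop = FALSE]                       # filter rows
)

# Hypothetical helper: start from the initial copy and apply every patch in order.
replay <- function(initial, patches) {
  Reduce(function(state, patch) patch(state), patches, initial)
}
final <- replay(iris, patches)

# accumulate = TRUE keeps every intermediate state, so each step can be inspected.
states <- Reduce(function(state, patch) patch(state), patches, iris,
                 accumulate = TRUE)
vapply(states, ncol, integer(1))  # column count of the initial state and after each patch
```

Only iris and the list of patches need to be stored; any intermediate state can be rebuilt on demand by replaying a prefix of the list.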

Going further, objectdiff provides a tracked_environment that stores any changes to an R environment object using patches obtained from objectdiff. My question then is whether we can generate a plot of changes to, say, a data frame, by mapping patches obtained from objectdiff to plottable diffs obtained from daff.
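
As a rough sketch of the daff side of that mapping: given the state before and after a patch (reconstructed however objectdiff allows), daff's diff_data and render_diff can produce a viewable diff. This assumes the daff package is installed; the before and after objects are built by hand here, and nothing below uses the objectdiff or tracked_environment API.

```r
library(daff)  # assumes daff is installed

# Two consecutive states of a data.frame, built by hand for illustration.
before <- iris
after <- iris
after$new_column <- 1
after$Sepal.Length[1] <- 10

# Tabular diff between the two states.
d <- diff_data(before, after)

# Render the diff as HTML without opening a viewer.
html <- render_diff(d, view = FALSE)
```

One diff per patch would give a step-by-step visual history of the data.frame.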

Do you think this would be an interesting project? I could probably dedicate a weekend to it.