cynkra / collector


Detect which df cols or list elts are needed #14

Open moodymudskipper opened 5 months ago

moodymudskipper commented 5 months ago

Probably a crazy idea, but...

The trick we use for environments, lazy bindings, could be used on data frames and lists if they were built on top of environments.
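The issue doesn't spell the trick out. Here is a minimal sketch, assuming it relies on promises created with base R's `delayedAssign()`. Each element of a named list becomes a lazy binding in an environment, and that binding records its own name the first time it is forced. `lazy_tracked_env()` and `uses_a_and_c()` are hypothetical names for illustration, not collector functions.

```r
# Hypothetical helper (not part of collector): expose a named list as an
# environment of promises that record their name when first forced.
lazy_tracked_env <- function(x) {
  forced <- character()
  env <- new.env(parent = emptyenv())
  bind_lazily <- function(key) {
    force(key)
    delayedAssign(
      key,
      {
        forced <<- c(forced, key)  # record the access (promises run only once)
        x[[key]]                   # then return the real element
      },
      eval.env = environment(),
      assign.env = env
    )
  }
  for (nm in names(x)) bind_lazily(nm)
  list(env = env, forced = function() forced)
}

tracked <- lazy_tracked_env(list(a = 1, b = 2, c = 3))
uses_a_and_c <- function(e) e$a + e[["c"]]
uses_a_and_c(tracked$env)  # 4
tracked$forced()           # "a" "c": "b" was never needed
```

Because a promise is evaluated at most once, each name is logged at most once, however often the element is read.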

Environments are basically generalised lists. `[[`, `$`, `length()` and `names()` already work on them, but they have no `[` method, and their names are not an attribute (a "names" attribute actually cannot be set).
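A quick check of what plain environments already support and where they fall short. The `e_list` class and its `[` method at the end are hypothetical, a sketch of the kind of method the proposal would need. It supports character indices only.

```r
e <- new.env(parent = emptyenv())
assign("a", 1, envir = e)
assign("b", 2, envir = e)

# Already works on a plain environment
e[["a"]]          # 1
e$b               # 2
length(e)         # 2, the number of bindings
sort(names(e))    # "a" "b" (binding order is not guaranteed, hence sort())

# Does not work
try(e["a"])           # error: object of type 'environment' is not subsettable
try(names(e) <- "x")  # error: a names attribute cannot be set

# Hypothetical "e_list" class: an S3 `[` method makes single-bracket
# subsetting work, fetching only the requested bindings with mget()
`[.e_list` <- function(x, i) {
  stopifnot(is.character(i))  # sketch: name-based subsetting only
  mget(i, envir = x)
}
class(e) <- "e_list"
e["a"]            # list(a = 1)
```

Using `mget()` means only the requested bindings are looked up, so any lazy bindings for other elements stay unforced.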

If we hack the base namespace (and possibly rlang), we can patch those primitives to handle classed environments with class data.frame or list. We could define superclasses e_frame and e_list instead, but that would be less robust. We could then replace data frames and lists with classed environments and use the lazy binding trick to find which columns or list elements are not required. Reference semantics are irrelevant here because we don't modify those objects.

It's not trivial and can go wrong in many ways, but it might work in most cases, because most of these functions just call bracket methods and length() down the line.

However, subsetting data frames by rows (i) evaluates every column, unless we use further magic to make it lazy and apply it only after the column (j) subsetting happens. At that point it gets really complicated.

krlmlr commented 5 months ago

This is what mutate() uses internally.

Let's do #15 first; understanding this will reduce the data size by a lot.