RMI-PACTA / resources

This is a place to explore and share resources. Check out the "Issues".
https://rmi-pacta.github.io/resources/

Write Reusable, Composable and Modular R Code #324

Open jdhoffa opened 4 months ago

jdhoffa commented 4 months ago

https://klmr.me/box/

This R package seems like it COULD be a very suitable answer to our pacta.* vs workflow.* conundrum...

It is more or less a "superset" of R packages: it loosens some R package conventions and allows modules to be loaded from different directories in the project.

It is at version 1.2.0, which suggests that the developers consider it stable enough to be past its first release.

I am hesitant to incorporate new or semi-experimental tools into our standard project structures, but since workflow.*s are a bit of a "wild west" anyway, ¯\_(ツ)_/¯ it could be interesting.

cc: @AlexAxthelm @cjyetman

Relates to https://github.com/RMI-PACTA/practices/issues/2

AlexAxthelm commented 4 months ago

This is neat, but it seems to break the "normal R programmer" paradigm in many of the same ways that drake/targets do. I wonder what it would look like in practice.

jdhoffa commented 4 months ago

I'm not able to make that connection as quickly as you did; could you please elaborate?

AlexAxthelm commented 4 months ago

I looked through the docs a bit more, and I think I understand what box does now (better, at least).

Overall, it seems to bring Python-style modules into R.

So where in Python you might have

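```
myproject/
- trigonometry.py
- script.py
```

```python
# script.py
import math                  # built-in module
import trigonometry as trig  # local module: myproject/trigonometry.py
import logging as log        # standard-library logging

foo = trig.sin(math.pi)
log.info(foo)
```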

using box, that would look something like this:

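```
myproject/
- trigonometry.R
- script.R
```

```r
# script.R
box::use(math)                   # pretend there's an R package called math
box::use(trig = ./trigonometry)  # local module: myproject/trigonometry.R
box::use(log = logger)           # the logger package, aliased as log

foo <- trig$sin(math$pi)
log$log_info(foo)                # logger exports log_info(), not info()
```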
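The thread never shows what `./trigonometry` contains. A minimal sketch of that module file might look like this, assuming box's convention of marking exported names with `#' @export`; the file and its `sin()` wrapper are hypothetical.

```r
# trigonometry.R: hypothetical module loaded by script.R via
# box::use(trig = ./trigonometry). Base R functions are available
# inside box modules without an explicit import.

#' Sine of x (in radians); a thin wrapper around base::sin()
#' @export
sin <- function(x) {
  base::sin(x)
}
```

In a module that uses `@export` tags, untagged helpers stay private, so `trig$` exposes only what the module author chose to publish.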

The connection to targets is that both are a pretty big departure from what "normal R code" looks like (not that that's a bad thing).

In both cases, they're trying to make the dependency graph explicit. targets does so by checking the value of every dependency (by hashing it), without being particularly concerned with where an object comes from. foo depends on make_foo(), which depends on dplyr::mutate(), but by default targets doesn't care that mutate() is part of a package, only that it's a function that's called in make_foo().
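To make the comparison concrete, a minimal `_targets.R` sketch of the `foo` / `make_foo()` example might look like the following. The data frame and the `y = x * 2` column are placeholders; `tar_option_set()`, `tar_target()` and `tar_make()` are real targets functions.

```r
# _targets.R: minimal sketch; the data and column names are placeholders
library(targets)

# Packages loaded before each target runs. By default, targets does not
# track changes to functions inside packages such as dplyr::mutate().
tar_option_set(packages = "dplyr")

# A user-defined function: targets hashes its body, so editing it
# invalidates every target that calls it.
make_foo <- function(data) {
  dplyr::mutate(data, y = x * 2)
}

list(
  tar_target(raw, data.frame(x = 1:3)),  # upstream target
  tar_target(foo, make_foo(raw))         # depends on `raw` and on make_foo()
)
```

Running `targets::tar_make()` from the project root builds both targets; a second run skips them unless `raw` or `make_foo()` has changed. Nothing in the file says where a name comes from, which is the opposite trade-off from box.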

With box, by contrast, the sources are explicit, but the value/content/body of an object from a source is not (though if you trust your modules not to change without your knowledge, this is nice for debugging).


Overall, I think this is an interesting idea, but I don't see it as being beneficial over structuring things as a normal R package, which already has a lot of tooling support and fits nicely with R's paradigms (for better or worse).