beanumber / etl

R package to facilitate ETL operations

Add hook in etl.default to call user-defined extensions? #62

Closed atlas-research-amsterdam closed 2 years ago

atlas-research-amsterdam commented 2 years ago

Overriding the various etl_xxx.default(obj, ...) functions for a class foo using etl_xxx.foo(obj, ...) works splendidly.

However, the factory function etl.default itself cannot benefit from this mechanism: it is called with a string rather than an object of the derived class, so S3 dispatch will always select the default method.
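To illustrate the dispatch problem, here is a minimal, self-contained sketch. The generic `make` and its methods are hypothetical stand-ins for `etl` and are not the package's actual code; the only assumption is that the factory builds an object whose class is derived from the string it receives.

```r
# Hypothetical stand-in for the etl() generic (not the package's code).
make <- function(x, ...) UseMethod("make")

# Factory: x is a character string such as "foo".
make.default <- function(x, ...) {
  structure(list(name = x), class = c(paste0("etl_", x), "etl"))
}

# Method for the derived class. It is never selected when make() is
# called with a string, because class("foo") is "character".
make.etl_foo <- function(x, ...) {
  stop("make.etl_foo() is not reached when make() is called with a string")
}

obj <- make("foo")  # dispatches on "character", falls through to make.default
class(obj)
```

The call succeeds without hitting the `stop()`, which shows that the derived-class method only comes into play once an object of that class already exists.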

I have a project where I have split the etl_transform step into various other steps using helpers such as etl_parse and etl_normalize. For each intermediate step, I want to create a separate directory (such as parse_dir) to cache the intermediate data frames before finally copying them to load_dir.

It would be nice if etl.default provided a hook of the form:

if ("etl.etl_foo" %in% methods(etl)) {
   etl.etl_foo(obj, ...)
}

Then it becomes possible to initialize other project-specific state (such as parse_dir) inside the user-defined etl.etl_foo.

beanumber commented 2 years ago

I'm happy to consider this. Could you submit a PR?

atlas-research-amsterdam commented 2 years ago

On second thought, after reading up on S3 generics in Advanced R, I think it's better for authors of derived classes to simply write their own constructor:

etl_pkgname <- function(db = NULL, dir = tempdir(), ...) {
  obj <- etl("pkgname", db = db, dir = dir, ...)

  # do project specific extensions here

  obj
}
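As a rough sketch of what the project-specific part of such a constructor might do, the helper below creates a directory for an intermediate step and records its path as an attribute on the object. The name `add_stage_dir` and the attribute naming scheme (`parse_dir`, `normalize_dir`) are assumptions made for illustration, and a plain classed list stands in for the object returned by `etl("pkgname", ...)` so that the example runs without the package.

```r
# Hypothetical helper: create <dir>/<name> and store the path on obj
# as the attribute "<name>_dir".
add_stage_dir <- function(obj, name, dir) {
  path <- file.path(dir, name)
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  attr(obj, paste0(name, "_dir")) <- path
  obj
}

# Stand-in for the object returned by etl("pkgname", ...).
obj <- structure(list(), class = c("etl_pkgname", "etl"))
base_dir <- file.path(tempdir(), "etl_demo")

obj <- add_stage_dir(obj, "parse", base_dir)
obj <- add_stage_dir(obj, "normalize", base_dir)

attr(obj, "parse_dir")
```

Inside `etl_pkgname`, calls like `obj <- add_stage_dir(obj, "parse", dir)` would go where the constructor has its placeholder comment, before `obj` is returned.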