beanumber / etl

R package to facilitate ETL operations

Postpone any db access until etl_load(), instead of at instantiation #10

Closed: beanumber closed this issue 8 years ago

beanumber commented 8 years ago

This is what @cpsievert would do.

beanumber commented 8 years ago

I'm beginning to think that, instead, an etl object should just extend a src object (either a src_sql or, potentially, a src_local; see #8). That way, you can do things like:

airlines <- etl("airlines")
airlines %>%
  etl_extract() %>%
  etl_transform() %>%
  etl_load()
airlines %>%
  tbl("flights")  # ...followed by any further dplyr verbs

If you specify a DB connection argument to etl(), it would use that; if not, it would default to SQLite (or, potentially, local storage; see #8).

So rather than making the DB connection specific to etl_load(), this design ties the connection inextricably to the etl object. But if the user doesn't want to set up a connection to MySQL or PostgreSQL, they can still use etl and remain oblivious to what's happening behind the scenes.
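A minimal base-R sketch of this idea, assuming plain S3 classes: the constructor puts `src_sql` and `src` in the object's class vector, so the etl object inherits from a src object, and it falls back to a placeholder SQLite description when no connection is given. The names `make_etl_sketch()`, `db`, `describe_src()`, and the list fields are hypothetical illustrations, not the etl package's API, and no real database is opened.

```r
# Hypothetical sketch: an etl object that extends a src object via S3 classes.
# make_etl_sketch(), `db`, and the list fields are illustrative assumptions,
# not the etl package's API; no real database connection is opened here.
make_etl_sketch <- function(x, db = NULL) {
  if (is.null(db)) {
    # No connection supplied: record a default local SQLite file instead
    db <- list(backend = "SQLite", path = tempfile(fileext = ".sqlite3"))
  }
  structure(
    list(name = x, con = db),
    # Most specific class first: "etl_airlines", then "etl",
    # then the generic src classes it extends
    class = c(paste0("etl_", x), "etl", "src_sql", "src")
  )
}

# A toy generic showing that methods written for "src" also apply to etl objects
describe_src <- function(x, ...) UseMethod("describe_src")
describe_src.src <- function(x, ...) {
  paste0("src backed by ", x$con$backend)
}

airlines <- make_etl_sketch("airlines")
inherits(airlines, "src")  # TRUE
describe_src(airlines)     # "src backed by SQLite"
```

Passing `db` simply replaces the default, so a user who never sets up MySQL or PostgreSQL still gets an object that any src-aware S3 method will accept through inheritance.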

cpsievert commented 8 years ago

That's essentially what I was going for in #3, except that etl_extract() and etl_transform() would return a list of local data frame(s), and etl_load() (as well as etl_update(), etc.) would return a src object. Package authors could, however, make their own assumptions about what etl_extract() and etl_transform() return, as long as they provide suitable "database methods".

All this being said, it would still be possible to specify the connection in etl() using my approach in #3. You'd just have to add ... (dots) as an argument to etl().
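A rough sketch of the #3 design under stated assumptions: connection details are captured through `...` in the constructor but not used, the extract and transform steps return a named list of local data frames, and only the load step produces a src-like object. Every name here (`new_etl_sketch()`, `extract_step()`, `transform_step()`, `load_step()`, `dbname`, the toy `flights` data) is hypothetical and stands in for, rather than reproduces, the package's functions.

```r
# Hypothetical sketch of the design discussed for #3; all names are
# illustrative assumptions, not the etl package's API.
new_etl_sketch <- function(name, ...) {
  # Connection details passed through `...` are stored, not used yet
  structure(list(name = name, con_args = list(...)), class = "etl_sketch")
}

extract_step <- function(obj) {
  # Stand-in for downloading raw files: a named list of local data frames
  frames <- list(flights = data.frame(carrier = c("AA", "UA", "AA"),
                                      dep_delay = c(5, NA, 12)))
  attr(frames, "etl") <- obj  # carry the etl object along the pipeline
  frames
}

transform_step <- function(frames) {
  # Cleaning happens on local data frames; still no database access
  out <- lapply(frames, function(df) df[!is.na(df$dep_delay), , drop = FALSE])
  attr(out, "etl") <- attr(frames, "etl")  # lapply() drops custom attributes
  out
}

load_step <- function(frames) {
  # Only this step would touch a database; a classed list stands in for the
  # src object a real implementation would return
  obj <- attr(frames, "etl")
  attr(frames, "etl") <- NULL
  structure(list(tables = frames, con_args = obj$con_args),
            class = c("src_sketch", "src"))
}

airlines <- new_etl_sketch("airlines", dbname = "airlines")
frames <- transform_step(extract_step(airlines))  # no database needed
nrow(frames$flights)  # 2
src <- load_step(frames)
class(src)            # "src_sketch" "src"
```

Because the connection arguments only matter inside `load_step()`, a user who stops after the transform step never needs a database at all.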

cpsievert commented 8 years ago

And, related to #8, I think this design makes sense for users who don't want to work with a database at all, since they can just do:

airlines <- etl("airlines")
airlines %>%
  etl_extract() %>%
  etl_transform()

and go on their merry way.

beanumber commented 8 years ago

OK, I think I get it. I am making some progress with this and will make a commit soon.

beanumber commented 8 years ago

Most of this is now implemented in the newapi branch.

beanumber commented 8 years ago

This is implemented in the newapi branch. An object of class etl extends an object of class src_sql, and if you don't specify a DB connection, you get a local SQLite database.