beanumber / etl

R package to facilitate ETL operations

add db=FALSE option for local .rda storage #8

Closed beanumber closed 8 years ago

beanumber commented 8 years ago

I think this would mesh well with @cpsievert's redesign.

cpsievert commented 8 years ago

Would this be an argument to extract() and/or transform()?

beanumber commented 8 years ago

Neither. But on second thought, @nicholasjhorton, why do we need this? Is it enough to have it use SQLite by default?

beanumber commented 8 years ago

I think what @nicholasjhorton really wants is a src_local function. So instead of using src_mysql or whatever, you can use a local set of files and read them directly into memory. But isn't src_sqlite a better way to do that?

beanumber commented 8 years ago

I think the way to do this is with src_df().

beanumber commented 8 years ago

I'm beginning to think that src_local is not appropriate for medium data. If you really want to work with medium data, storing it in some sort of SQL database is probably the right way to do this, especially if the data are relational. But if we use SQLite by default, then the user can remain ignorant of SQL and still make use of the functionality, right?
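A minimal sketch of what "remaining ignorant of SQL" could look like in practice, assuming the dplyr, dbplyr, DBI, and RSQLite packages are installed. The in-memory database and the `mtcars` table are stand-ins for illustration only, not anything etl itself creates:

```r
library(dplyr)

# Assumption: an in-memory SQLite database standing in for a default one
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)

# Ordinary dplyr verbs; dplyr translates them to SQL behind the scenes
by_cyl <- tbl(con, "mtcars") %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()

DBI::dbDisconnect(con)
```

The user writes no SQL here; the grouping and averaging run inside SQLite, and only the summarised rows come back into R.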

cpsievert commented 8 years ago

If we build on dplyr's tbl_df(), users will always be able to create a local data frame from a SQL table representation using collect(), so I think the question is: will users want to store data in an alternative format (e.g., CSV, RData, etc.)?

I vote that we don't add support for this since you can always collect(), then use write.csv(), save(), or whatever.
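A rough sketch of that collect()-then-save route, assuming dplyr, dbplyr, DBI, and RSQLite are available. The in-memory database, the `mtcars` table, and the temporary file paths are hypothetical placeholders:

```r
library(dplyr)

# Assumption: an in-memory SQLite database with one example table
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)

# Lazy reference to the SQL table; no rows are read into memory yet
cars_tbl <- tbl(con, "mtcars")

# collect() pulls the rows into a local data frame
cars_local <- collect(cars_tbl)

# From here any base R writer works
csv_path <- tempfile(fileext = ".csv")
rda_path <- tempfile(fileext = ".rda")
write.csv(cars_local, csv_path, row.names = FALSE)
save(cars_local, file = rda_path)

DBI::dbDisconnect(con)
```

Because the export is two lines of base R after collect(), a dedicated storage option in the package would add little beyond this.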

beanumber commented 8 years ago

Now that we have a SQLite database created automatically by default, I'm hoping that we won't need this.