ankane / dbx

A fast, easy-to-use database library for R

Ideas #1

Open ankane opened 6 years ago

ankane commented 6 years ago

Please create a new issue to discuss any ideas or share your own.

Milestone: 0.4.0 release · Labels: Ideas, On hold

jangorecki commented 6 years ago

There could be a dbx_logging option, similar to dbx_verbose, but the logs would go into a database table instead of stderr (as is the case with message).

jangorecki commented 6 years ago

If it's possible to handle those two functionalities in base R, it would make the dbx package more lightweight.

importFrom(urltools,get_credentials)
importFrom(urltools,url_parse)
jangorecki commented 6 years ago

Examples of using RPostgreSQL would be nice; it is much lighter than RPostgres. I'm not sure how it is currently, but a few years back I could not achieve a few functionalities in RPostgres that I was using in RPostgreSQL. There was some heavy development on RPostgres in late 2017, so some of those may have been addressed by now. In any case, RPostgreSQL is a battle-tested, 10-year-old package, so it definitely should not be skipped. Not sure if it helps, but I have a package built on RPostgreSQL that does upserts, etc.: https://github.com/jangorecki/pg

ankane commented 6 years ago

Hey @jangorecki, thanks for the ideas. I added support for RPostgreSQL. I agree with keeping the package lightweight and minimizing dependencies, but don't want to reinvent/maintain a URL parser. As for logging, you can now pass a function to dbx_verbose and do what you'd like with the logs.

writeLogs <- function(sql) {
  # your code
}

options(dbx_verbose=writeLogs)
ankane commented 6 years ago

Quick follow-up: Made urltools optional since it's only needed if you use the url functionality. Didn't realize that it had two dependencies itself.

artemklevtsov commented 5 years ago

It would be nice to add a wrapper to load CSV/TSV files, e.g. dbxCopy or dbxLoad.

ankane commented 5 years ago

Hey @artemklevtsov, thanks for the suggestion. Since it only takes 1 line to convert a CSV/TSV into a data frame, I'm not sure a separate function is needed.
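For example, loading a CSV and writing it takes one line each with base R's read.csv and dbx's dbxInsert (the file name, table name, and connection details below are illustrative):

```r
library(dbx)

db <- dbxConnect(adapter = "postgres", dbname = "mydb")  # illustrative connection

# read the file into a data frame, then insert it
records <- read.csv("data.csv", stringsAsFactors = FALSE)
dbxInsert(db, "events", records)
```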

artemklevtsov commented 5 years ago

I mean this function should use the efficient COPY INTO instead of INSERT.

stefanfritsch commented 4 years ago

What would be nice is a way to add missing columns when trying to update, insert or upsert.

E.g., I'm getting data from a web service where new columns can crop up now and then, such as "label2" (this example is taken from the GitLab API). As far as I know, there's currently no easy way to emulate data.table::rbind(fill = TRUE), i.e. to first do an ALTER TABLE to add the missing columns and then use UPDATE to push the data to the database.

ankane commented 4 years ago

@stefanfritsch thanks for the suggestion. This use case doesn't seem very common, so I think it's best accomplished outside of dbx.

csatzky commented 4 years ago

Hi Ankane! I think a nice feature would be something like dbxRead(), which accepts a 'where' data frame, a 'return_cols' argument (default: all columns), and a batch_size argument. For example, I have tables with primary keys consisting of two or more columns, so dbxSelect with the 'params' argument does not work, since it can only handle one column / a vector. The batch_size argument would also be helpful for large tables. Cheers!

ankane commented 4 years ago

Hey @budfox123, thanks for the suggestions. I've considered adding a more ORM-type function for constructing queries (instead of writing SQL manually), but want to keep things simple for now. You can probably write a function that wraps dbxSelect to do this. If you do, please share.
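A rough sketch of such a wrapper, assuming each row of the where data frame is one composite key (the dbxRead name, the row-by-row binding, and the sample table/columns are illustrative; it does no identifier quoting or batching):

```r
library(dbx)

# hypothetical wrapper: select rows matching a multi-column key
dbxRead <- function(conn, table, where) {
  clause <- paste(paste0(names(where), " = ?"), collapse = " AND ")
  sql <- paste0("SELECT * FROM ", table, " WHERE ", clause)
  # bind one composite key per query and combine the results
  do.call(rbind, lapply(seq_len(nrow(where)), function(i) {
    dbxSelect(conn, sql, params = as.list(where[i, , drop = FALSE]))
  }))
}

# usage, e.g. a two-column primary key:
# keys <- data.frame(org_id = c(1, 1), user_id = c(10, 20))
# dbxRead(db, "memberships", keys)
```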

psimm commented 4 years ago

Hi @ankane, would it be possible to let dbxUpdate and dbxUpsert return the number of changed (or sum of changed and added) rows?

ankane commented 4 years ago

Hey @psimm, both run dbExecute behind the scenes (unless returning is passed to upsert), so it could probably sum them. Can you explain the use case a bit more?

psimm commented 4 years ago

Hi @ankane thanks for your reply! Sorry, I think my previous suggestion of summing them was not a good idea. Here is my use case: I am downloading data from an API on a schedule. The API sometimes returns values that are already in the DB, so I use dbxUpsert. I am keeping logs of all interactions with the API. Ideally, dbxUpsert would let me know how many are new and how many are updated.

ankane commented 4 years ago

If you're on Postgres and use the version from GitHub, you can do:

dbxUpsert(db, table, records, where_cols=c("id"), returning=DBI::SQL("id, (xmax = 0) AS inserted"))

I'm not sure if it's possible with the other databases.
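With that returning clause, splitting the counts is then straightforward (assuming the result comes back as a data frame with a logical inserted column):

```r
res <- dbxUpsert(db, table, records, where_cols = c("id"),
                 returning = DBI::SQL("id, (xmax = 0) AS inserted"))

# rows with xmax = 0 were newly inserted; the rest were updated
inserted <- sum(res$inserted)
updated <- nrow(res) - inserted
```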

psimm commented 4 years ago

Awesome! Yes, I am using Postgres. Thank you for adding that.

SurajitShasmal commented 3 years ago

Error: Failed to fetch row: ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
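The Postgres hint points at duplicate key values within the records being upserted in a single statement. A minimal sketch of de-duplicating on the conflict column before calling dbxUpsert (the records data frame and the "id" key column are illustrative):

```r
# keep only the last row per key so no key appears twice in one statement
records <- records[!duplicated(records["id"], fromLast = TRUE), , drop = FALSE]
dbxUpsert(db, table, records, where_cols = c("id"))
```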