Open ankane opened 6 years ago
There could be a dbx_logging option, similar to dbx_verbose, but the logs would go into a db table instead of stderr (as in the case of message). If it were possible to handle those two functionalities in base R, it would keep the dbx package lightweight.
importFrom(urltools,get_credentials)
importFrom(urltools,url_parse)
Examples of using RPostgreSQL would be nice; it is much lighter than RPostgres. Not sure how it is currently, but a few years back I could not achieve a few functionalities in RPostgres that I was using in RPostgreSQL. There was some heavy development on RPostgres in late 2017, so some of those could have been addressed by now. Anyway, RPostgreSQL is a battle-tested, 10-year-old package, so it definitely should not be skipped. Not sure, but it may help: I have a package on RPostgreSQL doing upsert, etc.: https://github.com/jangorecki/pg
Hey @jangorecki, thanks for the ideas. I added support for RPostgreSQL. I agree with keeping the package lightweight and minimizing dependencies, but I don't want to reinvent/maintain a URL parser. As for logging, you can now pass a function to dbx_verbose and do what you'd like with the logs.
writeLogs <- function(sql) {
# your code
}
options(dbx_verbose=writeLogs)
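For instance, a sketch of routing the logs into a table; the query_logs table, its columns, and the $1/$2 placeholders (Postgres-style) are assumptions, and DBI is used directly so the log insert itself doesn't get passed back to dbx_verbose:

```r
# assumed: db is an open dbx/DBI connection and a table
#   query_logs(sql text, logged_at timestamptz) already exists
writeLogs <- function(sql) {
  # DBI::dbExecute bypasses dbx, so this insert isn't re-logged
  DBI::dbExecute(
    db,
    "INSERT INTO query_logs (sql, logged_at) VALUES ($1, $2)",
    params = list(as.character(sql), format(Sys.time()))
  )
}

options(dbx_verbose = writeLogs)
```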
Quick follow-up: Made urltools optional since it's only needed if you use the url functionality. Didn't realize that it had two dependencies itself.
It would be nice to add a wrapper to load CSV/TSV files, e.g. dbxCopy or dbxLoad.
Hey @artemklevtsov, thanks for the suggestion. Since it only takes 1 line to convert a CSV/TSV into a data frame, I'm not sure a separate function is needed.
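For instance, the one-liner could look like this (a sketch; the events table and file name are placeholders, and db is an open dbx connection):

```r
library(dbx)

# read the CSV into a data frame, then bulk insert it in one call
dbxInsert(db, "events", read.csv("events.csv", stringsAsFactors = FALSE))
```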
I mean this function should use the more efficient COPY INTO instead of INSERT.
What would be nice is a way to add missing columns when trying to update, insert, or upsert. E.g. I'm getting data from a web service where new columns can crop up now and then, e.g. "label2" (this example is taken from the GitLab API). As far as I know, there's currently no easy way to emulate data.table::rbind(fill = TRUE), i.e. to first do an ALTER TABLE to add the missing columns and then use UPDATE to push the data to the database.
@stefanfritsch thanks for the suggestion. This use case doesn't seem very common, so I think it's best accomplished outside of dbx.
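One way to handle it outside of dbx is a small helper built on plain DBI; a sketch, where addMissingColumns is a hypothetical name and the "text" column type is an assumption to adjust as needed:

```r
library(DBI)

# sketch: before an update/upsert, add any columns present in the
# records data frame but missing from the table (db is a dbx/DBI
# connection); new columns are created as "text" -- an assumption
addMissingColumns <- function(db, table, records) {
  missing_cols <- setdiff(names(records), dbListFields(db, table))
  for (col in missing_cols) {
    sql <- paste0(
      "ALTER TABLE ", as.character(dbQuoteIdentifier(db, table)),
      " ADD COLUMN ", as.character(dbQuoteIdentifier(db, col)), " text"
    )
    dbExecute(db, sql)
  }
}
```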
Hi Ankane! I think a nice feature would be to have something like dbxRead(), in which it is possible to provide a 'where' data frame, 'return_cols' (default: 'ALL'), and a batch_size argument. E.g. I have tables with primary keys consisting of two or more columns. Hence dbxSelect with the 'params' argument does not work, for it can only handle one column / a vector. Also, the batch_size argument would be helpful for large tables. Cheers!
Hey @budfox123, thanks for the suggestions. I've considered adding a more ORM-type function for constructing queries (instead of writing SQL manually), but want to keep things simple for now. You can probably write a function that wraps dbxSelect to do this. If you do, please share.
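A sketch of such a wrapper for the composite-key case, using the ? placeholders that dbxSelect's params argument binds; dbxRead is a hypothetical name, and the table/column names are interpolated directly, so don't pass untrusted input:

```r
library(dbx)

# hypothetical wrapper: select rows matching a multi-column 'where'
# data frame (e.g. a composite primary key)
dbxRead <- function(db, table, where) {
  # one "(a = ? AND b = ?)" group per row, joined with OR
  clause <- paste0("(", paste0(names(where), " = ?", collapse = " AND "), ")")
  sql <- paste0(
    "SELECT * FROM ", table, " WHERE ",
    paste(rep(clause, nrow(where)), collapse = " OR ")
  )
  # flatten the data frame row by row into positional parameters
  params <- do.call(c, lapply(seq_len(nrow(where)), function(i) {
    unname(as.list(where[i, , drop = FALSE]))
  }))
  dbxSelect(db, sql, params = params)
}

# usage: dbxRead(db, "orders", data.frame(user_id = c(1, 2), item_id = c(10, 20)))
```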
Hi @ankane, would it be possible to let dbxUpdate and dbxUpsert return the number of changed (or sum of changed and added) rows?
Hey @psimm, both run dbExecute behind the scenes (unless returning is passed to upsert), so it could probably sum them. Can you explain the use case a bit more?
Hi @ankane, thanks for your reply! Sorry, I think my previous suggestion of summing them was not a good idea. Here is my use case: I am downloading data from an API on a schedule. The API sometimes returns values that are already in the DB, so I use dbxUpsert. I am keeping logs of all interactions with the API. Ideally, dbxUpsert would let me know how many are new and how many are updated.
If you're on Postgres and use the version from GitHub, you can do:
dbxUpsert(db, table, records, where_cols=c("id"), returning=DBI::SQL("id, (xmax = 0) AS inserted"))
I'm not sure if it's possible with the other databases.
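For reference, tallying new vs. updated rows from that returning clause could look like this (Postgres only, since it relies on the xmax system column; table and records are assumed to exist):

```r
# 'inserted' is TRUE for rows that were newly inserted, FALSE for updates
res <- dbxUpsert(db, table, records, where_cols = c("id"),
                 returning = DBI::SQL("id, (xmax = 0) AS inserted"))
n_inserted <- sum(res$inserted)
n_updated  <- nrow(res) - n_inserted
```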
Awesome! Yes, I am using Postgres. Thank you for adding that.
Error: Failed to fetch row: ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
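This Postgres error typically means the records passed to dbxUpsert contain duplicate values in the where_cols, so the same row would be touched twice in one statement. Deduplicating first avoids it; a sketch, assuming the key column is id and the table name is a placeholder:

```r
# keep only the last occurrence of each key, then upsert the rest
records <- records[!duplicated(records$id, fromLast = TRUE), ]
dbxUpsert(db, "table_name", records, where_cols = c("id"))
```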
Please create a new issue to discuss any ideas or share your own.
0.4.0 release

- adapter first param of dbxConnect (to match dbConnect)
- mysql bigint3 or bigint4 branch

Ideas

- select last_insert_id(), row_count() - https://stackoverflow.com/questions/7333524/how-can-i-insert-many-rows-into-a-mysql-table-and-return-the-new-ids

On hold

- update ... from (link, but this requires typecasting)
- retries option to dbxSelect - undo revert of dd9d716651f9321afe74def1fac491de4a4daca0 (users can use a separate package like retry for this)
- statement_timeout option to dbxSelect and maybe other commands - probably not common
- conn optional (use global connection if not passed)
- dbxCreateTable(db, table, records, types=list(id="bigserial primary key", score="float")) (only first 3 args required, types can be used to override defaults)
- cache option to cache select statements to disk - cache branch (need ability to cache for duration and force refresh)