Open bcow opened 3 years ago
Thanks for the report!
I would argue that Postgres
should be the default. RPostgres
is actively maintained and fully DBI compliant, so it keeps up with all the dplyr
database stuff and, more importantly, has important features like the ability to run parameterized queries. RPostgreSQL
is has far fewer features, and AFAICT hasn't been updated in the last 4 years.
I tried to address some of these integer64 issues a while ago in https://github.com/PecanProject/pecan/pull/2318, but it seems I may have missed some.
they also cannot be converted to character and break logger messages
The way I've always addressed the string part of this is by always wrapping ID fields (as in the PR above) is by wrapping them in a call to format
. I think that works because it dispatches to something like a format.integer64
(or maybe format.bit64
) method that understands long integers.
...and further database queries
Where we have to use string interpolation, we should just make sure to use format
. That said, we should really use parameterized queries / prepared statements instead -- they're safer, more robust to type mismatch like this, I think more performant, and more elegant. Our db.query
function already has support for this.
All the higher-order dplyr
-based queries should be immune to these issues.
Great I'll also put in my vote for making Postgres
the default. I'll wait for more input, but once we come to a consensus, I can take on making these changes. As I clean up the benchmarking code and changes I've made to met.process
I'll check for the effects it has on all related database interactions, but while I was finishing my dissertation I essentially hardcoded everything to Postgres
in db.open
so I've unofficially already done that. Also, as I've been going along I've been translating to dplyr
/ dbplyr
whenever possible so once the integer64 problem is fixed, there should be some much more reliable sections of code.
Some thoughts, i think we pick on or the other, whichever is best supported.
we should modify the correct driver at https://github.com/PecanProject/pecan/blob/develop/base/db/R/utils_db.R#L205 and remove (print warning) to change to the correct version.
Once done we should remove the not used package so it is not installed in the pecan docker image anymore.
This issue is stale because it has been open 365 days with no activity.
I just ran into this problem when executing an SQL command that counted rows in a large table. I'm no expert, but on SO, IRTFM commented that "The integer64 class is created by the bit64 package. Need to recover the data without mangling by using functions from the package that created the integer64 object." But that's very clunky, and adding bigint = "integer" seems to work, though also a bit clunky.
Over two years ago, @ashiklom opened up PR https://github.com/PecanProject/pecan/pull/2318 raising an issue about the fact that connections to the database can potentially return values of class
integer64
instead ofinteger.
Without intervening code to catch and convert them, the returned values become meaningless and break the code downstream. For example, until they are converted to integer or numeric, they also cannot be converted to character and break logger messages and further database queries. As of now, the only way I can find to consistently ensure bigints are handled correctly is to use:Right now, the
betyConnect
function has evaded this issue by hardcoding driver to "Postgres" https://github.com/PecanProject/pecan/blob/ee38544e1c995f47073019bd31b3179e7e64991a/base/db/R/query.dplyr.R#L26 and then callingdb.open
but this isn't the place the problem should be addressed. There are multiple functions that calldb.open
directly (for example many of the db.query functions that are called inmet.process
and that's one reason it's breaking!)Furthermore, in
db.open
, if no driver is specified, the driver is set toPostgreSQL
, notPostgres.
https://github.com/PecanProject/pecan/blob/ee38544e1c995f47073019bd31b3179e7e64991a/base/db/R/utils_db.R#L203 You will see that many of thepecan.xml
files that are generated through using the browser interface havePostgreSQL
set as driver undersettings$database$bety
.The
bigint = "integer"
argument does not apparently work withPostgreSQL
, so an alternative approach would have to be taken to make sure allinteger64
were converted tointeger
. I did write a simple conversion function, which I can add as a PR, but that's adding functions (and would also have to include the use of an additional package, thebit64
package.)What are people's thoughts on how to proceed? Should
Postgres
be the default indb.open
? I don't fully understand the difference between the two packages and that's where I need other people's input.