JuliaDatabases / DBDSQLite.jl

DBI-compliant driver for SQLite3

disconnect causes error #1

Open osyed opened 9 years ago

osyed commented 9 years ago

Calling disconnect(db) gives this error:

ERROR: ccall: could not find function sqlite3_close_v2 in library libsqlite3 in include at /usr/bin/../lib64/julia/sys.so in process_options at /usr/bin/../lib64/julia/sys.so in _start at /usr/bin/../lib64/julia/sys.so (repeats 2 times)

I have sqlite3 version 3.6.20 and julia version 0.3.1

johnmyleswhite commented 9 years ago

This sounds like your version of SQLite is too old to work with this library in its current state.

osyed commented 9 years ago

Strange. Everything else seems to work fine except for disconnect().

johnmyleswhite commented 9 years ago

2012-09-03 (3.7.14)

- Drop built-in support for OS/2. If you need to upgrade an OS/2 application to use this or a later version of SQLite, then add an application-defined VFS using the sqlite3_vfs_register() interface. The code removed in this release can serve as a baseline for the application-defined VFS.
- Ensure that floating point values are preserved exactly when reconstructing a database from the output of the ".dump" command of the command-line shell.
- Added the sqlite3_close_v2() interface.

johnmyleswhite commented 9 years ago

That's from the SQLite3 changelogs: http://www.sqlite.org/changes.html

osyed commented 9 years ago

Thanks. Good to know this will go away once we upgrade SQLite.

quinnj commented 9 years ago

In a pending rewrite of the SQLite package, our close call looks like

try
    return ccall((:sqlite3_close_v2, sqlite3_lib),
        Cint, (Ptr{Void},),
        handle)
catch
    # Older versions of the library don't have this; fall back to the other close
    warn("sqlite3_close_v2 not available.")
    sqlite3_close(handle)
end

to handle older libraries.

Also note that a new version of SQLite was just released a few days ago that offers some significant performance increases, so it's something to keep in mind when upgrading. (http://www.sqlite.org/news.html)

osyed commented 9 years ago

Cool. That's an easy patch we can do locally. The version of SQLite we have is the one provided in the CentOS base repo and installed using yum. I just tried an update of sqlite and it says there is nothing to update. I can't believe they are so far behind.

johnmyleswhite commented 9 years ago

@quinnj: I wonder if you could do some metaprogramming to make the version dependencies occur at compile time?

quinnj commented 9 years ago

Yeah, we should definitely do that. Looks like there's a sqlite3_libversion call to get the version number.
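A minimal sketch of that idea, assuming a `sqlite3_lib` constant naming the shared library (as in the close snippet above): `sqlite3_libversion_number()` returns the version as an integer (e.g. 3007014 for 3.7.14), so the check can run once at package load time and define the appropriate close function accordingly. The `_close` name is invented for illustration.

```julia
# Query the library version once, at load time.
const SQLITE_VERSION = ccall((:sqlite3_libversion_number, sqlite3_lib), Cint, ())

if SQLITE_VERSION >= 3007014
    # sqlite3_close_v2 was added in 3.7.14
    _close(handle) = ccall((:sqlite3_close_v2, sqlite3_lib), Cint, (Ptr{Void},), handle)
else
    _close(handle) = ccall((:sqlite3_close, sqlite3_lib), Cint, (Ptr{Void},), handle)
end
```

This avoids paying for a try/catch on every close and makes the version dependency explicit.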

@johnmyleswhite, have you seen the rewrite going on for SQLite.jl? I'm pretty excited about the update. It's a much more "julia" interface, and tries to wrap the core API much tighter (i.e. removing DataFrames dependency, only implement simple wrappers around api calls with performance enhancements where appropriate). We're also about to merge a branch that allows defining julia functions to be used within SQL statements.

I had forgot you had started this package with a similar idea in mind (if I remember correctly). Since SQLite will be a much smaller/lighter package going forward, it may make sense to have a dbi.jl file in the SQLite.jl package or wrap SQLite.jl in this package. We should look into how to coordinate efforts.

johnmyleswhite commented 9 years ago

Yes, let's coordinate efforts. I'll read up on your rewrite to catch up with what you've done.

FWIW, my dream vision is that we get something like DBI working in a really clean way and then make it possible to define everything you might want to do on a tabular data structure in terms of DBI operations + something like SQLAlchemy for Julia. Now that Nullable is in Base, we should be able to do things in a way that is completely backend agnostic.

johnmyleswhite commented 9 years ago

Read through the revised SQLite.jl quickly. Lots of things that I'm really excited about, but there are a few places where I'd like to push on the design:

quinnj commented 9 years ago

I understand the idea of DBI.jl, but I also think it's a little overkill/over-abstraction to have two packages for bare minimum vs. sugar/convenience functions. I envision being able to have a dbi.jl file within the SQLite.jl package that implements the DBI interface (perhaps when conditional modules come around, that'll be even easier), while the rest of the package stays as is. That way, if I'm a heavily DBI-reliant user needing to switch backends or what-not, I don't have to change my interface to use SQLite, but for regular/power users, I have the convenience features baked in. The package is incredibly lite as-is, so the "overhead" of including convenience functions is negligible IMO. Perhaps for a more complex/involved DB backend (Postgres, etc.) it would be more important to make the separation, but I just don't see a lot of gain at this point for SQLite.

cc: @Sean1708 for this discussion. He's a new co-collaborator for the SQLite.jl rewrite and has contributed some great functionality.

johnmyleswhite commented 9 years ago

@quinnj You don't need to have two packages: it's an organizational concept more than a package split choice. In an idealized setup, most users would never use any package except DBI, which would load other packages like SQLite as needed when the database being used happens to be SQLite. The sugar functions go into DBI -- the only thing the driver packages do is translate a standardized protocol of sugar calls into appropriate database-specific calls. This would mean that it's kind of odd to have dbi.jl in SQLite: the organizational concept I'm describing goes in the exact opposite direction. In a database abstraction system, the purpose of SQLite is to implement the protocol that DBI says every database must implement. Normal users then write code that's completely database agnostic using DBI; only in extraordinary circumstances do they write code that depends upon database specific functionality. There are certainly cases where database-specificity is useful, but I don't think it's a good default -- especially for non-power users.

To help me understand your position, I'd love to understand more about where you're coming from. Have you ever worked with a package like SQLAlchemy or some other abstraction around databases? My work experience is that you need to switch between database drivers pretty often. (In one application I work on these days, literally every single request can switch the database backend so the application is effectively impossible to maintain without something like DBI.)

In particular, I think it's important to note that I don't see putting sugar into SQLite as overload -- I see it as excessive specialization towards one specific database. In particular, the more sugar that you put into SQLite, the more likely it becomes that those sugar functions won't work when you need to switch to another database. Since I'm particularly interested in having a revision of DataFrames support the DBI protocol, I'd really like to make sure that Julia supports a DBI style abstraction everywhere. Swapping between DataFrames and SQLite tables would be a huge gain for Julia -- and it requires some abstractions to make it possible at all.

Would love to hear from @iamed2, who I know has a bunch of familiarity with Python's database abstractions.

Sean1708 commented 9 years ago

Just RE points 1 & 3 from a couple of posts up.

Returning an iterator would be an incredibly simple change should we decide to go that way. We could always keep ResultSet and allow users to fill it themselves.

I personally would prefer to keep bind and execute separate since it allows users to create methods that treat certain types in a special way. For example, if I'm using BigInts in my application these will normally be bound as a serialised Julia value. If I then decide that I want to store them as text so that I can view them from SQLite's CLI, I would simply define the method:

Base.bind(stmt::SQLiteStmt, i::Int, val::BigInt) = bind(stmt, i, string(val))

johnmyleswhite commented 9 years ago

@Sean1708: Keeping bind separate for that reason is pretty compelling. Seems possible to reconcile with my point about execute implying bind since you define execute in terms of calls to bind.
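That reconciliation might look something like this sketch (the `SQLiteStmt`, `bind`, and `execute` names follow the rewrite's API as discussed in this thread; the variadic `execute` method is an assumption):

```julia
# execute defined in terms of bind: binding stays overridable per type,
# while users who don't care can pass values straight to execute.
function execute(stmt::SQLiteStmt, values...)
    for (i, val) in enumerate(values)
        bind(stmt, i, val)   # dispatches to any user-defined bind methods
    end
    execute(stmt)            # run the statement with everything bound
end
```

A user's custom `bind` method, like the BigInt one above, would then be picked up automatically by `execute`.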

By coincidence, Hadley Wickham just shipped a new SQLite library for R that exactly adopts the design I've been arguing for by using R's DBI and then making RSQLite a driver for DBI: http://blog.rstudio.org/2014/10/25/rsqlite-1-0-0/

quinnj commented 9 years ago

You don't need to have two packages: it's an organizational concept more than a package split choice. In an idealized setup, most users would never use any package except DBI, which would load other packages like SQLite as needed when the database being used happens to be SQLite.

How would this work? How would DBI know that I need to load an SQLite database without a user saying something like using SQLite?

The sugar functions go into DBI -- the only thing the driver packages do is translate a standardized protocol of sugar calls into appropriate database-specific calls. This would mean that it's kind of odd to have dbi.jl in SQLite: the organizational concept I'm describing goes in the exact opposite direction. In a database abstraction system,

My idea would be to implement the DBI protocol in a dbi.jl file, while also maintaining the rest of the package as is.

I guess my motivation here comes from my own personal experience with SQLite: that to work efficiently with it, it indeed does require a lot of nuances to get great performance. For example, the reason for drop(db, table) is for the extra call to VACUUM, which does the physical freeing of memory from the dropped table. Something a naive user would expect to happen, but wouldn't get if a simple call to drop table db.table was sent in a query. Other examples are table creation where the PRAGMA synchronous, bind-execute, and analyze $table commands come into play to provide huge performance gains in the table creation itself and future queries on that table. Another idea I've played around with is having automatic index creation on by default for new tables, since they play such a huge role in select query performance (more so than other dbs).

I think the answer to these examples in DBI world is to add these sugar functions to the DBI protocol (drop, createtable, etc), but I don't know if there's general interest in that.
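The drop-with-VACUUM behavior described above could be sketched roughly like this (assuming an `SQLiteDB` connection type and an `execute(db, sql)` that runs a raw SQL string, per the rewrite's API; exact names are assumptions):

```julia
# Drop a table and reclaim its disk space; a bare DROP TABLE
# leaves the freed pages allocated inside the database file.
function drop(db::SQLiteDB, table::String)
    execute(db, "DROP TABLE $table")
    execute(db, "VACUUM")   # physically frees the space held by the dropped table
end
```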

To help me understand your position, I'd love to understand more about where you're coming from. Have you ever worked with a package like SQLAlchemy or some other abstraction around databases? My work experience is that you need to switch between database drivers pretty often. (In one application I work on these days, literally every single request can switch the database backend so the application is effectively impossible to maintain without something like DBI.)

I think this is also where we're having a hard time connecting on this idea. This has almost never been my experience. I've only ever needed to work with a single db at a time. I can certainly understand the advantage of a consistent API to DB packages, which is why I wrote ODBC.jl, which, as I've come to understand, has the same aims as DBI with a slightly different structure (as long as the DB provides an ODBC driver, the frontend ODBC.jl can connect to any backend with the same interface).

In particular, I think it's important to note that I don't see putting sugar into SQLite as overload -- I see it as excessive specialization towards one specific database. In particular, the more sugar that you put into SQLite, the more likely it becomes that those sugar functions won't work when you need to switch to another database. Since I'm particularly interested in having a revision of DataFrames support the DBI protocol, I'd really like to make sure that Julia supports a DBI style abstraction everywhere. Swapping between DataFrames and SQLite tables would be a huge gain for Julia -- and it requires some abstractions to make it possible at all.

I guess I still feel that providing DB-specific sugar functions is ok, as long as a user is made aware of such a fact. I think then it's a pretty clear path on workflow: if I know I need to be switching backends fairly often, DBI or ODBC are my goto and I stick to those interfaces whereas if I know I'm working with a single DB in an app, I'm free to exploit a more rich SQLite interface and gain some convenience/performance enhancements along the way. I guess I don't like the idea of restricting an API for better consistency for a portion of users.

johnmyleswhite commented 9 years ago

How would this work? How would DBI know that I need to load an SQLite database without a user saying something like using SQLite?

This can be handled in many ways. In Perl's DBI and SQLAlchemy, you use strings with specialized formatting as in the example below:

from sqlalchemy import create_engine
engine = create_engine('postgresql://scott:tiger@localhost:5432/mydatabase')

It looks like Python's sqlite3 library takes a very different view. There you work with the sqlite3 library directly, but that library implements the standardized interface proposed in PEP 249.
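A hypothetical Julia version of the connection-string approach might parse the scheme and load the matching driver package on demand (every name below is invented for illustration; `require` was the dynamic-loading mechanism in Julia 0.3):

```julia
# Hypothetical DBI-style entry point: dispatch on the URL scheme.
function connect(url::String)
    scheme = split(url, "://")[1]
    if scheme == "sqlite"
        require("SQLite")            # load the driver package on demand
        return sqlite_connect(url)   # driver-specific connect, invented name
    elseif scheme == "postgresql"
        require("PostgreSQL")
        return postgres_connect(url)
    else
        error("no DBI driver registered for scheme: $scheme")
    end
end
```

The user only ever writes `using DBI; connect("sqlite://mydb.sqlite")`, and the driver package stays an implementation detail.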

My idea would be to implement the DBI protocol in a dbi.jl file, while also maintaining the rest of the package as is.

If we take a hint from Python's sqlite3 library, this might work well. I need to read through that code more to understand how they enforce uniformity across databases while still exposing SQLite3 specific functionality. For me, that's the crux of the problem: make sure your definition of the database interaction protocol is database independent.

I guess my motivation here comes from my own personal experience with SQLite: that to work efficiently with it, it indeed does require a lot of nuances to get great performance. For example, the reason for drop(db, table) is for the extra call to VACUUM, which does the physical freeing of memory from the dropped table. Something a naive user would expect to happen, but wouldn't get if a simple call to drop table db.table was sent in a query. Other examples are table creation where the PRAGMA synchronous, bind-execute, and analyze $table commands come into play to provide huge performance gains in the table creation itself and future queries on that table. Another idea I've played around with is having automatic index creation on by default for new tables, since they play such a huge role in select query performance (more so than other dbs).

In SQLAlchemy world, this could be handled by SQL dialects, which automatically generate appropriate SQL customized per database. You can see more about the SQLite3 dialect at http://docs.sqlalchemy.org/en/rel_0_9/dialects/sqlite.html

I think this is also where we're having a hard time connecting on this idea. This has almost never been my experience. I've only ever needed to work with a single db at a time. I can certainly understand the advantage of a consistent API to DB packages, which is why I wrote ODBC.jl which as I've come to understand has the same aims of DBI with a slightly different structure (as long as the DB provides an ODBC driver, the frontend ODBC.jl can connect to any backend with the same interface).

My experience is that ODBC isn't language specific enough to be the appropriate interface for a specific language like Julia. The abstraction I'd argue that you want is one that promises that every select query cursor will always produce an iterator of the same Julia type. ODBC.jl can invent these standards, but then ODBC is your de facto definition of Julia's DBI protocol.

I'm not sure I have a better argument than that, but I think it's worth noting that no language I know of uses ODBC for most work. Things always end up going through database specific drivers via something like DBI.

I guess I still feel that providing DB-specific sugar functions is ok, as long as a user is made aware of such a fact. I think then it's a pretty clear path on workflow: if I know I need to be switching backends fairly often, DBI or ODBC are my goto and I stick to those interfaces whereas if I know I'm working with a single DB in an app, I'm free to exploit a more rich SQLite interface and gain some convenience/performance enhancements along the way. I guess I don't like the idea of restricting an API for better consistency for a portion of users.

I do agree with this. I just worry that it's not a small portion of users who want consistency, but almost all of them. I think there's a huge gain in being able to say, "I know my language's DBI protocol and can therefore write code for any SQL database that anyone might ever create, regardless of ODBC support." This level of consistency also lets you build further levels of abstraction on top of an arbitrary database.

iamed2 commented 9 years ago

I'll note that it's increasingly common to use an SQLite DB as a mock for PostgreSQL or MySQL for testing purposes, or to release a program that talks to a DB and includes SQLite3 support by default as a fallback.