ghaering / pysqlite

Python DB-API module for SQLite 3.
zlib License

Expose sqlite3_column_type/sqlite3_column_decltype/etc. for column type metadata in cursor #87

Open brianv0 opened 9 years ago

brianv0 commented 9 years ago

Hi,

I'd like to be able to get access to the type metadata of each column without necessarily creating a bunch of converters for every query. Normally this could probably be achieved through cursor.description, but that only returns column names, and I have nothing to fall back on at query time (I do have PRAGMA table_list(...), but that doesn't help me much).

Say we have a table foo defined in bar.db like so:

create table foo (a integer, b integer);
insert into foo (a, b) values (1, 1);
insert into foo (a, b) values (2, null);

In python, I might have some code like this:

import sqlite3
conn = sqlite3.connect("bar.db")
curs = conn.cursor()
curs.execute("SELECT a, b FROM foo WHERE a = 2")
row = curs.fetchone()
types = ???
return {"columnTypes": types, "result": row}

In this example, the column is clearly declared as an integer, but I can't rely on type(row[1]) (b is NULL in that row, so the value is None) or on any other mechanism short of adding my own converters for everything.
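To make the problem concrete, here is roughly what the current API reports for the example above (a sketch; the comments describe the behaviour approximately):

import sqlite3

conn = sqlite3.connect("bar.db")
curs = conn.cursor()
curs.execute("SELECT a, b FROM foo WHERE a = 2")

# cursor.description has a 7-tuple per column, but everything after
# the column name is None, so there is no type information to read
print(curs.description)

row = curs.fetchone()
print(type(row[0]))  # int
print(type(row[1]))  # NoneType -- the Python type of a NULL value
                     # says nothing about the declared column type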

brianv0 commented 9 years ago

I'm going to go ahead and comment on this again, as I noticed you are deprecating converters. In that case, I would say that some extra type information, especially about declared types, is likely necessary to understand the nature of the results from the query.

My particular use case is to allow users to execute arbitrary SQL against a SQLite database (including joins) and return the data as a JSON object representing all the rows of the query, with an additional JSON object describing the types of the columns returned. In order to build that kind of type definition/result metadata, as well as make sure I serialize BLOB types correctly, it'd really help to have access to the types declared in SQLite.
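Concretely, for the foo table above, the kind of response I'm after would look roughly like this (the field names are only illustrative):

{
    "columnTypes": {"a": "integer", "b": "integer"},
    "result": [[1, 1], [2, null]]
}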

In the meantime, I was going to monkey patch this using converters: wrap every value with its type information and then write a custom JSON encoder. I would cache the result type information after processing the first row, and the encoder would then just drop the wrapper where it could (or serialize specific cases specially, e.g. a declared "BOOLEAN" type could become true for 1 and false for 0, etc.). This would be very inefficient.

In this case, a double and boolean converter might look like this:

converters = {
    "DOUBLE": lambda x: {"value": float(x), "type": "double"},
    "BOOLEAN": lambda x: {"value": int(x) == 1, "type": "boolean"}
}
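For reference, wiring these up with the current API would look something like the sketch below, using the module-level register_converter hook together with PARSE_DECLTYPES:

import sqlite3

def convert_double(raw):
    # converters receive the raw bytestring; wrap it with its declared type
    return {"value": float(raw), "type": "double"}

def convert_boolean(raw):
    return {"value": int(raw) == 1, "type": "boolean"}

# converters are registered globally, not per connection
sqlite3.register_converter("DOUBLE", convert_double)
sqlite3.register_converter("BOOLEAN", convert_boolean)

# PARSE_DECLTYPES makes the module pick a converter by the declared column type
conn = sqlite3.connect("bar.db", detect_types=sqlite3.PARSE_DECLTYPES)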

This workaround also suffers from the fact that the converters are scoped at the module level, so you can't have custom converters at the connection level. This was why I created this issue.

As a suggested patch, I'd recommend either reusing the cursor.description object (possibly allowing an arbitrary mapping onto its type_code slot), or possibly adding a declared-type mapping on the connection.

Examples:

connection.decltype_map["boolean"] = 1
connection.decltype_map["integer"] = 2

or possibly:

class decl_type_mapper:
    def __getitem__(self, key):
        dt = key.lower()
        if dt == "boolean":
            return 1
        elif dt == "integer":
            return 2
        ...

connection.decltype_map = decl_type_mapper()
# isTrue of type boolean, ratio of type double
cursor = connection.execute("SELECT isTrue, ratio FROM foo")
col_metadata = {col[0]:col[1] for col in cursor.description}

Or, just add the decltypes to the cursor at runtime, so something like this might work:

col_metadata = [extract_metadata(dt) for dt in cursor.decltypes]
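Here extract_metadata is just a hypothetical helper; a rough sketch of what it might do, loosely following SQLite's type-affinity rules, could be:

def extract_metadata(decltype):
    # decltype is the declared column type string, or None for expressions
    if decltype is None:
        return "unknown"
    dt = decltype.upper()
    if "INT" in dt:
        return "integer"
    if "CHAR" in dt or "CLOB" in dt or "TEXT" in dt:
        return "text"
    if "BLOB" in dt:
        return "blob"
    if "REAL" in dt or "FLOA" in dt or "DOUB" in dt:
        return "double"
    if "BOOL" in dt:
        return "boolean"
    return "numeric"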

These are just some examples of how it might be done. I might be able to devote some time to a patch and submit a pull request if there could be some discussion and consensus on a way forward.