blaze / datashape

Language defining a data description protocol
BSD 2-Clause "Simplified" License
183 stars 65 forks source link

Programmatically building datashapes #185

Closed jiffyclub closed 8 years ago

jiffyclub commented 8 years ago

Is there a way to construct datashapes without doing string munging? E.g. when you're not working interactively and are figuring out the types on the fly? In my particular application I'd be trying to load some data into SQL after figuring out what the types and names are. All the examples in the docs involve string shape specs, but that doesn't seem like the best way to set up the datashapes programmatically.

cpcloud commented 8 years ago

@jiffyclub Most types are in datashape/coretypes.py and can be constructed via their classes. For example, here's two ways to construct a record datashape:

import datashape as ds
from datashape import var, Record, int64, float64

dshape = ds.dshape("var * {a: int64, b: float64}")
dshape = var * Record([('a', int64), ('b', float64)])
cpcloud commented 8 years ago

@jcrist @llllllllll and I discussed the possibility of this pleasant yet evil shorthand for constructing record datashapes:

class MetaRecord(type):
    def __getitem__(self, slices):
        names = [x.start for x in slices]
        types = [x.step for x in slices]
        return Record(list(zip(names, types)))

class r(object):
    __metaclass__ = MetaRecord

ds = r['a'::int64, 'b'::float64]
jiffyclub commented 8 years ago

Thanks for the info! Is there a way to create null-allowed columns in this way too? As in this example:

>>> dshape = """
... var * {
...     product_no: int32,
...     product_sku: string,
...     name: ?string,
...     price: ?float64
... }
... """

I'm actually asking because I'm trying to have an auto-incrementing primary key, but if I don't provide a key in the data I get an error on the primary key column: sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) NOT NULL constraint failed:.

llllllllll commented 8 years ago

Yes, the ? prefix is the Option constructor:

'?int32' -> Option(int32)
jiffyclub commented 8 years ago

Thanks! It doesn't look like that makes it behave like the auto-incrementing primary key I can set up with SQLAlchemy. I've made an example at http://nbviewer.ipython.org/gist/jiffyclub/e7e41382c80496e12e26.

It looks like the big difference between the table created by SQLAlchemy and the one made by Odo is that the one from SQLAlchemy has nullable=False on the primary_key column:

# SQLAlchemy
Table('testing', MetaData(bind=Engine(sqlite:///:memory:)), Column('id', Integer(), table=<testing>, primary_key=True, nullable=False), Column('col1', Integer(), table=<testing>), schema=None)

# Odo
Table('testing', MetaData(bind=Engine(sqlite:///)), Column('id', BigInteger(), table=<testing>, primary_key=True), Column('col1', BigInteger(), table=<testing>, nullable=False), schema=None)

Is there a way to get the same behavior as with SQLAlchemy?

llllllllll commented 8 years ago

Doesn't primary_key imply nullable=False? I am not totally sure what you are trying to do with the example you linked. I know that it in postgres primary key is short for unique not null.

jiffyclub commented 8 years ago

Sorry, my point with the demo is that when I use SQLAlchemy and specify a primary key, when I load data (via conn.execute(table.insert(), data)), the primary key works as expected and I get 1, 2, 3...

But I can't figure out how to get the same behavior with odo/datashape. I either end with an error (see the bottom of the notebook), or if I try to make the 'id' column optional it doesn't get automatically filled, it stays null even though it's marked as the primary key (see cell 12 of the notebook). (This is with sqlite by the way.)

jiffyclub commented 8 years ago

Okay it looks like this is related to SQLite's design. I was using Integer with SQLAlchemy and int64 with Odo. If I use BigInteger with SQLAlchemy I run into the same problems as I do with Odo, and if I use int_ with Odo things work. Looks like it's related to this property of SQLite: https://www.sqlite.org/lang_createtable.html#rowid.

Thanks for all the help! I'll close this, but looking forward to more complete datashape docs in the future!