binarymatt / redshift_sqlalchemy

Amazon Redshift SQLAlchemy Dialect
MIT License
48 stars 21 forks source link

DDL emitted for integer primary keys isn't valid with Redshift #9

Closed sui66iy closed 8 years ago

sui66iy commented 10 years ago

Hi there,

If you run something like this code:

from sqlalchemy import create_engine, Table, Column, Integer, MetaData

def create_table(engine):

    tablename = 'temp_foo'
    id_col = Column('id', Integer, primary_key=True)
    value_col = Column('value', Integer)

    metadata = MetaData(bind=engine)

    table = Table(tablename, 
                  metadata, 
                  id_col, 
                  value_col,
                  prefixes=['TEMPORARY'])
    table.create(engine)

    return

You'll get an exception like this:

sqlalchemy.exc.NotSupportedError: (NotSupportedError) Column "temp_foo.id" has unsupported type "serial". '\nCREATE TEMPORARY TABLE temp_foo (\n\tid SERIAL NOT NULL, \n\tvalue INTEGER, \n\tPRIMARY KEY (id)\n)\n\n' {}

I'm seeing this with sqlalchemy 0.8.3. I think your redshift dialect is inheriting the default Postgres/psycopg behavior, which doesn't work since SERIAL isn't a supported Redshift datatype.

It works fine if you use a Char or something for your primary key. (Or just create the table externally and simply refer to it.)

Thanks,

Mike

binarymatt commented 10 years ago

You are correct, the redshift dialect does inherit from the postgres dialect. I will look into inheriting from the PGDDLCompiler and creating a redshift specific compiler.

ibebrett commented 10 years ago

Has any update been made on this issue? The version in pip still has this problem.

binarymatt commented 10 years ago

@ibebrett unfortunately i have not had time to dig into this recently. I am hoping that I can get some time soon.

ibebrett commented 10 years ago

I don't know if this makes any sense, but when I installed this via git master branch, it worked for my create db, but installing from pip failed.. Does that make any sense if you haven't fixed anything?

binarymatt commented 10 years ago

@ibebrett it looks like one pr that i merged a few months ago might have fixed this issue. However, i haven't done a release in a while. I will try to cut a new release today or tomorrow

ibebrett commented 10 years ago

That's awesome! If it doesn't work out, I would be willing to put together a pull request to fix the issue.

graingert commented 9 years ago

can confirm that this is fixed in https://pypi.python.org/pypi/sqlalchemy-redshift/0.1.0

wgillett commented 9 years ago

I'm seeing a similar problem with SQLAlchemy (1.0.6) and sqlalchemy-redshift (0.1.1), but with mapping rather than table creation. I have created the table in Redshift directly like so:

create table if not exists companies (
    id bigint identity primary key,
    name varchar(1024) not null
);

and mapped it like so:

Base = declarative_base()
class Company(Base):
    __tablename__ = 'companies'
    id = Column(BigInteger, primary_key=True)
    name = Column(String)

then when I make and create a company:

company = Company(name = ACME)
session.add(company)
session.commit()

I get this error:

sqlalchemy.exc.StatementError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (sqlalchemy.exc.ProgrammingError) (psycopg2.ProgrammingError) relation "companies_id_seq" does not exist [SQL: 'select nextval(\'"companies_id_seq"\')'] [SQL: u'INSERT INTO companies (id, name) VALUES (%(id)s, %(name)s)'] [parameters: [{'name': 'Acme'}]]

Clearly the code is expecting an auto-incrementing sequence, but that doesn't work with Redshift. Any suggestions would be much appreciated.

karmel commented 9 years ago

@wgillett I ran into this same problem, and found a workaround that might work for you--

If you specify that your insert is inline, then SQLAlchemy won't try to hit the DB beforehand to get the sequence value, and will just omit the id unless it is specifically passed as a parameter. This is a side-effect of inline, so not ideal, but seems to work...

my_declarative_model.__table__.insert(inline=True).values(**kwargs)
graingert commented 9 years ago

Redshift is not compatible with session.add() where you don't know the primary_key in advance. I use a client generated UUID for my primary keys where I need to be able to relate them

wgillett commented 9 years ago

Thanks @karmel and @graingert. Client-generated UUID should work for my use case.