crate / sqlalchemy-cratedb

SQLAlchemy dialect for CrateDB.
https://cratedb.com/docs/sqlalchemy-cratedb/
Apache License 2.0
4 stars 2 forks source link

Add support for timezone aware datetime objects to SQLAlchemy dialect #92

Open chaudum opened 4 years ago

chaudum commented 4 years ago

Inserting a timezone aware datetime object using SQLAlchemy fails, even though, the object could be serialized into an ISO-format string that contains a timezone offset. https://github.com/crate/crate-python/blob/master/src/crate/client/sqlalchemy/dialect.py#L123-L125

from datetime import datetime, timezone, timedelta

dt = datetime(2020, 6, 23, 10, 0, 0, tzinfo=timezone.utc)
dt.strftime("%Y-%m-%dT%H:%M:%S.%f%z")
'2020-06-23T10:00:00.000000+0000'
# or
dt = datetime(2020, 6, 23, 12, 0, 0, tzinfo=timezone(timedelta(hours=2), name="CEST"))
dt.strftime("%Y-%m-%dT%H:%M:%S.%f%z")
'2020-06-23T12:00:00.000000+0200'
$ cr8 run-crate 4.1.5 -e CRATE_HEAP_SIZE=2G --disable-java-magic -- @crash  --hosts '{node.http_url}' -c "SELECT ('2020-06-23T10:00:00.000000+0000'::TIMESTAMP WITH TIME ZONE = '2020-06-23T12:00:00.000000+0200'::TIMESTAMP WITH TIME ZONE) AS equal"
# run-crate
===========

Skipping download, tarball alrady extracted at /home/christian/.cache/cr8/crates/crate-4.1.5
Starting Crate process
CrateDB launching:
    PID: 1812751
    Logs: /home/christian/.cache/cr8/crates/crate-4.1.5/logs/cr8-crate-run57244782.log
    Data: /tmp/tmpbpru7ibr (removed on stop)

Psql      : 127.0.0.1:5432
Http      : 127.0.0.1:4200
Transport : 127.0.0.1:4300
Cluster ready to process requests

# @crash
========

CONNECT OK
+-------+
| equal |
+-------+
| TRUE  |
+-------+
SELECT 1 row in set (0.017 sec)

Same applies for de-serializing.

It may even be nice to add support for TIMESTAMP WITH(OUT) TIME ZONE AT TIME ZONE fields.


This issue came up when I tried to insert a datetime object from the boto client response that is timezone aware, raising TimezoneUnawareException: Timezone aware datetime objects are not supported

chaudum commented 4 years ago

With this small diff:

diff --git a/src/crate/client/sqlalchemy/dialect.py b/src/crate/client/sqlalchemy/dialect.py
index 1f51acf..31cc650 100644
--- a/src/crate/client/sqlalchemy/dialect.py
+++ b/src/crate/client/sqlalchemy/dialect.py
@@ -20,7 +20,7 @@
 # software solely pursuant to the terms of the relevant commercial agreement.

 import logging
-from datetime import datetime, date
+from datetime import datetime, date, timezone

 from sqlalchemy import types as sqltypes
 from sqlalchemy.engine import default, reflection
@@ -121,7 +121,7 @@ class DateTime(sqltypes.DateTime):
             if value is not None:
                 assert isinstance(value, datetime)
                 if value.tzinfo is not None:
-                    raise TimezoneUnawareException(DateTime.TZ_ERROR_MSG)
+                    value = value.astimezone(timezone.utc).replace(tzinfo=None)
                 return value.strftime('%Y-%m-%dT%H:%M:%S.%fZ')
             return value
         return process

The following example application would work:

from datetime import datetime, timezone

from sqlalchemy import create_engine, Column
from sqlalchemy.schema import PrimaryKeyConstraint
from sqlalchemy.ext.declarative import DeclarativeMeta, declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy.types import DateTime, Float

Session = scoped_session(sessionmaker())
Base = declarative_base(metaclass=DeclarativeMeta)

class TestModel(Base):

    __tablename__ = "t1"
    __table_args__ = (PrimaryKeyConstraint("ts"),  {"schema": "doc"})

    ts = Column("ts", DateTime)

def main():
    engine = create_engine(
        "crate://",
        connect_args={
            "username": "crate",
            "password": None,
            "servers": "localhost:4200",
        },
        strategy="threadlocal",
    )
    Session.configure(bind=engine)
    Base.metadata.bind = engine

    session = Session()
    session.execute("""
        CREATE TABLE IF NOT EXISTS doc.t1 (
          ts TIMESTAMP WITH TIME ZONE
        )
    """)
    session.add(
        TestModel(ts=datetime(2020, 6, 23, 12, 0, 0, tzinfo=timezone.utc))
    )
    session.commit()

if __name__ == "__main__":
    main()
mfussenegger commented 4 years ago

I'm not sure if it is a good idea to discard information implicitly doing the serialization.

Converting to UTC would keep the time information itself intact, but it would lose the timezone. Depending on the users use-case this may be required information and a user might assume that TIMESTAMP WITH TIME ZONE keeps the time zone info as well, but it does not. (Which is even more confusing given that TIME WITH TIME ZONE does)

Seems to me that being strict about this is the better default.

Maybe some opt-in flag cloud be provided.

chaudum commented 4 years ago

I'm not sure if it is a good idea to discard information implicitly doing the serialization.

dt.strftime("%Y-%m-%dT%H:%M:%S.%f%z")

This would not discard any information, but would add the TZ offset to the ISO string.

Converting to UTC would keep the time information itself intact, but it would lose the timezone. Depending on the users use-case this may be required information and a user might assume that TIMESTAMP WITH TIME ZONE keeps the time zone info as well, but it does not. (Which is even more confusing given that TIME WITH TIME ZONE does)

Agree, the comment above wasn't meant as a solution :) The behaviour could/should be different for different column types (e.g. TIMESTAMP WITH TIME ZONE vs. TIMESTAMP WITHOUT TIME ZONE).

Seems to me that being strict about this is the better default.

Maybe some opt-in flag cloud be provided.

mfussenegger commented 4 years ago

This would not discard any information, but would add the TZ offset to the ISO string.

You'd still loose part of the information eventually, even with TIMESTAMP WITH TIME ZONE. The "which offset was there originally" part is not persisted and you cannot get it back out unless you save it into a separate column.

chaudum commented 4 years ago

This would not discard any information, but would add the TZ offset to the ISO string.

You'd still loose part of the information eventually, even with TIMESTAMP WITH TIME ZONE. The "which offset was there originally" part is not persisted and you cannot get it back out unless you save it into a separate column.

Got you!

But isn't it also somehow clear, that if the database stores the timestamp in UTC, that it will be converted and the offset is "lost"? Usually client applications take care that the timestamp from the server is converted into a "client local" timezone.

amotl commented 3 years ago

Hi.

Without reading about the specific details here, I just wanted to share a chance find. SQLAlchemy-Utc is a drop-in replacement of SQLAlchemy’s built-in DateTime type with timezone=True option enabled.

With kind regards, Andreas.