Pegase745 / sqlalchemy-datatables

SQLAlchemy integration of jQuery DataTables >= 1.10.x (Pyramid and Flask examples)
MIT License
159 stars 67 forks source link

complex calculations, was filter function #119

Closed louking closed 5 years ago

louking commented 5 years ago

I realize this is mostly an SQLAlchemy question, but I was hoping maybe you can give some guidance. And yes, this is a very late query about the "new" sqlalchemy-datatables since version 0.x.

I need to determine the age of a runner for a particular race result.

My models are as follows, and note my dates are stored as yyyy-mm-dd

class Runner(Base):
    __tablename__ = 'runner'
    __table_args__ = (UniqueConstraint('name', 'dateofbirth', 'club_id'),)
    id = Column(Integer, Sequence('user_id_seq'), primary_key=True)
    club_id = Column(Integer, ForeignKey('club.id'))
    name = Column(String(50))
    fname = Column(String(50))
    lname = Column(String(50))
    dateofbirth = Column(String(10))
    gender = Column(String(1))
    hometown = Column(String(50))
    renewdate = Column(String(10))
    expdate = Column(String(10))
    member = Column(Boolean)
    active = Column(Boolean)
    results = relationship("RaceResult", backref='runner', cascade="all, delete, delete-orphan")
    aliases = relationship("RunnerAlias", backref='runner', cascade="all, delete, delete-orphan")

class Race(Base):
    __tablename__ = 'race'
    __table_args__ = (UniqueConstraint('name', 'year', 'club_id', 'fixeddist'),)
    id = Column(Integer, Sequence('race_id_seq'), primary_key=True)
    club_id = Column(Integer, ForeignKey('club.id'))
    name = Column(String(MAX_RACENAME_LEN))
    year = Column(Integer)
    racenum = Column(Integer)   # deprecated, ignored
    date = Column(String(10))
    starttime = Column(String(5))
    distance = Column(Float)
    fixeddist = Column(String(10))   # null or coerced with "{:.4g}".format(distance)
    surface = Column(Enum('road','track','trail',name='SurfaceType'))
    locationid = Column(Integer, ForeignKey('location.id'))
    external = Column(Boolean)
    active = Column(Boolean)
    results = relationship("RaceResult", backref='race', cascade="all, delete, delete-orphan")
    series = relationship("RaceSeries", backref='race', cascade="all, delete, delete-orphan")

class RaceResult(Base):

    __tablename__ = 'raceresult'
    __table_args__ = (UniqueConstraint('runnerid', 'runnername', 'raceid', 'seriesid', 'club_id'),)
    id = Column(Integer, Sequence('raceresult_id_seq'), primary_key=True)
    club_id = Column(Integer, ForeignKey('club.id'))
    runnerid = Column(Integer, ForeignKey('runner.id'), index=True)
    runnername = Column(String(50)) # *** do not use!
    raceid = Column(Integer, ForeignKey('race.id'))
    seriesid = Column(Integer, ForeignKey('series.id'))
    gender = Column(String(1))
    agage = Column(Integer)
    divisionlow = Column(Integer)
    divisionhigh = Column(Integer)
    time = Column(Float)
    agfactor = Column(Float)
    agtime = Column(Float)
    agpercent = Column(Float)
    overallplace = Column(Float)
    genderplace = Column(Float)
    divisionplace = Column(Float)
    overallpoints = Column(Float)
    genderpoints = Column(Float)
    divisionpoints = Column(Float)
    agtimeplace = Column(Float)
    source = Column(String(20))
    sourceid = Column(String(128))
    sourceresultid = Column(String(128))
    fuzzyage = Column(Boolean)
    instandings = Column(Boolean)   
    hidden = Column(Boolean)

With 0.x sqlalchemy-datatables, I used filterarg='row' like follows to do the age conversion on the server.

def renderage(result):
    try:
        thisage = timeu.age(tYmd.asc2dt(result.race.date),tYmd.asc2dt(result.runner.dateofbirth))
    except:
        thisage = ''

    return thisage

                                 :

ColumnDT('age', mData='age', searchable=False, filterarg='row', filter=renderage),

I see https://github.com/Pegase745/sqlalchemy-datatables/issues/80 and recommendation to use javascript for rendering. Unfortunately my model knows the birth date and I don't want to send that to the browser.

Also I see the recommendation "Supporting the full power of SQLAlchemy was one of the reasons to propose the update which turned into 1.0.0." but unfortunately I apparently don't know how to use the full power of SQLAlchemy.

I've been struggling for a couple of days and have tried hybrid_property, column_property to try to dig into the Race and Runner records from RaceResult, thus, but am getting error trying to access the nested date field (and I assume dateofbirth would also cause a problem but my code didn't get that far.

Thus, neither

class RaceResult(Base):

                      : 

    racedate = column_property(select([Race.date]).where(Race.id==raceid))
    runnerdob = column_property(select([Runner.dateofbirth]).where(Runner.id==runnerid))

    @hybrid_property
    def age(self):
        return int(self.racedate[0:4]) - int(self.runnerdob[0:4]) - int(self.racedate[5:] < self.runnerdob[5:])

nor

    @hybrid_property
    def age(self):
        return int(self.race.date[0:4]) - int(self.runner.dateofbirth[0:4]) - int(self.race.date[5:] < self.runner.dateofbirth[5:])

work (the latter gives AttributeError: Neither 'InstrumentedAttribute' object nor 'Comparator' object associated with RaceResult.race has an attribute 'date'. I can reproduce the former if needed.)

Finally I realized the package supports combined expressions (which would be the most direct way to resolve this), but tried the following and am getting NotImplementedError: Operator 'getitem' is not supported on this expression:

ColumnDT(int(Race.date[0:4]) - int(Runner.dateofbirth[0:4]) - int(Race.date[5:] < Runner.dateofbirth[5:]), mData='age',  search_method='none'),

Full context of DataTables invocation:

            columns = [
                ColumnDT(Runner.name,                mData='name'),
                ColumnDT(Series.name,                mData='series'),
                ColumnDT(Race.date,                  mData='date'),
                ColumnDT(Race.name,                  mData='race'),
                ColumnDT(Race.distance,              mData='miles',              search_method='none'),
                ColumnDT(Runner.gender,              mData='gender',             search_method='none'),
                ColumnDT(int(Race.date[0:4]) - int(Runner.dateofbirth[0:4]) - int(Race.date[5:] < Runner.dateofbirth[5:]),
                         mData='age',                search_method='none'),
                ColumnDT(RaceResult.genderplace,     mData='genderplace',        search_method='none'),
                ColumnDT(RaceResult.division,        mData='division',           search_method='none'),
                ColumnDT(RaceResult.divisionplace,   mData='divisionplace',      search_method='none'),
                ColumnDT(RaceResult.time,            mData='time',               search_method='none'),
                ColumnDT(RaceResult.time,            mData='pace',               search_method='none'),
                ColumnDT(RaceResult.agtime,          mData='agtime',             search_method='none'),
                ColumnDT(RaceResult.agpercent,       mData='agpercent',          search_method='none'),
            ]

            params = request.args.to_dict()
            query = RaceResult.query.filter_by(**resultfilter).join("runner").join("series").filter_by(**seriesfilter).join("race")
            rowTable = DataTables(params, query, columns)
tdamsma commented 5 years ago

Let me give you some quick guidance. The key thing is that you are doing the age calculation in python, not in your database. Best you can do is use actual date types instead of yyyy-mm-dd strings, but perhaps you have a reason no to. But you can also cast the types on the fly (at the cost of some performance)

Try writing a query that calculates the age in the database first, something like (untested, so likely doesn't work directly)

from sqlachemy.types import Date
query = (
    session.query(Race.date.cast(Date) - Runner.dateofbirth.cast(Date))
    .select_from(RaceResult)
    .join(Runner)
    .join(Race)

# print the sql statement, see if it makes sense
print(query)

# get results
print(query.all())

If you succeed, you can turn the query into a column

Hope this helps

louking commented 5 years ago

Use of string was a design decision I made 6 years ago. In retrospect probably the wrong decision but I have a lot of data and code which expects this to be string now, so I think I'm stuck with it.

Your response is very helpful and gets me further on my path. I see the following and hope you don't mind my intruding on your generosity once more. Do you have recommendation on how to handle my case when the birthdate is null string?

query = db.session.query(Race.date.cast(Date) - Runner.dateofbirth.cast(Date)).select_from(RaceResult).join(Runner).join(Race)
print(query)
SELECT CAST(race.date AS DATE) - CAST(runner.dateofbirth AS DATE) AS anon_1 
FROM raceresult INNER JOIN runner ON runner.id = raceresult.runnerid INNER JOIN race ON race.id = raceresult.raceid
rr = query.all()
C:\Users\lking\Documents\Lou's Software\projects\rrwebapp\rrwebapp\lib\site-packages\sqlalchemy\engine\default.py:550: Warning: Incorrect datetime value: ''
  cursor.execute(statement, parameters)
rr[0]
Out[5]: (249180L)
len(rr)
Out[6]: 80942
tdamsma commented 5 years ago

First of all, what would you want the result to be when dateofbirth is empty? Depending on what you want, you can adjust the query to deal with it. It also depends on the database. An example for postgres:

Race.date.cast(Date) - func.coalesce(Runner.dateofbirth.cast(Date), Race.date.cast(Date))
).select_from(RaceResult).join(Runner).join(Race)

coalesce returns an alternate value if the first is NULL. If you want more control on the condition, you can use a case statement: https://docs.sqlalchemy.org/en/13/core/sqlelement.html#sqlalchemy.sql.expression.case

If I where you I would first try to get more SQL knowledge (plenty of nice interactive courses online), and then dive deeper in sqlalchemy. Sqlalchemy is a so called "leaky abstraction", as in you still have to know how to use plain SQL to better understand it.

louking commented 5 years ago

You're right -- I wish I had the time to do a deep dive in all the technologies I work with. SQL in general and sqlalchemy in particular would be very helpful, yet are but a small portion of the applications I've been building.

This morning I'd found https://docs.sqlalchemy.org/en/13/core/tutorial.html which I think will help with all my needs building sqla expressions (and using sqlalchemy-datatables).

Again thanks so much for your guidance to this hacker. I'll aim for more self-sufficiency in this area.

louking commented 5 years ago

For anyone finding this looking for a way to modify the column display, what @tdamsma suggested about using sqla expressions works of course, but another technique which uses python to convert the display is to create a derived class from TypeDecorator.

Since the data is only coming out of the database (not going in), process_result_value can be used to convert the data in the database using python.

Here is a python 2.7 example of a boolean replacement, where the derived class allows the true and false display to be specified.

from sqlalchemy import func, types, cast

class RenderBoolean(types.TypeDecorator):
    impl = types.String

    def __init__(self, *args, **kwargs):
        # set arguments as class attributes
        # ignore all but truedisplay, falsedisplay
        self.truedisplay = kwargs.pop('truedisplay')
        self.falsedisplay = kwargs.pop('falsedisplay')
        super(RenderBoolean, self).__init__(*args, **kwargs)

    # assumes boolean integer value to be converted to boolean display expression
    def process_result_value(self, value, engine):
        # value should be '0' or '1'
        return self.truedisplay if int(value) else self.falsedisplay

def renderboolean(expr, *args, **kwargs):
    return cast(expr, RenderBoolean(*args, **kwargs))

                     :

columns = [
                     :
    ColumnDT(renderboolean(User.active, truedisplay='yes', falsedisplay='no'))
                     :
]