Pegase745 / sqlalchemy-datatables

SQLAlchemy integration of jQuery DataTables >= 1.10.x (Pyramid and Flask examples)
MIT License
159 stars 67 forks source link

Backreferences in defining colums #83

Closed pankus closed 7 years ago

pankus commented 7 years ago

First, thank you so much for your work. It is great! I'm trying to manage a table having m:m relationship mapped with a backref. Something similar to the Parent:Child scheme as described in the sqlalchemy tutorial (http://docs.sqlalchemy.org/en/latest/orm/basic_relationships.html#many-to-many). By defining the columns for the Parent class, for instance, if I use the backreference Parent.children this message appears: " DataTables warning: table id=maindata - Neither 'InstrumentedAttribute' object nor 'Comparator' object associated with Parent.children has an attribute 'cast' ". In sum, this code does not work as I expect (but I'm a newbie):

columns = [
...
ColumnDT(Parent.id, search_method='yadcf_range_number_slider'),
ColumnDT(Parent.children, search_method='yadcf_text'),
...
]

Is there any workaround to solve such a case?

tdamsma commented 7 years ago

Can you also post your query? It should be something like this

query().select_from(Parent).join(Child)

Then you can add columns like so:

columns = [
...
ColumnDT(Parent.id, search_method='yadcf_range_number_slider'),
ColumnDT(Child.name, search_method='yadcf_text'),
...
]

Of course this will result in a table row for each combination of Parent<->Child you have. If you want to have one row for every Parent, then you have to aggregate the Children into one string, e.g. a comma separated list. You should be able to do that with string_agg or something, not 100% sure.

query().select_from(Parent).outerjoin(Child).group_by(Parent.id)

And a column like:

ColumnDT(func.string_agg(Child.name, ','), search_method='yadcf_text')
pankus commented 7 years ago

dear tdamsma thank you very much for your quick answer. Well, following your suggestion, the main query now is:

query = db.session.query().select_from(Parent).outerjoin(Child, Parent.children)

and the columns:

columns = [
...
ColumnDT(Parent.id, search_method='yadcf_range_number_slider'),
ColumnDT(Child.name, search_method='yadcf_text'),
...
]

In this case everything works well However, as you pointed out, this gives a table with each possible combination of the relationship Parent<->Child, whereas I need a row for every Parent and an array of children grouped together. But if the main query becomes:

query = db.session.query().select_from(Parent).outerjoin(Child, Parent.children).group_by(Parent.id)

with the relevant column:

ColumnDT(func.string_agg(Child.name, ', '), search_method='yadcf_text')

This error message appears:

DataTables warning: table id=maindata - Multiple rows were found for one()

As far as I can understand, something happens when the group_by clause is added to the query. On the other hand I checked the query in my database as well as in python shell and it is correct. It gives back the exactly the number of rows I would expect.

p.s. In one of my tries I forgot to append .group_by(Parent.id) to the query. I that case the error message was:

DataTables warning: table id=maindata - (psycopg2.ProgrammingError) aggregate functions are not allowed in WHERE

I don't know if this is relevant here or is a different error.

tdamsma commented 7 years ago

@pankus, glad to help. Forgot about that, but indeed aggregate functions are not allowed in WHERE, so you can not search for the child column now. I guess you do want to be able to do this, in which case you can use a CTE / WITH clause to achieve this.

Luckily, using SQLAlchemy this is not too difficult. From the top of my head (so might contain some mistakes):

parent_with_children = dbsession.query(
    Parent.id, 
    func.string_agg(Child.name, ', ').label('children')
).\
    outerjoin(Child).\
    group_by(Parent.id).\
    cte()

query = db.session.query().select_from(parent_with_children )

columns = [
ColumnDT(parent_with_children .c.id, search_method='yadcf_range_number_slider'),
ColumnDT(parent_with_children .c.children, search_method='yadcf_text'),
]

Note that if your data set is large, this might be a bit slow, as I believe the join between parent and children is always run over the entire parent and children table. The result of this join is put in a CTE, which is then filtered and paginated. Also, this CTE is not indexed.

To overcome this, you could define a separate Materialized view for the parent_with_children table, which you could index/limit/filter as any other table, with no performance impact. Of course, you then need to put in triggers or other logic to keep the Materialized view up to date.

Another alternative would be to partially denormalize you schema, by storing the children in the parent in e.g. a JSONB column. Of course this has other implications

pankus commented 7 years ago

Congratulations @tdamsma. I'm really impressed by your knowledge of this subject. I'm new to sqlalchemy and above all I'm not a real programmer, thus all this appears to me really as a true alchemy. My application now works like a charm. As a minor drawback, I noticed just a little delay in the responses; however this is not a real problem, especially if one takes into consideration the many advantages of using db relationships in such a way. Thanks a lot.

Just the last question (only to learn something more). What about the error message DataTables warning: table id=maindata - Multiple rows were found for one()

Did you understand what does it means? Does it depend on .group_by() clause?

tdamsma commented 7 years ago

In the function _set_yadcf_data min/max values are determined for the 'yadcf_range_number_slider' columns. Selecting the min/max of a query with a group_by clause returns more than one row. Am not entirely sure if using Group_by is actually a good idea and should be supoorted (in which case thsi is a bug), or if it is better to disallow use of group_by in the query altogether.