Reimplement api to rely on SQLALchemy core functions

tdamsma commented 7 years ago

I am a big fan of sqlalchemy-datatables, but I often run into problems with how sqlalchemy-datatables tries to reimplement some of SQLAlchemy for dealing with joins etc. Complicated queries, column_properties, of ad-hoc func() statements applied to columns can easily break sqlalchemy-datatables.

I forked this repo and came up with a way to deal with this here. The main idea is that instead of passing an sqla_object, query, and columns, only pass the query and the columns. And instead of passing the columns by name, one can pass any SQLAlchemy column like expression, like func.upper(User.name).

An example:

Without datatables, a query can look like this:

query = self.session.query(
        User.id,
        User.name,
        User.birthday,
        func.date_part('year', func.age(User.birthday)),
    ).filter(User.id > 10)

result = query.all()

Currently, with sqlalchemy datatables there is no way to calculate the age on the fly using sql statements. In my implementation, you have to split the query and the columns and pass those to DataTables:

query = self.session.query().filter(User.id > 10)
columns = [
        ColumnDT(User.id),
        ColumnDT(User.name),
        ColumnDT(User.birthday),
        ColumnDT(func.date_part('year', func.age(User.birthday))),
        ]
rowTable = DataTables(request, query, columns)

If you check the code, you can see it is much cleaner, as dealing with joins, properties etc are made by sqlachemy. The result of the query is determined operating on the sqlalchmey expressions.

So, I am quite happy with this change, but as it is an api change, I was wondering if you are interested in merging these changes into this repo. If so, I can prepare a PR.

There a re a few rough edges with my current implementation though:

I changed the search, as I have no need for regex searches. I do however need things like >12 to search for numbers greater than 12 (this is implemented)
I didn't implement the 'search all columns' option

I would love to hear your thoughts

Pegase745 commented 7 years ago

Cheers for the great job you've done there ! :tada:

It's gonna be easier to use that way you're right, and we can even do more magic with it. I would love to see a Pull Request if you have time to fix tests without cheating :wink:

I don't have any thoughts right now on the loss of regex searches and searching all columns features.. The whole PR will be a big breaking change but why not.

tdamsma commented 7 years ago

Ok, great! I guess I can try to reimplement regex and global search to keep a bit of compatibility, and then I'll submit a PR. I could use some help on re-defining the API, as this is something that has a lot of impact.

I forgot to mention I also dropped support for datatables <1.10, what is your opinion on that?

FYI, I just managed to implement search methods for the yadcf plugin, this works quite well and required very little code.

Pegase745 commented 7 years ago

I am not familiar with the yadcf, so glad that someone with experience implemented something rather than me.

I liked the fact of keeping compatibility so 'easily', but it demands many if cases in the code and it was nice when DT 1.10 got out a couple of years ago. Now I may begin to think that it's useless. What do you think ? I don't actually have an estimate of the people still using DT<1.10

louking commented 7 years ago

Appreciate you keeping regex, as I need for my use case.

Michel - what is status of outstanding PRs? In particular I need #63 but I think these are all going to start diverging.

Pegase745 commented 7 years ago

I thought about it yeah. Maybe @tdamsma have an input on whether it's really diverging a lot in theory ?

tdamsma commented 7 years ago

Exposing rexeg functionality to untrusted input exposes the server to a nice potential attack surface, so that was my motivation for dropping it. I guess a proper way to implement regex search would be to disable server-side by default, and then only enable when both client asks for it (via columns[x][search][regex] == 'true' and when enabled server side.

For non-regex searches, I now use the ilike operator in combination with adding % before and after the search string. This is not exactly the same way ass client side search works, but comes pretty close. Would the difference be a problem?

In the meantime, I managed to reimplement global search

tdamsma commented 7 years ago

@louking, With the idea I'm working on, providing a backend for yadcf looks something like this:

    query = request.dbsession.query()
    columns = [
        ColumnDT(Person.id, search_method='yadcf_range_number_slider'),
        ColumnDT(Person.name, search_method='yadcf_text'),
        ColumnDT(Person.name.label('n2'), search_method='yadcf_autocomplete'),
        ColumnDT(Person.length, search_method='yadcf_range_number'),
        ColumnDT(Person.birthday, search_method='yadcf_range_date'),
        ColumnDT(Person.age, search_method='yadcf_range_number_slider'),
        ColumnDT(Person.age.label('a2'), search_method='yadcf_multi_select'),
        ColumnDT(Person.name.label('n3'), search_method='yadcf_multi_select'),
        ColumnDT(Person.sign_up, search_method='yadcf_range_date'),
    ]

Would that solve your use case?

tdamsma commented 7 years ago

One more thing: I used pandas to nicely convert a query to a dictionary to output the results. Pandas is not yet available on pypy, so I effectively dropped pypy compatibility. Would this be a problem?

I also removed the last cheat to pass the tests, though the code could do with a few more tests. Still left todo:

add more tests
scrutinise new api
fix django and pyramid examples
update documentation
reimplement regex
(remove pandas dependency?)
probably a lot more

I'll work on preparing a PR as soon as I have cleaned up the code

louking commented 7 years ago

My use of regex was for the alternation supplied by yadcf when using multi_select, so if you support that I'm fine with regex being removed.

My deployment scenario prefers use of pypi. I already have an issue with two home grown packages I haven't put on pypi (and the third being my fork of sqlalchemy-datatables required because of #63).

Will these changes affect output result in the following code? I (and yadcf and datatables I suppose) have some expectations what that looks like.

            args = request.args.copy()
            rowTable = DataTables(args, model, query, columns)
            output_result = rowTable.output_result()

tdamsma commented 7 years ago

@louking you would have to make some adjustments to the way you call the rowTable function, but then you should have better support for yadcf then you now have. If you use search_method='yadcf_multi_select' then the list of all possible select values automatically gets added to the output, so yadcf can provide you with the appropriate multi-select options

louking commented 7 years ago

I will need to test how you query the list of all possible select values. With a large dataset some optimization may be needed. Also at least one of my multiselect filters needs to be able to choose from values outside of the returned dataset.

tdamsma commented 7 years ago

I use a distinct query on that particular column. This lets the database sort it out. If the query is slow, just add an index to optimize (just like you would do otherwise)

louking commented 7 years ago

On my optimization, maybe I'm remembering a bug in my code. Looking at the current query now it looks like it is about what you are saying.

And I suppose for the column for which I want the user to be able to choose values outside the dataset I can simply overwrite your yadcf_data_n list in my output_result.

But maybe we're not saying the same thing: for some of my select filters I want them to be limited to the results already filtered, and for some of mine I want them to be unlimited, i.e., all possible values including those outside what is seen by the user.

In any case, I suspect in some cases I'll have to overwrite what your query returns.

If interested, see https://github.com/louking/rrwebapp/blob/e908c1986d2d684344aad6c6528556ec8a03a4c7/rrwebapp/results.py#L1893 . I'm constructing a select with name (age) as label and id as value. These labels/values are queried from a table which sqlalchemy-datatables is not aware of.

Earlier starting at 1854 I even have to post-filter the data returned due to some calculations needed.

As you can see, this is a fairly complex use of sqlalchemy-datatables, and I don't think these requirements could be handled generally.

tdamsma commented 7 years ago

I was actually still doubting how to implement filtering for yadcf select, multi-select and autocomplete. It made sense to me to only present the available options after the currently selected filters on the other columns are applied. Perhaps this should be configurable.

The range sliders always get the min/max irrespective of the other filters, as changing the min/max seems to break the jQuery-ui sliders, they forget the previous value.

Pegase745 / sqlalchemy-datatables

Reimplement api to rely on SQLALchemy core functions #66