Open amotl opened 1 year ago
I think we need to find a different solution for this particular line of code, which generates an SQL statement CrateDB does not understand.
case = sql.case((order_value.is_(None), 1), else_=0).label(f"clause_{clause_id}")
CASE WHEN (anon_1.value IS NULL) THEN :param_1 ELSE :param_2 END AS clause_1
CASE WHEN is the culprit here, or just using that kind of statement within an ORDER BY clause on behalf of clause_1.
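To inspect the shape of that statement outside MLflow, a minimal sketch along the following lines may help. It assumes SQLAlchemy 1.4 or later; the anon_1 table is a hypothetical stand-in for the subquery MLflow builds, and the statement is only compiled to a string, not executed against CrateDB.

```python
from sqlalchemy import case, column, select, table

# Hypothetical stand-in for MLflow's anonymous subquery; only `value` matters here.
anon_1 = table("anon_1", column("value"))

clause_id = 1
order_value = anon_1.c.value

# Same construct as in the line quoted above: flag NULL values with 1, others with 0.
null_flag = case((order_value.is_(None), 1), else_=0).label(f"clause_{clause_id}")

# Order by the flag first, so that rows with NULL values sort last.
stmt = select(order_value, null_flag).order_by(null_flag, order_value.desc())

sql = str(stmt)
print(sql)
```

Running the printed statement by hand against CrateDB, with the bind parameters filled in, is one way to tell whether the CASE WHEN expression or the ORDER BY on its alias triggers the error.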
./cc @hlcianfagna, @seut
I found an application-based monkeypatch solution for this, essentially intercepting the return values of _get_orderby_clauses and removing the corresponding CASE WHEN statement clauses. The test case test_search_runs_keep_all_runs_when_sorting succeeds now, which is all I was aiming at here.
First, I tried a generic solution by overriding the SQL compiler's visit_case visitor method. While this was able to remove the CASE WHEN clause completely, so it is a feasible approach in general, the MLflow application logic at this place defies that attempt, because it adds the corresponding statement clause item to two different lists, coupling its bookkeeping more strongly to subsequent application logic.
clauses.append(case.name)
select_clauses.append(case)
Because of this, there was no way to work around this pitfall by exclusively using SQLAlchemy-based workarounds. We need to patch MLflow instead.
Removing that clause from the query generator was bound to have negative side effects. Why would the code otherwise have been there in the first place?
It fails the test_order_by_attributes test case, at the spot where the result order changed with respect to None values.
E AssertionError: assert ['None', '789', '456', '234', '123', '-123'] == ['789', '456', '234', '123', '-123', 'None']
E At index 0 diff: 'None' != '789'
E Full diff:
E - ['789', '456', '234', '123', '-123', 'None']
E ? --------
E + ['None', '789', '456', '234', '123', '-123']
E ? ++++++++
tests/test_tracking.py:1568: AssertionError
We are working around this by adding code-based sorting logic, in order to compensate for the missing SQL clause which is responsible for properly sorting None and NaN values to the end of the list.
The corresponding patch to omit the offending SQL clause looks like this.
This one is needed to compensate for that in code instead of SQL.
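As an illustration of what such code-based compensation can look like, here is a minimal sketch that sorts plain values and moves None and NaN to the end. It is not the patch referred to above: the helper names is_missing and sort_missing_last are hypothetical, and the real workaround operates on MLflow run objects rather than bare numbers.

```python
import math
from typing import List, Optional, Sequence


def is_missing(value: Optional[float]) -> bool:
    """Return True for None and NaN, the values the SQL clause used to sort last."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def sort_missing_last(
    values: Sequence[Optional[float]], descending: bool = False
) -> List[Optional[float]]:
    """Sort the present values, then append the missing ones in their original order."""
    present = sorted((v for v in values if not is_missing(v)), reverse=descending)
    missing = [v for v in values if is_missing(v)]
    return present + missing


# Mirrors the expected order from the failing test: descending, with None last.
result = sort_missing_last([123, None, 789, -123, 456, 234], descending=True)
print(result)
```

Splitting the values into two groups avoids comparing None or NaN with numbers at all, which would otherwise raise a TypeError or produce an unstable order.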
Maybe @hammerhead or @hlcianfagna can find a way to formulate the query outlined in the original post using CrateDB?
@amotl I tested with 5.4.2 and I am not seeing the UnsupportedFeatureException with this query. Were you testing with an earlier or newer version by any chance?
Dear Hernan,
thanks for looking into this. I think I used CrateDB Nightly. Indeed, CrateDB supports CASE WHEN ... THEN ... END, I must have overlooked this detail.
Hm, maybe the error message indicates that, while it works in general, the engine cannot provide sorting on an alias of that type of clause?
CASE WHEN (anon_1.value IS NULL) THEN ? ELSE ? END AS clause_1
ORDER BY clause_1
With kind regards, Andreas.
@amotl CrateDB does support this:
cr> select CASE WHEN (1 IS NULL) THEN 'yes' ELSE 'no' END AS clause_1 order by clause_1;
+----------+
| clause_1 |
+----------+
| no |
+----------+
SELECT 1 row in set (0.015 sec)
I highly recommend always trying out the SQL directly, to verify that the query itself is not the problem ;)
Interesting, thanks. Then, some other combination of the SQL makes this statement fail somehow. I will re-evaluate the situation by using the plain SQL statement, as you suggested, in order to find out further details.
Is it possibly related to https://github.com/crate/crate/issues/15029?
About
The test case test_search_runs_keep_all_runs_when_sorting fails when invoking search_runs with an order_by parameter.
Details
The root cause, which generates the SQL statement outlined below, is this elaborate code within MLflow's mlflow.store.tracking.sqlalchemy_store._get_orderby_clauses:
Exception
Intermediate query
This is the query in intermediate form while being created and processed by SQLAlchemy.
Remarks
_No workaround for this has been found so far, so, most probably, the test case will be skipped for now. However, depending on needs, MLflow's search_runs feature, together with sorting, may be an important feature we would not like to skip. So, I will be happy for any support here._