astropy / pyvo

An Astropy affiliated package providing access to remote data and services of the Virtual Observatory (VO) using Python.
https://pyvo.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

Regression: Multi-Constraint RegTAP queries are much slower now #571

Closed by msdemlei 1 week ago

msdemlei commented 1 week ago

With the recent move to subqueries in the implementation of most RegTAP constraints, the query planner, at least on our default service, gets confused by certain (fairly simple) combinations of constraints and generates horrible query plans. Something as simple as

from pyvo import registry

rscs = registry.search(
    registry.Freetext("quasar"),
    registry.UCD("src.redshift"))

may take over 6 seconds on the server and perhaps even time out. For reference, with the right query plan, the SQL execution time is 300 ms server-side.

Regrettably, I have not yet understood why the planner decides on its awful plan. Using the good ol' OFFSET 0 trick actually works; there is https://github.com/msdemlei/pyvo/tree/subqueried-condition-planner-hack to prove it. However, I believe it would be horribly wrong to have these sorts of planner hacks in pyVO.
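For readers who have not met the OFFSET 0 trick: in PostgreSQL, a subquery ending in OFFSET 0 is not flattened into the outer query, so it acts as an optimization fence. The following is only a minimal sketch of the idea using plain string building. The helpers `in_subquery` and `build_query` are hypothetical, the subqueries are simplified stand-ins rather than the SQL pyVO actually generates, and nothing here claims to mirror what the linked branch does.

```python
def in_subquery(column, subquery, fence=False):
    """Return the condition `column IN (subquery)`.

    With fence=True, OFFSET 0 is appended to the subquery. On PostgreSQL
    this keeps the planner from pulling the subquery up into the outer
    query, so it is planned on its own.
    """
    if fence:
        subquery = subquery + " OFFSET 0"
    return f"{column} IN ({subquery})"


def build_query(conditions):
    """AND the conditions together over rr.resource (hypothetical helper)."""
    return "SELECT ivoid FROM rr.resource WHERE " + " AND ".join(conditions)


# Simplified, illustrative subqueries; NOT what pyVO emits for
# Freetext("quasar") and UCD("src.redshift").
text_sub = "SELECT ivoid FROM rr.res_subject WHERE res_subject LIKE '%quasar%'"
ucd_sub = "SELECT ivoid FROM rr.table_column WHERE ucd LIKE 'src.redshift'"

# The same two constraints, once as plain subqueries and once fenced.
plain = build_query([
    in_subquery("ivoid", text_sub),
    in_subquery("ivoid", ucd_sub),
])
fenced = build_query([
    in_subquery("ivoid", text_sub, fence=True),
    in_subquery("ivoid", ucd_sub, fence=True),
])

print(plain)
print(fenced)
```

The two queries return the same rows; the only difference is how much freedom the planner has to reorder the subqueries, which is exactly why baking this into a client library is a server-specific hack.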

Let me further poke at the problem. Can we make this bug a blocker for 1.6? This is bad enough that I'd rather revert PR #562 and think of another solution than release it like this, unless we know that the major RegTAP endpoints don't dramatically mis-plan even for rather moderate constraints.