astropy / pyvo

An Astropy affiliated package providing access to remote data and services of the Virtual Observatory (VO) using Python.
https://pyvo.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License
74 stars 50 forks source link

Make extra table constraints subqueried #562

Closed msdemlei closed 2 weeks ago

msdemlei commented 2 weeks ago

There has been a fairly evil thinko in the way we've been writing constraints requiring extra tables. What we did is something like

NATURAL JOIN rr.table_column
WHERE ucd like 'phot.mag;%'

This is innocent enough in hand-written, one-shot queries. For us with our complicated way of pulling all capabilities in one go, it's bad, because if there's n columns with a matching UCD, then every interface will appear n times in our interfaces list. That was relatively harmless before we became more uptight about duplicated interfaces in #505 and environment. It is a disaster now, and so we want to change our style to

WHERE ivoid in (select ivoid from rr.table_column WHERE ucd like 'phot.mag;%').

This is probably not going to be much of a difference in terms of what the databases actually execute; the query planner should take care of that.

Oh, by the way, this PR should be merged after #561 and contains that commit already.

ManonMarchand commented 2 weeks ago

LGTM except a question: did you intentionally do three sub-queries for the Freetext constraint? Is this more efficient than a OR in a single sub-query?

msdemlei commented 2 weeks ago

Ok, sorry about the whitespace and the pdb.

This should now clear the CI except for the mysterious linkcheck message. If a sphinx deity can help me out there, that'd be much appreciated. But perhaps this can go even with this broken link?

msdemlei commented 2 weeks ago

On Thu, Jun 27, 2024 at 12:35:27AM -0700, Manon Marchand wrote:

LGTM except a question: did you intentionally do three sub-queries for the Freetext constraint? Is this more efficient than a OR in a single sub-query?

This has been introduced in an older change, and indeed I should be moving the non-UNION branch to subqueries as well (I promise I will do so later). Anyway, the background is that the OR operator in many situations precludes the use of an index, and UNION helps in these cases. So, yes, that's by design.

msdemlei commented 2 weeks ago

On Thu, Jun 27, 2024 at 05:58:47PM -0700, Brigitta Sipőcz wrote:

I'll add a commit that resolves this issue, take it or leave it (I mean we can work around not exposing the class in the public API if necessary).

Ah, thanks.

FWIW, Constraint is in the public API, so I don't think it's a big problem to add the new one, too.

Yeah, and I'm pretty sure that when I try to teach people how to write constraints of their own, I'll nudge them towards SubqueriedConstraint, as a lot less can go wrong with it.

ManonMarchand commented 2 weeks ago

If @bsipocz also approves I think this can be merged

bsipocz commented 2 weeks ago

I haven't reviewed the content, but 100% trust your review, so go for the merge.

The only thing to check for is if the PR is fixing a bug that is already in the release, then it can have the bugfix milestone (atm I'm not sure if we have one more of those or 1.6 next, so we don't care about changelog consistency)